Today I came across a problem where Column in a SQL server database contained Data with HTML tags, while generating a idea how can i create search from the whole site's content that is inserted into database using editor. I saw if i search the content as it is which contains the HTML tags as well, it would not be a good way to do this thing. Then i decided to remove the HTML tags from the data. And wrote below code to do that for me:
I got the idea from the Best way to strip html tags from a string in sql-server, and then i had modified as i wanted it.ALTER FUNCTION [dbo].[udf_StripHTML] ( @HTMLText varchar(MAX) ) RETURNS varchar(MAX) AS BEGIN DECLARE @Start int DECLARE @End int DECLARE @Length int -- Replace the HTML entity & with the '&' character (this needs to be done first, as -- '&' might be double encoded as '&amp;') SET @Start = CHARINDEX('&', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '&') SET @Start = CHARINDEX('&', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 END -- Replace the HTML entity < with the '<' character SET @Start = CHARINDEX('<', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '<') SET @Start = CHARINDEX('<', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 END -- Replace the HTML entity > with the '>' character SET @Start = CHARINDEX('>', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '>') SET @Start = CHARINDEX('>', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 END -- Replace the HTML entity & with the '&' character SET @Start = CHARINDEX('&amp;', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '&') SET @Start = CHARINDEX('&amp;', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 END -- Replace the HTML entity with the ' ' character SET @Start = CHARINDEX(' ', @HTMLText) SET @End = @Start + 5 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, ' ') SET @Start = CHARINDEX(' ', @HTMLText) SET @End = @Start + 5 SET @Length = (@End - @Start) + 1 END -- Replace any <br> tags with a newline SET @Start = CHARINDEX('<br>', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, CHAR(13) + CHAR(10)) SET @Start = CHARINDEX('<br>', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 END -- Replace any <br/> tags with a newline SET @Start = CHARINDEX('<br/>', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, 'CHAR(13) + CHAR(10)') SET @Start = CHARINDEX('<br/>', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 END -- Replace any <br /> tags with a newline SET @Start = CHARINDEX('<br />', @HTMLText) SET @End = @Start + 5 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, 'CHAR(13) + CHAR(10)') SET @Start = CHARINDEX('<br />', @HTMLText) SET @End = @Start + 5 SET @Length = (@End - @Start) + 1 END -- Remove anything between <whatever> tags SET @Start = CHARINDEX('<', @HTMLText) SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText)) SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '') SET @Start = CHARINDEX('<', @HTMLText) SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText)) SET @Length = (@End - @Start) + 1 END -- Replace the HTML entity --> with the ' ' character SET @Start = CHARINDEX('-->', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '') SET @Start = CHARINDEX('-->', @HTMLText) SET @End = @Start + 4 SET @Length = (@End - @Start) + 1 END -- Replace the HTML entity --> with the ' ' character SET @Start = CHARINDEX('->', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '') SET @Start = CHARINDEX('->', @HTMLText) SET @End = @Start + 3 SET @Length = (@End - @Start) + 1 END RETURN LTRIM(RTRIM(@HTMLText)) END
That's All. We are ready with the recipe to remove HTML tags from a string using USER DEFINED FUNCTION IN SQL SERVER.
One thing more the above UDF will work with SQL SERVER 2005 and above as below versions does not have varchar(Max) datatype. If you want to use this function with the 2000 or below versions just replace the varchar(max) with varchar(4000).
Hope this will help you somehow.
No comments:
Post a Comment