Hi,
I have a regex that strips out HTML comments from HTML documents. It
looks like this:
$document =~ s{<!--.*?-->}{}gm;
However, when a browser doesn't know how to process the <SCRIPT
LANGUAGE="Javascript"> tag, it often tries to display the JavaScript
as text. A common ploy to avoid this is to wrap the JavaScript in
HTML comments so that the browser displays nothing rather than the
JavaScript. Maybe an example will help:
<script language="JavaScript">
<!--
imBanner1 = new Image ();
imBanner1.src = "images/banner_1.gif";
sBanner1Link = "http://www.forbes.com/asap/00/0403/84b.htm";
//-->
</script>
Non-supporting browsers will now display nothing (well, probably the
HTML below) rather than mistakenly displaying the JavaScript.
Anyway, I now need to modify the comment-stripping regex so that it
only strips comments when they are not contained within <SCRIPT> and
</SCRIPT>. I thought that a negative lookahead assertion might be the
way to go, so I tried:
$document =~ s{(<!--.*?-->)(?!\s*</script>)}{}gim;
and
$document =~ s{<!--.*?-->\s*(?!</script>)}{}gim;
and a few other combinations, but I can't get it to work as I'd
like. The first regex is matching past the </script> tag; it seems to
be matching until the next "-->" that doesn't have a </script> tag
after it. Of course, that's what I asked it to do :) but the behaviour
that I want is less greedy and I'm not sure how to get that behaviour.
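To make the first behaviour concrete, here is a minimal sketch that
reproduces it. The sample document is made up for illustration, and
the /s flag is an assumption added so that "." can cross newlines
(with /m alone it cannot). When the lookahead fails after the first
"-->", the engine backtracks into the non-greedy .*? and lets it grow
until it reaches a "-->" where the lookahead succeeds:

```perl
use strict;
use warnings;

# Hypothetical sample: a script-wrapped comment followed by an ordinary one.
my $document = join "\n",
    '<script language="JavaScript">',
    '<!--',
    'imBanner1 = new Image ();',
    '//-->',
    '</script>',
    '<p>text</p>',
    '<!-- ordinary comment -->',
    '<p>more</p>';

# /s (an addition for this sketch) lets "." match newlines.
# The first "-->" is rejected by the lookahead, so .*? is extended
# to the next "-->" that is not followed by </script>.
my ($matched) = $document =~ m{(<!--.*?-->)(?!\s*</script>)}is;

print "matched: [$matched]\n";
```

The captured text starts at the "<!--" inside the script block and
runs through the end of the later, ordinary comment, swallowing the
</script> tag on the way.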
The second regex matches exactly the same as the original one. It
strips out anything between "<!--" and "-->" and doesn't seem to pay a
blind bit of notice to the </script> tag.
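A likely reason for the second result: \s* is allowed to match zero
characters, so the lookahead can be tested right after "-->", where
the text is a newline followed by </script> and therefore does not
start with "</script>". One alternative that avoids lookahead
altogether is to match whole <script> blocks first and put them back
unchanged, stripping only the comments found outside them. The
following is a sketch under assumptions, not a verified fix: the
helper name strip_comments_outside_script is made up, the /s flag is
added so "." crosses newlines, and it assumes the script body never
contains the literal text "</script>".

```perl
use strict;
use warnings;

# Hypothetical helper name; not part of the original code.
sub strip_comments_outside_script {
    my ($html) = @_;

    # Alternative 1 captures an entire <script>...</script> block,
    # which the replacement puts back untouched.
    # Alternative 2 matches an ordinary comment; $1 is undefined in
    # that case, so the comment is replaced with the empty string.
    $html =~ s{(<script\b.*?</script>)|<!--.*?-->}{defined $1 ? $1 : ''}gise;

    return $html;
}

# Hypothetical sample document.
my $document = join "\n",
    '<script language="JavaScript">',
    '<!--',
    'imBanner1 = new Image ();',
    '//-->',
    '</script>',
    '<p>text</p>',
    '<!-- ordinary comment -->',
    '<p>more</p>';

my $stripped = strip_comments_outside_script($document);
print $stripped, "\n";
```

Because the script block is consumed as a single match, the comment
inside it is never a candidate for the second alternative.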
Can anyone edify me as to what I'm doing wrong?
Dermot.