LINUX.IE, website of the Irish Linux Users' Group

[Webdev] Negative lookahead assertions in Perl regex


Dermot McGahon dmcgahon at iol.ie
Wed Aug 23 20:33:47 IST 2000


Hi,
 
I have a regex that strips out HTML comments from HTML documents. It
looks like this:
 
  $document =~ s{<!--.*?-->}{}gm;
 
However, when a browser doesn't know how to process the <SCRIPT
LANGUAGE="Javascript"> tag, it often tries to display the JavaScript as
text. A common ploy to avoid this is to wrap the JavaScript in HTML
comments so that the browser displays nothing rather than the
JavaScript. Maybe an example will help:
 
<script language="JavaScript">
 
<!--
 
imBanner1  = new Image ();
imBanner1.src = "images/banner_1.gif";
sBanner1Link  = "http://www.forbes.com/asap/00/0403/84b.htm"; 
 
 
//-->
 
</script>
 
Non-supporting browsers will now display nothing (well, probably the
HTML below) rather than mistakenly displaying the JavaScript.
 
Anyway, I now need to modify the comment-stripping regex so that it
only strips comments when they are not contained within <SCRIPT> and
</SCRIPT>. I thought that a negative lookahead assertion might be the
way to go, so I tried:
 
  $document =~ s{(<!--.*?-->)(?!\s*</script>)}{}gim;
 
and
 
  $document =~ s{<!--.*?-->\s*(?!</script>)}{}gim;
 
and a few other combinations, but I can't get it to work as I'd
like. The first regex is matching past the </script> tag; it seems to
be matching until the next "-->" that doesn't have a </script> tag
after it. Of course, that's what I asked it to do :) but the behaviour
that I want is less greedy and I'm not sure how to get that behaviour.
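
A minimal sketch that reproduces this over-matching on a small, made-up document. The sample input and the variable name $stripped are hypothetical. It also assumes the /s modifier is added so that "." can match newlines; /m on its own only changes how ^ and $ behave.

```perl
use strict;
use warnings;

# Hypothetical sample input (not taken from a real page).
my $document = join "\n",
    '<script language="JavaScript">',
    '<!--',
    'var x = 1;',
    '//-->',
    '</script>',
    '<p>text</p>',
    '<!-- a real comment -->',
    '<p>more</p>';

# First attempt from above, with /s added (assumption) so "." spans lines.
# When the lookahead rejects the first "-->", the engine backtracks and lets
# ".*?" grow until it reaches a "-->" that is NOT followed by </script>.
(my $stripped = $document) =~ s{(<!--.*?-->)(?!\s*</script>)}{}gims;

# Everything from the first "<!--" through the last "-->" is gone,
# including the </script> tag and the paragraph in between.
print $stripped, "\n";
```

The lazy quantifier only controls which match is tried first; it does not stop the engine from extending the match once the lookahead fails.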
 
The second regex matches exactly the same as the original one. It
strips out anything between "<!--" and "-->" and doesn't seem to pay a
blind bit of notice to the </script> tag.
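
A possible explanation, sketched on a made-up document: the greedy \s* can give back the whitespace it consumed, so the lookahead ends up being tested at a whitespace character rather than at </script>, and it succeeds. The sketch also shows one alternative direction, offered as an assumption and not a tested fix: match whole <script>...</script> blocks first in an alternation and put them back unchanged. The sample input, the variable names $second and $kept, and the added /s modifier are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample input (not taken from a real page).
my $document = join "\n",
    '<p>before</p>',
    '<!-- a real comment -->',
    '<script language="JavaScript">',
    '<!--',
    'var x = 1;',
    '//-->',
    '</script>',
    '<p>after</p>';

# Second attempt from above, with /s added (assumption).
# After "//-->" the \s* first grabs the newline, the lookahead sees
# </script> and fails, then \s* backtracks to match nothing. The lookahead
# is now tested at the newline, which is not "</script>", so it succeeds.
(my $second = $document) =~ s{<!--.*?-->\s*(?!</script>)}{}gims;

# Alternative sketch: the first branch captures an entire script block,
# the second branch matches any other comment. With /e the replacement is
# code: keep the script block if it was captured, otherwise delete.
(my $kept = $document)
    =~ s{(<script\b.*?</script>)|<!--.*?-->}{defined $1 ? $1 : ''}gise;

print "second attempt:\n$second\n\n";
print "alternation sketch:\n$kept\n";
```

This remains a regex over markup and not a real HTML parser, so input such as a literal "</script>" inside a JavaScript string would still confuse it.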
 
Can anyone edify me as to what I'm doing wrong?
 
Dermot.