LINUX.IE, website of the Irish Linux Users' Group

[ILUG] Sed question


Feargal Reilly feargal at helgrim.com
Fri Feb 22 14:32:24 GMT 2002


At 12:11 22/02/02, Padraig Brady wrote:
>Padraig Brady wrote:
>>Padraig Brady wrote:
>>
>>>Rory Winston wrote:
>>>
>>>>Hi,
>>>>
>>>>I'm trying to use sed to do the following: search through a .jsp file for
>>>>any <img> references, and then generate a bare list of the image filenames.
>>>>So a .jsp page with 3 images inline would generate an output of:
>>>>
>>>>a.gif
>>>>b.gif
>>>>c.gif
>>>>
>>>>I'm trying to do it like the following (for this example, I'm ignoring any
>>>>complications due to case and/or whitespace):
>>>>
>>>>sed -n "/img src=\"/,/\">/p" foo.jsp
>>>>
>>>>But this doesn't just print out image filenames - it prints out entire
>>>>lines.
>>>>Has anyone done anything like this already? If anyone has any grep-based
>>>>solutions that would be great too. Correct me if I'm wrong, but is sed (and
>>>>Perl) able to handle certain types of multi-line matching that grep cannot?
>>>>
>>>>Cheers!
>>>>Rory
>>>
>>>How about:
>>>sed -n 's/<.*img.*src="\([^"]*\)".*/\1/gp' foo.jsp
>>>Padraig.
>>>
>>The script above doesn't deal correctly with multiple images
>>on the same line, the following is better:
>>cat foo.jsp |
>>sed -e 's/<[^>]*img/¬<img/g' |
>>tr "¬" "\n" |
>>sed -n 's/<.*img.*src="\([^"]*\)".*/\1/gp'
>Stephen's suggestion of not printing duplicate images is
>obviously correct, so for completeness, and removing
>the useless use of cat:
>
>sed -e 's/<[^>]*img/¬<img/g' foo.jsp | #put each <img>...
>tr "¬" "\n" |                          #on a new line.
>sed -n 's/<.*img.*src="\([^"]*\)".*/\1/gp' |
>sort -u

The only problem with this is that it'll miss tags spanning lines, and
uppercase IMG tags. I happened to be doing a similar thing last week; here's
the Tcl script I did up for it:

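#!/path/to/tclsh
# Usage: ./foo filename tag
# Read the whole file into memory.
set f [open [lindex $argv 0] r]
set file [read $f]
close $f
# Split at every "<", so each non-empty chunk begins with a tag.
set list [split $file <]
foreach i $list {
    if {[string length $i]} {
        # Keep the text up to the closing ">" and turn newlines into spaces.
        regsub -all "\n" [lindex [split $i >] 0] " " tag
        # Compare the first word of the tag with the wanted name, ignoring case.
        if {![string compare -nocase [lindex $tag 0] [lindex $argv 1]]} {
            puts $tag
        }
    }
}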

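Saved as foo, usage is
./foo filename tag
It'll spit out any HTML tags beginning with 'tag' in 'filename', one per
line, with tags that span lines joined onto a single line.
So ./foo foo.jsp img | sed -n 's/.*src="\([^"]*\)".*/\1/gp'

will pick up tags that span lines and IMG in either case (the sed itself
still expects a lowercase, double-quoted src).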
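
For anyone without tclsh to hand, roughly the same idea can be sketched with
the standard tools alone: join the lines, start a new line at every '<', then
pull src out of anything that begins with img in either case. The function
name list_img_src is made up for illustration. This is only a sketch under the
same assumptions as the sed above (double-quoted src values), not an HTML
parser, and a JSP expression such as <%= ... %> inside a tag will split that
tag in two.

```shell
# list_img_src FILE (hypothetical helper, not part of the original thread):
# print the src value of each <img> tag in FILE, one per line, no duplicates.
list_img_src() {
    { tr '\n' ' ' < "$1"; echo; } |   # join lines so tags spanning lines survive
    tr '<' '\n' |                     # each tag now starts its own line
    # Match img/IMG at the start of the line, then src/SRC before the first ">".
    sed -n 's/^[Ii][Mm][Gg][[:space:]][^>]*[Ss][Rr][Cc]="\([^"]*\)".*/\1/p' |
    sort -u
}
```

Usage would be along the lines of: list_img_src foo.jsp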


>Padraig.
>
>
>--
>Irish Linux Users' Group: ilug at linux.ie
>http://www.linux.ie/mailman/listinfo/ilug for (un)subscription information.
>List maintainer: listmaster at linux.ie

Feargal Reilly.
http://www.helgrim.com/




