Padraig Brady wrote:
> Padraig Brady wrote:
>>> Rory Winston wrote:
>>> Hi,
>>>
>>> I'm trying to use sed to do the following: search through a .jsp file for
>>> any <img> references, and then generate a bare list of the image filenames.
>>> So a .jsp page with 3 images inline would generate an output of:
>>>
>>> a.gif
>>> b.gif
>>> c.gif
>>>
>>> I'm trying to do it like the following (for this example, I'm ignoring any
>>> complications due to case and/or whitespace):
>>>
>>> sed -n "/img src=\"/,/\">/p" foo.jsp
>>>
>>> But this doesn't just print out image filenames - it prints out entire lines.
>>> Has anyone done anything like this already? If anyone has any grep-based
>>> solutions that would be great too. Correct me if I'm wrong, but is sed (and
>>> Perl) able to handle certain types of multi-line matching that grep cannot?
>>>
>>> Cheers!
>>> Rory
>>
>> How about:
>>
>> sed -n 's/<.*img.*src="\([^"]*\)".*/\1/gp' foo.jsp
>>
>> Padraig.
>
> The script above doesn't deal correctly with multiple images
> on the same line; the following is better:
>
> cat foo.jsp |
> sed -e 's/<[^>]*img/¬<img/g' |
> tr "¬" "\n" |
> sed -n 's/<.*img.*src="\([^"]*\)".*/\1/gp'
Stephen's suggestion of not printing duplicate images is
obviously correct, so for completeness, and removing
the useless use of cat:
sed -e 's/<[^>]*img/¬<img/g' foo.jsp | #put each <img>...
tr "¬" "\n" | #on a new line.
sed -n 's/<.*img.*src="\([^"]*\)".*/\1/gp' |
sort -u
Padraig.
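
P.S. Since a grep-based solution was asked for, here is a rough sketch of
one. It assumes a grep that supports -o (GNU and BSD grep do, but it is not
in POSIX), and like the pipeline above it ignores case and src attributes
that span lines. The sample file and the list_images name are made up for
the demonstration. Because grep -o prints every match on its own line, the
sed | tr step is not needed, nor is the assumption that "¬" never appears
in the page.

```shell
#!/bin/sh
# Hypothetical sample input: a stand-in for foo.jsp, created for this demo.
tmpdir=$(mktemp -d)
cat > "$tmpdir/foo.jsp" <<'EOF'
<p><img src="a.gif"><img src="b.gif" alt="b"></p>
<td><img src="c.gif"></td><td><img src="a.gif"></td>
EOF

# list_images FILE: print each distinct <img> src value once.
list_images() {
    # grep -o emits each matching '<img ... src="..."' on its own line.
    grep -o '<img[^>]*src="[^"]*"' "$1" |
    # Keep only the text between the quotes of the src attribute.
    sed 's/.*src="\([^"]*\)"/\1/' |
    # Sort and drop duplicates.
    sort -u
}

list_images "$tmpdir/foo.jsp"
rm -rf "$tmpdir"
```

For the sample file this prints a.gif, b.gif and c.gif, one per line, with
the second a.gif dropped by sort -u.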