On Mon, Feb 12, 2001 at 09:51:09AM -0000, JustinMacCarthy wrote:
> If I want to parse a html file (might be badly formed ) and take out a list
> of the tags used (unique list) and the attributes of each tag used
> (aggregate)..
> What tool would you recommend? Sed or gawk might do it, but maybe there is
> something better out there..
hate to say it but perl would be best. alternatively use lex.
perl -ne 'chomp; $a .= "$_ "; END { print "$_\n" for grep { /^</ } split /(<[^>]*>)/, $a }'
the above should dump out all the tags. more than that is your job. :)
kevin
--
kevin at suberic.net          i... i have a dream. and that dream is:
fork()'ed on 37058400           use DIY::Tiler;
meatspace place: work           my($t) = new DIY::Tiler;
                                $t->tile(-room => "en-suite", -style => "stone");