At 16:39 30/06/00, Niall O Broin wrote:
>Well, obviously the answer is Perl - now what was the question ? Uniq is out
>of the question because it only works on sorted input, and the business of
>prepending a number and then removing it offends me :-) so I offer
>perl -ne 'print unless ($seen{$_}++)'
>as a pipe to do the job. There's one slight hitch - this will consume memory
>like there's no tomorrow. If the file(s) you want to treat are somewhat
>smaller than your free virtual memory, you'll be OK.
In a similar vein,
perl -MMD5 -ne 'print unless $seen{MD5->hash($_)}++'
should consume a lot less memory if the lines are long. Of course, if you're
really unfortunate, two of your lines may hash to the same string under MD5,
but this is highly unlikely, especially if the lines are in some kind of
regular format. Personally, I don't think I'd use this unless I was just
trying to get statistics on how many duplicates there are, but I thought it
was fun.
Fergal