| Date: Mon, 2 Feb 2004 12:55:08 +0000
| From: "John P. Looney" <valen at tuatha.org>
|
| On Mon, Feb 02, 2004 at 12:52:19PM +0000, olearypj at rte.ie mentioned:
| > I need to compare two VERY large files of names & extract the
| > lines that are not common to both. The problem is that the
| > same names appear in different positions in both files. [ ... ]
|
| Have a look at the "uniq" command. [ ... ]
another command would be comm(1).
be aware, however, that both uniq(1) and `comm'
require the input to be sorted.
( I must confess I have never understood why the
input must be sorted. `uniq' could still deal
with _adjacent_ duplicate lines (e.g., N and N+1),
and `comm' could compare line N with line N.
why the insistence on sorting? )
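a small sketch of the adjacency point, on made-up sample names (the variable names are only for illustration): `uniq' on its own does collapse adjacent duplicates, but a duplicate further down the input survives unless the input is sorted first. `comm', as far as I can tell, walks both files in one merge-style pass, which is why it wants both inputs in the same sorted order.

```shell
#!/bin/sh
# Hypothetical sample input: "bob" appears three times, but only two are adjacent.
names='bob
alice
bob
bob'

# uniq alone collapses only the adjacent pair, so "bob" still shows up twice.
unsorted_result=$(printf '%s\n' "$names" | uniq)

# Sorting first makes every duplicate adjacent, so each name appears once.
sorted_result=$(printf '%s\n' "$names" | sort | uniq)

printf 'uniq alone:\n%s\n\nsort | uniq:\n%s\n' "$unsorted_result" "$sorted_result"
```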
incidentally, previous posters have suggested:
cat FILE1 FILE2 | sort | uniq --unique
winning the "Useless Use of cat(1)" Award™, and:
sort < FILE1 < FILE2 | uniq --unique
winning the "I Didn't Test This" Award™.
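what's wrong with?:
sort FILE1 FILE2 | uniq --unique
since sort(1) is a _merge_-and-sort utility?
the `comm' (almost-)equivalent needs each file
sorted separately, as it takes two file operands:
sort FILE1 > FILE1.sorted
sort FILE2 > FILE2.sorted
comm -3 FILE1.sorted FILE2.sorted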
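for anyone wanting to try this, here is a minimal sketch wrapped as two shell functions. the names `not_common' and `not_common_comm' are hypothetical, and `-u' is the short form of `--unique'. one assumption to note: the `uniq' version only gives "lines not common to both" if neither file repeats a line internally, since a name listed twice in one file and absent from the other would be dropped. `comm' takes two separately sorted files as operands, so that version sorts each file into a temporary copy first.

```shell
#!/bin/sh
# not_common and not_common_comm are hypothetical helper names.

# Print the lines that occur exactly once across both files.
# Assumes neither file repeats a line internally.
not_common() {
    sort "$1" "$2" | uniq -u
}

# Same idea with comm(1), which needs two separately sorted files.
# Column 1: lines only in the first file.
# Column 2 (tab-indented): lines only in the second file.
not_common_comm() {
    tmp=$(mktemp -d) || return 1
    sort "$1" > "$tmp/a"
    sort "$2" > "$tmp/b"
    comm -3 "$tmp/a" "$tmp/b"
    rm -rf "$tmp"
}
```

usage would be `not_common FILE1 FILE2' or `not_common_comm FILE1 FILE2'; the second form also tells you which file each leftover line came from.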
cheers!
-blf-
--
«How many surrealists does it take to | Brian Foster Montpellier,
change a lightbulb? Three. One calms | blf at utvinternet.ie France
the warthog, and two fill the bathtub | Stop E$$o (ExxonMobile)!
with brightly-colored machine tools.» | http://www.stopesso.com