Timothy Murphy wrote:
> What is the best way of eliminating duplicate photos
> on a number of machines, all running Linux (Fedora or CentOS)?
>
> I suppose one could ask the same question about files generally;
> how to tag or delete duplicates.
>
> Any suggestions gratefully received.
Here's what I use myself:
It was written partly because I needed (and I really do mean *needed*)
something like this, and partly because I wanted a decent demonstration
of how to use generators in Python for the next time I was asked.
It does the sorting in three phases, each one slower than the last: first by
size (which rules out an awful lot of files generally), then by passing the
contents of each file through either zlib.crc32 (CRC-32) or hashlib.md5
(zlib.crc32 is much faster than hashlib.md5 and, though it's more prone to
collisions, gives a significant net speed-up generally), and finally by
comparing the remaining groups of files directly with one another.
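For anyone who just wants the shape of it, here is a minimal sketch of those three phases using generators. It is not the script itself: the function names (walk_files, group_by, crc32_of, split_by_content, find_duplicates) are made up for illustration, it only uses zlib.crc32 for the middle phase, and it assumes the files don't change or vanish while it runs.

```python
import filecmp
import os
import zlib
from collections import defaultdict


def walk_files(roots):
    """Yield the path of every regular, non-symlink file under the given directories."""
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path) and not os.path.islink(path):
                    yield path


def group_by(paths, key):
    """Yield lists of two or more paths that share the same key value."""
    groups = defaultdict(list)
    for path in paths:
        groups[key(path)].append(path)
    for group in groups.values():
        if len(group) > 1:
            yield group


def crc32_of(path, chunk_size=1 << 16):
    """Checksum a file's contents with zlib.crc32, reading it in chunks."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def split_by_content(group):
    """Yield lists of byte-for-byte identical files from one candidate group."""
    remaining = list(group)
    while remaining:
        first, rest = remaining[0], remaining[1:]
        same, remaining = [first], []
        for other in rest:
            # shallow=False forces a comparison of the actual contents.
            if filecmp.cmp(first, other, shallow=False):
                same.append(other)
            else:
                remaining.append(other)
        if len(same) > 1:
            yield same


def find_duplicates(roots):
    """Yield duplicate groups: by size, then by checksum, then by direct compare."""
    for same_size in group_by(walk_files(roots), os.path.getsize):
        for same_crc in group_by(same_size, crc32_of):
            yield from split_by_content(same_crc)
```

Calling find_duplicates(["/some/dir", "/another/dir"]) (the paths are placeholders) yields one list of paths per set of identical files. Each phase only ever sees the groups that survived the cheaper phase before it, which is where the speed-up comes from.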
I've been meaning to extend it so that it treats certain kinds of file
differently, such as ignoring ID3 and EXIF data, but I've never had the time.