LINUX.IE, website of the Irish Linux Users' Group
[ILUG] Deleting duplicate photos

Keith Gaughan kmgaughan at eircom.net
Tue Sep 30 06:00:22 IST 2008


Timothy Murphy wrote:
> What is the best way of eliminating duplicate photos
> on a number of machines, all running Linux (Fedora or CentOS)?
> 
> I suppose one could ask the same question about files generally;
> how to tag or delete duplicates.
> 
> Any suggestions gratefully received.

Here's what I use myself:

     http://talideon.com/weblog/2008/02/find-duplicates.cfm

It was written partly because I needed (and I really do mean *needed*)
something like this, and partly because I wanted a decent demonstration
of how to use generators in Python for the next time I was asked.

It narrows down the candidates in three phases, each one slower than the last:
first it groups files by size (which rules out an awful lot generally), then it
passes the contents of each remaining file through either zlib.crc32 (CRC-32)
or hashlib.md5 (zlib.crc32 is much faster than hashlib.md5, and though it's
more prone to collisions, it gives a significant net speed-up generally), and
finally it compares the remaining groups of files directly with one another.
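
For anyone who just wants the gist, the three phases can be sketched roughly
as below. This is a minimal illustration and not the script linked above: the
function names (group_by, crc32_of, split_identical, walk_files,
find_duplicates) are made up for the example, and using filecmp.cmp for the
final byte-for-byte comparison is an assumption about how one might do it.

```python
import filecmp
import os
import zlib
from collections import defaultdict


def group_by(paths, key):
    """Yield lists of paths that share a key, dropping groups of one."""
    groups = defaultdict(list)
    for path in paths:
        groups[key(path)].append(path)
    for group in groups.values():
        if len(group) > 1:
            yield group


def crc32_of(path, chunk_size=65536):
    """Return the CRC-32 of a file's contents, reading it in chunks."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def split_identical(group):
    """Yield subgroups of files whose contents are byte-for-byte equal."""
    remaining = list(group)
    while remaining:
        first, rest = remaining[0], remaining[1:]
        # shallow=False forces a real content comparison, not just os.stat().
        same = [first] + [p for p in rest if filecmp.cmp(first, p, shallow=False)]
        if len(same) > 1:
            yield same
        remaining = [p for p in rest if p not in same]


def walk_files(root):
    """Yield every regular, non-symlink file under root."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def find_duplicates(root):
    """Yield lists of duplicate files: size first, then CRC-32, then contents."""
    for by_size in group_by(walk_files(root), os.path.getsize):
        for by_crc in group_by(by_size, crc32_of):
            yield from split_identical(by_crc)
```

Each phase only ever sees the groups that survived the previous, cheaper one,
so the expensive direct comparison runs on very few files. Something like
`for group in find_duplicates("/home/me/photos"): print(group)` would list the
duplicate sets (the path is just a placeholder).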

I've been meaning to extend it so that it treats certain kinds of file
differently, such as ignoring ID3 and EXIF data, but I've never had the time
or need.

K.


