LINUX.IE, website of the Irish Linux Users' Group

[ILUG] Deleting duplicate photos


Francis Daly francisdaly at gmail.com
Mon Sep 29 13:45:58 IST 2008


2008/9/29 Timothy Murphy <gayleard at eircom.net>:
> What is the best way of eliminating duplicate photos
> on a number of machines, all running Linux (Fedora or CentOS)?
>
> I suppose one could ask the same question about files generally;
> how to tag or delete duplicates.

Brute force?

On each machine:

  find . -type f -exec md5sum {} \; | sed "s/\$/ $(hostname)/" > filelist.$$

will create an md5 checksum of each file examined, two spaces, the
filename, space, the hostname. For filenames including newline
characters, you're on your own.

"$$" expands to the shell's process ID, which is "hopefully unique
enough for this small sample". Use something distinct on each machine
for safety.
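
As a rough sketch of the "something distinct on each machine" idea, the
hostname itself can name the output file. The function name
make_filelist and the exclusion of the filelist files from the scan are
my own assumptions for illustration, not part of the recipe above.

```shell
# make_filelist [DIR]: checksum every regular file under DIR (default: .)
# and append this machine's hostname to each line.
# Output goes to filelist.<hostname> in the current directory.
make_filelist() {
    dir=${1:-.}
    host=$(hostname)
    # Skip earlier filelist.* output so the lists do not checksum themselves.
    find "$dir" -type f ! -name 'filelist.*' -exec md5sum {} + |
        sed "s/\$/ $host/" > "filelist.$host"
}
```

Run it as "make_filelist ~/photos" on each machine, then copy the
resulting filelist.<hostname> files to one place.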

Gather those files together and print duplicates:

  sort filelist.* | uniq -w 32 -D

which will print every line whose first 32 characters (the md5
checksum) also appear on at least one other line. -w and -D are GNU
uniq options. Pick a different number if you prefer sha1sum (40
characters) or cksum or sum instead of md5sum.
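
To make the effect concrete, here is a small sketch with made-up data.
The hosts "alpha" and "beta", the filenames, and the function name
print_dups are all hypothetical; only the sort | uniq pipeline comes
from the recipe above, and it needs GNU uniq.

```shell
# print_dups FILE...: print every line whose md5 (first 32 characters)
# occurs more than once across the given filelists.
print_dups() {
    sort "$@" | uniq -w 32 -D
}

# Two invented filelists, as if gathered from two machines.
workdir=$(mktemp -d)
cat > "$workdir/filelist.alpha" <<'EOF'
900150983cd24fb0d6963f7d28e17f72  ./holiday/img1.jpg alpha
d41d8cd98f00b204e9800998ecf8427e  ./empty.txt alpha
EOF
cat > "$workdir/filelist.beta" <<'EOF'
900150983cd24fb0d6963f7d28e17f72  ./backup/img1-copy.jpg beta
0cc175b9c0f1b6a831c399e269772661  ./notes.txt beta
EOF

# Only the two lines sharing a checksum are printed.
print_dups "$workdir"/filelist.*
```

The lines for empty.txt and notes.txt are dropped because their
checksums occur only once.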

From that list, pick which of the matching filename-hostname pairs you
want to get rid of. Check that they really are the same, and rm.
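
One way to do the "check, then rm" step, as a sketch: it assumes both
copies are reachable from one machine (copied over or on a mounted
share), and the function name remove_if_identical is hypothetical.

```shell
# remove_if_identical KEEP CANDIDATE
# Delete CANDIDATE only if it is byte-for-byte identical to KEEP.
remove_if_identical() {
    keep=$1
    candidate=$2
    # Guard: two names for the same file (same path, hard link) would
    # compare equal, and removing one could remove the only copy.
    if [ "$keep" -ef "$candidate" ]; then
        echo "refusing: $keep and $candidate are the same file" >&2
        return 1
    fi
    # cmp -s is silent and exits 0 only when the contents match exactly.
    if cmp -s -- "$keep" "$candidate"; then
        rm -- "$candidate"
    else
        echo "not identical, keeping: $candidate" >&2
        return 1
    fi
}
```

Comparing the bytes directly sidesteps the (small) chance of two
different files sharing a checksum.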

This refers to byte-identical files (within the limit of the
checksum). "duplicate photos" may not match that, if someone has
messed with metadata or anything else internal.

If you're worried about that, you could strip exif data before
checksumming and have a slightly better chance of catching more
repeats.
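
A possible sketch of that, assuming exiftool is installed (the post
does not name a tool, so this choice and the function name content_md5
are mine). It checksums a metadata-free copy written to stdout and
leaves the original file untouched.

```shell
# content_md5 FILE: print the md5 of FILE with its metadata removed.
# Assumes exiftool is available: "-all=" deletes all writable tags and
# "-o -" writes the stripped copy to stdout instead of changing FILE.
content_md5() {
    exiftool -all= -o - "$1" | md5sum | cut -d ' ' -f 1
}

# Possible use in place of plain md5sum (run from the photo directory):
#   find . -type f -name '*.jpg' | while IFS= read -r f; do
#       printf '%s  %s %s\n' "$(content_md5 "$f")" "$f" "$(hostname)"
#   done > filelist.$$
```

If exiftool cannot read a file, the pipeline still prints a checksum
(that of empty input), so treat a run of identical sums with suspicion.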

Good luck,

f


