November 12, 2012

Find duplicate files on OSX - one-liner

As I was consolidating 15 years of photos onto a single disk, I ended up with a disk bloated with duplicates. Bloated might actually be a new level of understatement. As I looked into resolving this troublesome issue, I found a wealth of commercial products that simply detect duplicates based on MD5 sums, which are trivial and cheap to compute. Not wanting to shell out $30 or more to solve a simple problem, I googled around, as anyone would, for a script that would do this on OSX, but found mostly similar things for Linux.

So, time to dust off the awk/find/sed tools and see if I could crack that nut. It turns out it's trivial to do on OSX as well, but the OSX tools don't have the same options as their Linux counterparts...
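The main culprit is the checksum tool itself: OSX ships the BSD md5 command, whose output format differs from GNU md5sum on Linux, and that format difference is what all the awk rewriting below deals with (filename and hash here are made up for illustration):

$ md5 photo.jpg
MD5 (photo.jpg) = 8c5bfee45f96e8e0fa9d21780038a6b4
$ md5sum photo.jpg
8c5bfee45f96e8e0fa9d21780038a6b4  photo.jpg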

find . -type f -print0 | xargs -0 md5 | awk '{sub(/^MD5 \(/,"",$0);sub(/\) =/,"",$0);md5=$NF;$NF=""; print md5" "$0}'|tee /tmp/files.out|awk '{ print $1}'|sort|uniq -d >/tmp/dupes.out ; for i in `cat /tmp/dupes.out` ; do echo $i; awk '$1 ~ /^'$i'$/ {$1=" ";print} ' /tmp/files.out; echo;echo; done > /tmp/duplicate-files.txt; rm -f /tmp/files.out /tmp/dupes.out ; 
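If that's a bit dense, here is the same pipeline broken out with comments, one stage at a time (equivalent to the one-liner above, just reformatted):

# 1. checksum every file under the current directory
find . -type f -print0 | xargs -0 md5 |
# 2. turn "MD5 (path) = hash" into "hash path"
awk '{sub(/^MD5 \(/,"",$0); sub(/\) =/,"",$0); md5=$NF; $NF=""; print md5" "$0}' |
# 3. keep a full copy, and extract the hashes that occur more than once
tee /tmp/files.out | awk '{print $1}' | sort | uniq -d > /tmp/dupes.out

# 4. for each duplicated hash, print it followed by the files that share it
for i in `cat /tmp/dupes.out`; do
  echo $i
  awk '$1 ~ /^'$i'$/ {$1=" "; print}' /tmp/files.out
  echo; echo
done > /tmp/duplicate-files.txt

# 5. clean up the intermediate files
rm -f /tmp/files.out /tmp/dupes.out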

You can then simply cat the result file:
cat /tmp/duplicate-files.txt
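Each group in the output is a duplicated checksum followed by the files that share it, something like this (hash and paths invented for illustration):

8c5bfee45f96e8e0fa9d21780038a6b4
  ./2004/beach/IMG_0042.jpg
  ./backup/old-disk/IMG_0042.jpg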

Pretty simple stuff; I can now clean the useless duplicates out of my pix!
