November 12, 2012

Find duplicate files on OSX - one liner

As I was consolidating 15 years of photos onto a single disk, I ended up with a disk bloated with duplicates. Bloated might actually be a new level of understatement. And as I was looking into resolving this troublesome issue, I found that there was a wealth of commercial products that simply detect duplicates based on MD5 sums, which are trivial and cheap to compute. Not wanting to shell out $30 or more to solve a simple problem, I googled around, as anyone would, for a script that would do this on OSX, but found mostly similar things for Linux.

So, time to dust off the awk/find/sed tools and see if I could crack that nut. Turns out it's trivial to do on OSX as well, but the OSX tools don't have the same options as on Linux...

find . -type f -print0 | xargs -0 md5 | awk '{sub(/^MD5 \(/,"",$0); sub(/\) =/,"",$0); md5=$NF; $NF=""; print md5" "$0}' | tee /tmp/files.out | awk '{print $1}' | sort | uniq -d > /tmp/dupes.out; for i in $(cat /tmp/dupes.out); do echo "$i"; awk '$1 ~ /^'$i'$/ {$1=" "; print}' /tmp/files.out; echo; echo; done > /tmp/duplicate-files.txt; rm -f /tmp/files.out /tmp/dupes.out

You can then simply cat the result file:
cat /tmp/duplicate-files.txt
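
For anyone who finds the one-liner hard to read, here is a rough multi-line sketch of the same idea: hash every file, group the paths by hash, and print only the groups with more than one file. It is not what I actually ran. The function names hash_files and find_dupes are made up for illustration, and it assumes that either md5sum (Linux) or md5 with its -r flag (OSX, which prints the hash first) is available.

```shell
#!/bin/sh
# Sketch only: list groups of files under a directory that share an MD5 sum.
# Assumption: either `md5sum` or `md5 -r` exists; both print "<hash> <path>".

hash_files() {
    # Print one "<hash> <path>" line per regular file under directory $1.
    if command -v md5sum >/dev/null 2>&1; then
        find "$1" -type f -exec md5sum {} +
    else
        find "$1" -type f -exec md5 -r {} +
    fi
}

find_dupes() {
    # Collect paths per hash, then print every hash seen more than once,
    # followed by its indented paths and a blank line.
    hash_files "$1" | awk '
        {
            hash = $1
            path = $0
            sub(/^[^ ]+ +/, "", path)   # drop the hash and separator, keep the path
            count[hash]++
            files[hash] = files[hash] "  " path "\n"
        }
        END {
            for (h in count)
                if (count[h] > 1)
                    printf "%s\n%s\n", h, files[h]
        }'
}
```

After sourcing the file, something like find_dupes . > /tmp/duplicate-files.txt should produce a similar report. Note that all empty files share one hash, so they show up as a single group, and file names containing newlines are not handled.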

Pretty simple stuff. I can now clean my pix of useless duplicates!
