Finding Duplicate Files in a Directory Tree
Gregg
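Sometimes I need to find all of the duplicate files in a directory tree, and I run into this all the time when I move my music collection around. Here is a nifty script that sorts them out. It first lists the sizes shared by two or more non-empty files, then finds every file with one of those sizes, computes an MD5 checksum for each, and prints the files whose checksums match in groups separated by blank lines. Checking sizes first means only files that could be duplicates get hashed: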
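#!/bin/bash
# Sizes shared by more than one file -> every file of those sizes -> MD5 sums -> groups of matching sums
find . -not -empty -type f -printf "%s\n" | sort -rn | uniq -d |
  xargs -I{} find . -type f -size {}c -print0 |
  xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate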
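A quick way to check what the pipeline reports is to run it on a throwaway tree. The sketch below assumes GNU find and GNU coreutils (-printf, uniq -w and --all-repeated are GNU extensions) and uses made-up file names: two identical files, a third file of the same size with different content, and an empty file. Only the two identical files should come out, grouped under the same checksum.

```bash
#!/bin/bash
set -euo pipefail

# Throwaway demo tree (hypothetical file names), removed on exit.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"
mkdir album1 album2

printf 'same song\n' > album1/track.mp3
printf 'same song\n' > "album2/track copy.mp3"  # identical content, name with a space
printf 'diff song\n' > album1/other.mp3         # same size (10 bytes), different content
: > album2/empty.mp3                            # empty, excluded by -not -empty

# The duplicate-finding pipeline, with its output captured in $out.
out=$(find . -not -empty -type f -printf "%s\n" | sort -rn | uniq -d |
  xargs -I{} find . -type f -size {}c -print0 |
  xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate)

printf '%s\n' "$out"
```

Both reported lines start with the same checksum; other.mp3 is hashed because its size matches, but it has no twin, and empty.mp3 is never looked at.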
If you redirect the output of the above into a text file, for example ~/duplicates.txt, you can then generate a deletion script from it:
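awk 'NF { printf("rm -- \"%s\"\n", substr($0, 35)) }' ~/duplicates.txt > ~/duplicates.sh
Each md5sum line is the 32-character checksum, two separator characters, then the file name, so substr($0, 35) keeps the name exactly as printed, spaces included, and the NF condition skips the blank lines that separate the groups.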
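The awk one-liner wraps each name in double quotes, so a name containing a double quote, a $ or a backtick would still be misinterpreted when the script runs. A bash-only alternative, sketched below on the assumption that you run it with bash (make_rm_script is a hypothetical helper name), escapes each name with the printf builtin's %q format so it comes back out as exactly one argument.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper: read md5sum/uniq output on stdin and print one
# "rm -- NAME" line per file, with NAME escaped by bash's printf %q.
make_rm_script() {
  local line name
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue   # blank lines only separate groups
    name=${line:34}              # skip 32-char checksum + 2 separator chars
    printf 'rm -- %q\n' "$name"
  done
}

# Usage, with the file names from the text above:
#   make_rm_script < ~/duplicates.txt > ~/duplicates.sh
```

The "--" stops rm from treating a name that starts with a dash as an option.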
Then edit ~/duplicates.sh and delete the lines for the files you want to keep (at least one per group), make the script executable with chmod +x ~/duplicates.sh, and run it. Done.
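If you would rather not pick the survivors by hand, a variant (again a sketch; keep_first_rm_rest is a hypothetical name) can keep the first file listed for each checksum and write rm only for the others. It detects a new group by a change in the 32-character checksum rather than relying on the blank lines. The kept file is written as a comment, so you can still review the script and swap lines before running it.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper: for each run of identical checksums on stdin,
# comment out the first file and emit "rm -- NAME" for the rest.
keep_first_rm_rest() {
  local line hash name prev=""
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue        # ignore the blank group separators
    hash=${line:0:32}                 # MD5 checksum
    name=${line:34}                   # file name after the 2 separator chars
    if [ "$hash" != "$prev" ]; then
      printf '# keep: %q\n' "$name"   # first file of a new group survives
      prev=$hash
    else
      printf 'rm -- %q\n' "$name"
    fi
  done
}

# Usage:
#   keep_first_rm_rest < ~/duplicates.txt > ~/duplicates.sh
```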