Finding Duplicate Files in a Directory Tree
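Sometimes I need to find all of the duplicate files in a directory tree; I run into this all the time when I move my music collection. Here is a nifty script to sort them out. It compares file sizes first, which is cheap, and computes MD5 checksums only for files whose size matches at least one other file. Files with identical checksums are then printed in groups separated by blank lines: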
#!/bin/bash
find . -not -empty -type f -printf "%s\n" | sort -rn | uniq -d |  # sizes shared by 2+ files
  xargs -I{} find . -type f -size {}c -print0 |                     # all files of those sizes
  xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate         # group identical MD5 hashes
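To point the pipeline at a directory other than the current one, it can be wrapped in a function. This is a sketch: the name find_dups is hypothetical, and like the script above it assumes GNU find, xargs, and uniq. The added -r option (also a GNU extension) keeps xargs from running its command when there is nothing to check.

```shell
#!/bin/bash
# Hypothetical wrapper: find_dups [DIR] prints groups of identical files
# under DIR (default: current directory), one blank line between groups.
find_dups() {
  local dir=${1:-.}
  # Sizes that occur more than once, largest first.
  find "$dir" -not -empty -type f -printf '%s\n' | sort -rn | uniq -d |
    # For each such size, list the files of exactly that many bytes.
    xargs -r -I{} find "$dir" -type f -size {}c -print0 |
    # Hash them and keep only lines whose first 32 characters (the MD5) repeat.
    xargs -0 -r md5sum | sort | uniq -w32 --all-repeated=separate
}
```

For example, find_dups ~/Music lists each set of identical files. Only the hash at the start of each md5sum line is compared, so files with different names are still grouped together.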
Printing Numbers Using Thousands Separators
You can pipe a number to awk to print it with thousands separators (commas). For example, here’s how to total the fifth column of ls -l, which holds each file's size in bytes, and print the result with thousands separators:
$ ls -l | awk '{total = total + $5} END {print total}' | LC_ALL=en_US.UTF-8 awk '{printf("%'"'"'d\n", $0)}'
21,387
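The second awk stage can be appended to any command that prints a whole number. The odd-looking '"'"' sequence closes the single-quoted awk program, adds a single quote wrapped in double quotes, and reopens the single-quoted string; that is how printf receives the ' flag, which inserts the locale's thousands separator. The flag only has an effect in a locale that defines a separator, such as en_US.UTF-8, and with an awk that honors it, such as GNU awk.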
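If en_US.UTF-8 is not installed, or your awk ignores the ' flag, the commas can be inserted by hand instead. The following is a minimal locale-independent sketch; the function name add_commas is hypothetical, and it assumes each input line is a plain integer with an optional leading minus sign.

```shell
#!/bin/bash
# Hypothetical helper: insert a comma every three digits, counting from the
# right, without relying on the locale or printf's ' flag.
# Assumes each input line is an integer such as 21387 or -1234567.
add_commas() {
  awk '{
    n = $0; sign = ""
    if (n ~ /^-/) { sign = "-"; n = substr(n, 2) }
    out = ""
    # Peel off the last three digits until at most three remain.
    while (length(n) > 3) {
      out = "," substr(n, length(n) - 2) out
      n = substr(n, 1, length(n) - 3)
    }
    print sign n out
  }'
}
```

It drops in where the second awk stage was: ls -l | awk '{total += $5} END {print total}' | add_commas.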
Archive Only Files In a Directory
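If you want to create a tar archive of only the files in a directory, excluding any subdirectories, you can pipe the output of ls -la to awk. Keep only the lines that start with "-" (regular files); this drops subdirectories, whose lines start with "d", as well as the . and .. entries. Then remove the first 8 fields (permissions, link count, owner, group, size, and the three date and time fields) while keeping the rest of the line, in case there are spaces in the filename. One quick and dirty way of doing that is to set each of the 8 fields to an empty string and then use sed to trim the leading spaces that remain. Because assigning to a field makes awk rebuild the line with single spaces, a run of several consecutive spaces inside a filename is collapsed to one. You can optionally add quotation marks around the filename in your output too.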
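Here is a sketch of the quick and dirty approach just described. The function name list_plain_files is hypothetical, and running ls under LC_ALL=C is an added assumption so that the date always occupies three fields and the name starts at field 9.

```shell
#!/bin/bash
# Hypothetical helper: print the regular files in the current directory,
# one name per line, by parsing ls -la.
list_plain_files() {
  LC_ALL=C ls -la |
    awk '/^-/ { for (i = 1; i <= 8; i++) $i = ""; print }' |  # keep "-" lines, blank fields 1-8
    sed 's/^ *//'                                               # trim the leftover leading spaces
}
```

To build the archive, feed the list to tar; with GNU tar, -T - reads the names from standard input, and writing the archive outside the directory keeps it out of its own file list: list_plain_files | tar -cf ../files.tar -T -. If GNU find is available, find . -maxdepth 1 -type f -print0 | tar -cf ../files.tar --null -T - avoids parsing ls altogether and keeps unusual filenames intact.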
Echo File Until a Blank Line Is Reached
You can use awk to search for and print lines in a file. To print a file up to, but not including, the first blank line, use the following command. Here a blank line means a completely empty one; a line containing only spaces does not match /^$/.
awk '$0 ~ /^$/ {exit;} {print $0;}' somefile.txt
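The same thing can be done with sed. This sketch wraps it in a hypothetical function, print_until_blank, which, like the awk command, stops at the first empty line without printing it. It reads the named files, or standard input if none are given.

```shell
#!/bin/bash
# Print lines up to, but not including, the first empty line.
#   -n      turn off sed's automatic printing
#   /^$/q   quit at the first empty line (nothing is printed because of -n)
#   p       print every other line
print_until_blank() {
  sed -n '/^$/q;p' "$@"
}
```

To treat lines containing only whitespace as blank as well, change /^$/ to /^[[:space:]]*$/; the same change works in the awk version.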
How To Find All of the Shell Scripts In a Directory
This is a quick and dirty way to list all of the files in the current directory that are shell scripts. It runs file on each entry and checks the description it reports:
for i in *; do
  # file -b prints only the description, e.g. "POSIX shell script, ASCII text executable"
  type=$(file -b -- "${i}")
  if [[ "${type}" == *"shell script"* ]]; then
    echo "${i} is a shell script"
  fi
done
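If file is not available, or its descriptions vary too much between versions, another option is to look at the shebang line directly. The sketch below swaps in that technique: the function names is_shell_script and list_shell_scripts are hypothetical, and the list of shells (sh, bash, dash, ksh, zsh) is an assumption you can extend. It only recognizes scripts whose first line is a #! line, so a script without one is missed.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the first line of "$1" is a shebang naming
# one of a few common shells, directly (#!/bin/bash) or through env
# (#!/usr/bin/env bash). The shell list is an assumption; extend as needed.
is_shell_script() {
  local first=''
  local re='^#![[:space:]]*[^[:space:]]*/(env[[:space:]]+)?(sh|bash|dash|ksh|zsh)([[:space:]]|$)'
  [[ -f "$1" ]] || return 1
  # read fails on a final line without a newline, so also accept a non-empty result.
  IFS= read -r first 2>/dev/null < "$1" || [[ -n "$first" ]] || return 1
  [[ "$first" =~ $re ]]
}

# Hypothetical helper: report the shell scripts in a directory (default: .).
list_shell_scripts() {
  local f
  for f in "${1:-.}"/*; do
    if is_shell_script "$f"; then
      echo "${f} is a shell script"
    fi
  done
}
```

Unlike the file-based loop, this ignores everything after the first line, so a Python script with a #!/usr/bin/env python3 line is correctly left out.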