Posts

Finding Duplicate Files in a Directory Tree

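Whenever I move my music collection, I end up needing to find all of the duplicate files in a directory tree. Here is a nifty script to sort these things out. It first picks out the file sizes that occur more than once, then runs md5sum only on the files of those sizes and prints the groups whose checksums match: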

#!/bin/bash
find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 -D --all-repeated=separate
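
The one-liner is dense, so here is the same pipeline split into commented stages. This is only a sketch: the function name find_duplicates and its optional directory argument are my own additions, and it assumes GNU find, xargs, and uniq as the original does.

```shell
#!/bin/bash
# Hypothetical wrapper around the pipeline above; "find_duplicates" and the
# directory argument are illustrative names, not part of the original script.
find_duplicates() {
  local dir=${1:-.}
  find "$dir" -not -empty -type f -printf '%s\n' |      # size of every non-empty file
    sort -rn | uniq -d |                                # keep sizes that occur more than once
    xargs -I{} find "$dir" -type f -size {}c -print0 |  # files that have one of those sizes
    xargs -0 -r md5sum |                                # checksum only those candidates
    sort | uniq -w32 --all-repeated=separate            # group lines whose first 32 chars (the MD5) match
}
```

Calling find_duplicates ~/Music would print each group of identical files, with a blank line between groups.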

Printing Numbers using Thousand Separators

You can pipe output to awk to print numbers with thousands separators (commas). For example, here’s how you can total the 5th column of the ls -l command and print it with thousands separators:

$ ls -l | awk '{total = total + $5}END{print total}' | LC_ALL=en_US.UTF-8 awk '{printf("%'"'"'d\n", $0) }'
21,387

This can be adapted to other commands as necessary.
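
The command above relies on the en_US.UTF-8 locale being installed. If it is not, a locale-independent fallback is to insert the commas by hand. The sketch below does that in awk; the function name add_commas is made up for illustration, and it assumes one non-negative integer per input line.

```shell
#!/bin/bash
# Hypothetical helper: add thousands separators without relying on the locale.
# Assumes each input line is a non-negative integer.
add_commas() {
  awk '{
    n = $0 ""                                   # force string handling
    out = ""
    while (length(n) > 3) {
      out = "," substr(n, length(n) - 2) out    # peel off the last three digits
      n = substr(n, 1, length(n) - 3)
    }
    print n out
  }'
}

# Usage: ls -l | awk '{total += $5} END {print total}' | add_commas
```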

Archive Only Files In a Directory

If you want to create a tar archive of only the files in a directory and exclude any subdirectories, you can use the ls -la command and pipe the output to awk. However, you need to remove the first 8 fields from the output and keep all of the remaining parts of the line in case there are spaces in the filename. One quick and dirty way of doing that is to set each of the 8 fields to a blank and then use sed to trim the leading spaces. You can optionally add quotation marks around the filename in your output too.
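
The steps described above might look roughly like the sketch below. The function names list_plain_files and archive_plain_files and the default archive name files.tar are my own placeholders. It assumes the usual nine-column ls -l layout (forced here with LC_ALL=C) and a tar that accepts -T - to read names from standard input. One known weakness: blanking fields makes awk rebuild the line, so runs of several spaces inside a filename collapse to one.

```shell
#!/bin/bash
# Hypothetical helper: print the names of regular files in a directory.
list_plain_files() {
  LC_ALL=C ls -la "${1:-.}" |
    awk '/^-/ { for (i = 1; i <= 8; i++) $i = ""; print }' |  # regular files only; blank fields 1-8
    sed 's/^ *//'                                             # trim the leading spaces left behind
}

# Hypothetical helper: archive those files. Keep the archive outside the
# directory being listed so that it does not show up in its own listing.
archive_plain_files() {
  local dir=${1:-.} archive=${2:-files.tar}
  list_plain_files "$dir" | tar -C "$dir" -cf "$archive" -T -
}
```

For example, archive_plain_files ~/docs /tmp/docs-files.tar would archive the top-level files of ~/docs and skip its subdirectories.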

Echo File Until a Blank Line Is Reached

You can use awk to search and print lines in a file. To print a file up to (but not including) the first blank line, use the following command:

awk '$0 ~ /^$/ {exit;} {print $0;}' somefile.txt
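
The pattern /^$/ only matches lines that are completely empty. If the "blank" line might contain spaces or tabs, testing the field count instead is one option. A small sketch, with print_until_blank as a made-up function name:

```shell
#!/bin/bash
# Hypothetical helper: print input up to the first empty or whitespace-only line.
# NF is 0 when a line has no fields, which covers both cases.
print_until_blank() {
  awk '!NF { exit } { print }' "$@"
}
```

This is handy for things like pulling just the header block out of a saved mail message: print_until_blank message.txt.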

How To Find All of the Shell Scripts In a Directory

This is a quick and dirty way to list all of the files in the current directory that are shell scripts:

for i in *
do
    type=$(file "${i}" | awk -F, '{print $2}')
    if [[ "${type}" = " ASCII text executable" ]]; then
        echo "${i} is a shell script"
    fi
done
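
As far as I can tell, file also reports "ASCII text executable" for other interpreted scripts (Python, Perl, and so on), so the loop above can over-report. A different technique is to look at the shebang line instead. The sketch below does that; it is not the original method, the function name list_shell_scripts is hypothetical, and the list of shells in the pattern is an assumption you would adjust to taste.

```shell
#!/bin/bash
# Hypothetical alternative: treat a file as a shell script when its first
# line is a shebang naming a known shell (directly or through env).
list_shell_scripts() {
  local f first
  local re='^#![[:space:]]*[^[:space:]]*/(env[[:space:]]+)?(sh|bash|dash|ksh|zsh)([[:space:]]|$)'
  for f in "${1:-.}"/*; do
    [ -f "$f" ] && [ -r "$f" ] || continue
    first=''
    # read fails on a missing final newline but still fills $first
    IFS= read -r first < "$f" || [ -n "$first" ] || continue
    if [[ $first =~ $re ]]; then
      printf '%s is a shell script\n' "$f"
    fi
  done
}
```

Scripts without a shebang line are missed by this check, so it trades one kind of error for another.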