Monday, July 27, 2009

Finding all large files on a Linux host

I ran across the common problem of one of our file systems running short on space and needed to figure out what was suddenly occupying most of it. Google pointed me to a DZone Snippets post on the subject.

The command worked as written, but when the results list was long it was fairly difficult to identify the worst offenders. Reordering the awk output fixed that, and a quick sort puts the biggest files at the bottom of the results. The only other changes were to use the "M" suffix on find's -size test to specify megabytes instead of kilobytes, and to remove the human-readable switch from the ls command so the results sort accurately; otherwise, files with sizes in the gigabytes would appear at the top of the list.
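To see why the human-readable sizes sort badly, here is a quick illustration with two made-up sizes. sort -n only reads the leading number, so 1.5G is treated as 1.5 and lands above 500M:

$ printf '1.5G\n500M\n' | sort -n
1.5G
500M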

This is what I ended up using:

find / -type f -size +50M -exec ls -l {} \; | awk '{ print $5 " -- " $9 }' | sort -n
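
One additional tweak worth considering: run as a regular user, find will print a permission-denied error for every directory it cannot read. Discarding stderr keeps the list readable (same command, errors suppressed):

find / -type f -size +50M -exec ls -l {} \; 2>/dev/null | awk '{ print $5 " -- " $9 }' | sort -n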

Update (8-11-09):

So the original command above works, but it does not account for directory and file names with spaces in them. The following awk command resolves that problem, but it does have one known issue: if the first word of the filename also appears earlier on the line (e.g. the user/group is root and the filename is root), it will spit out everything from the user/group field on. I have not seen this behavior using the full command, presumably because find prints full paths, so the name begins with a slash and is unlikely to match an earlier field.

find / -type f -size +50M -exec ls -l {} \; | awk '{ print $5 " -- " substr($0,index($0,$9))}' | sort -n
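
For what it's worth, if your box has GNU find (which most Linux distributions ship), its -printf action sidesteps the parsing problem entirely, since find emits the size and path itself and there are no ls fields to split. A sketch of that variant:

find / -type f -size +50M -printf '%s -- %p\n' | sort -n

Here %s is the file size in bytes and %p is the full path, so spaces in names are harmless. The sizes are in bytes rather than the block counts ls -l reports, but the ordering comes out the same.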
