Working with Bash - awk

{% raw %}
Now a quickie - one of my favorite tools: awk.  Some of you will probably think of this as obvious, and that's great.  But tools like awk are the things that I skipped learning when I got started... and when I finally started using them, my world expanded tremendously.  Here's hoping I can pay it forward to one other sysadmin out there.  So let's get going.

Awk


My first handy command is awk. This command lets you set a pattern to hunt for, and then an action to take every time it finds the pattern.  Pattern, action.  And that's the format on the command line (the single quotes keep the shell from touching the $ signs and braces):
awk 'pattern {action}'
If you omit the pattern, it takes the action on every line.  It also automatically splits each input line into fields (on whitespace, by default), and assigns them variables for you: $1 is the first field, $2 is the second, and so on.  $0 is the whole line.
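
Here's a minimal sketch of pattern, action, and fields in one place. The three sample lines are made up for illustration, so swap in whatever input you actually have.

```shell
# Made-up sample input: three whitespace-separated fields per line.
sample='alice 42 admin
bob 17 guest
carol 99 admin'

# Pattern /admin/ picks the lines; the action prints fields 1 and 2.
admins=$(printf '%s\n' "$sample" | awk '/admin/ {print $1, $2}')
echo "$admins"

# No pattern at all: the action runs on every line, and $0 is the whole line.
everything=$(printf '%s\n' "$sample" | awk '{print $0}')
echo "$everything"
```

The first command should print only the alice and carol lines; the second echoes the input back unchanged.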

One really common usage (and the one I was looking for today) is to just print a single column.  In my case, I wanted to get a list of file sizes in a directory. So I used
ls -l | awk '{print $5}'
Actions can be pretty impressive - you can use if/else, while, do/while, for, and other simple constructs.  You can even stack several awk pattern/action pairs in one program by separating them with a semicolon (or a newline).
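
A quick sketch of both ideas, assuming some made-up numeric input (one number per line) and arbitrary thresholds that I picked just for the demo:

```shell
# if/else inside a single action: label each number as big or small.
labels=$(printf '%s\n' 5 50 500 |
  awk '{ if ($1 >= 100) print $1, "big"; else print $1, "small" }')
echo "$labels"

# Two pattern/action pairs in one program, separated by a semicolon.
# Each line is tested against both patterns in turn.
both=$(printf '%s\n' 5 50 500 |
  awk '$1 < 10 {print "tiny:", $1}; $1 > 100 {print "huge:", $1}')
echo "$both"
```

Notice that 50 matches neither pattern in the second command, so nothing is printed for it.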

So now I'm on a quest to do as much as possible with this one command. I'm going to change my initial requirements: what if the directory I'm listing has subdirectories?  I want to make sure to exclude those from my filesize list.  So now I get to play with the "pattern" part of awk.
ls -l | awk '/^-/ {print $5}'
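Note the forward slashes, which denote the regular expression to search for. In this case it's ^-, meaning a - at the very start of the line (the ^ anchors the match to the beginning of the line).  In ls -l output, regular files start with - and directories start with d, so this skips the directories.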
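
To see what the ^ buys you, here's a small sketch that runs the same command with and without it. The listing is a made-up stand-in for real ls -l output (names, owners, and sizes are invented) so the result doesn't depend on your directory.

```shell
# Made-up stand-in for `ls -l` output.
listing='total 12
drwxr-xr-x 2 user user 4096 Jan  1 12:00 subdir
-rw-r--r-- 1 user user 1200 Jan  1 12:00 notes.txt
-rw-r--r-- 1 user user  300 Jan  1 12:00 todo.txt'

# Anchored: only lines that START with "-" (regular files).
anchored=$(printf '%s\n' "$listing" | awk '/^-/ {print $5}')
echo "$anchored"

# Unanchored: any line containing "-" anywhere, which also catches
# the directory, because its permission string has dashes in it.
unanchored=$(printf '%s\n' "$listing" | awk '/-/ {print $5}')
echo "$unanchored"
```

The anchored version prints just the two file sizes; the unanchored one lets the directory's 4096 sneak in.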

Now actually, the whole point of this exercise was to get the sum total size of the files in this directory.  So let's make awk do that for us.  The first thing to learn is that awk can do math.  If you have a series of numbers on one line, you can say
awk '{ sum = $1 + $2 + $3} {print sum}'
And end up with a sum of the 3 numbers for each line in the input.  If you want to do math across lines, you have to use variables that persist from one line to the next.  It's simple to say '{ sum = $5 }', but that just overwrites sum on every line.  If you want each line to add its $5 to 'sum', you simply use '{ sum += $5 }'.  We could just as easily have any other operator in there; it works just as well with -= , *= , /= etc.   Of course, in this case what we really care about is the sum of all the filesizes, so let's update our command:
ls -l | awk '/^-/ { sum += $5 } {print sum}'
Now for each line of the input, it will spit out whatever the current file size tally is, adding $5 to it when the regular expression ^- matches. Actually, a raw byte count is pretty unreadable, so let's add another simple math operator in there to show this in megabytes:
ls -l | awk '/^-/ { sum += $5 / 1048576 } {print sum, "M" }'
Don't feel bad if you don't have the number of bytes in a megabyte memorized - I have to run to my calculator every time for 1024 * 1024.  I should have just included it in the expression, come to think of it...
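
If the running-tally behavior is hard to picture, here's a sketch that makes it visible. The numbers and the listing are made up for the demo (the listing is a stand-in for ls -l output with invented names and sizes).

```shell
# = overwrites the variable on every line; += accumulates across lines.
overwrite=$(printf '%s\n' 10 20 30 | awk '{ last = $1 } {print last}')
accumulate=$(printf '%s\n' 10 20 30 | awk '{ total += $1 } {print total}')
echo "$overwrite"
echo "$accumulate"

# Made-up stand-in for `ls -l` output: a 1 MiB file, a directory, a 2 MiB file.
listing='total 3080
-rw-r--r-- 1 user user 1048576 Jan  1 12:00 one.bin
drwxr-xr-x 2 user user    4096 Jan  1 12:00 subdir
-rw-r--r-- 1 user user 2097152 Jan  1 12:00 two.bin'

# The tally is printed for EVERY input line, but only grows on file lines.
# Before the first file, sum has never been set, so it prints as empty.
tally=$(printf '%s\n' "$listing" |
  awk '/^-/ { sum += $5 / 1048576 } {print sum, "M"}')
echo "$tally"
```

You get one output line per input line: the tally sits still on the "total" and directory lines and only moves when a file goes by.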

This is pretty sweet so far, and more than enough to get me the information I wanted.  But let's polish it a little more - that stupid list bothers me, since all I really care about is the last value.  Now we're going to take advantage of BEGIN and END, which tell awk to do something before the first line of the input, or after the last line, respectively.

ls -l | awk '/^-/ { sum += $5 / (1024 * 1024) } END { print "This directory contains", sum, "M of files." }'
Pretty sweet!  Right there, I just saved myself a lot of swearing.
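
One more sketch, this time using BEGIN and END together. It assumes a made-up listing again (invented names and sizes), and the file counter is my own addition for illustration. It also shows a small trick: writing sum + 0 in the END block forces a numeric 0 when no files matched, instead of an empty string.

```shell
# Made-up stand-in for `ls -l` output: a 1 MiB file, a directory, a 0.5 MiB file.
listing='total 1544
-rw-r--r-- 1 user user 1048576 Jan  1 12:00 one.bin
drwxr-xr-x 2 user user    4096 Jan  1 12:00 subdir
-rw-r--r-- 1 user user  524288 Jan  1 12:00 half.bin'

# BEGIN runs once before any input, END runs once after the last line.
report=$(printf '%s\n' "$listing" | awk '
  BEGIN { count = 0 }                 # set up the (hypothetical) file counter
  /^-/  { sum += $5; count++ }        # regular files only
  END   { print count, "files,", sum / (1024 * 1024), "M" }')
echo "$report"

# With no matching lines, sum was never set; sum + 0 makes it print as 0.
empty=$(printf 'total 0\n' | awk '/^-/ { sum += $5 } END { print sum + 0, "M" }')
echo "$empty"
```

Dividing once in END instead of on every line is a design choice: the answer is the same, but the per-line action stays as simple as possible.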
{% endraw %}