Time For More Awk
Last time around, we gawked at awk. I'm following up here with a couple recipes I've used recently, at least as a handy reference for myself.
Ratcheting Up The Matching
I mentioned that match expressions can be arbitrarily complex. One thing I've found that really handy for is selecting things like long-running requests, since you can match on the request duration in the last field. So, with a request.log like:
2021-12-04T01:52:30 /blog 200 2
2021-12-04T02:24:41 /projects 200 5
2021-12-04T03:05:51 /blog 500 4000
2021-12-04T04:21:16 /projects 200 1
2021-12-04T05:49:27 /blog 200 2
You can print out the slow request lines with:
awk '$NF > 1000 { print }' request.log
# 2021-12-04T03:05:51 /blog 500 4000
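Since the match expression is just a boolean expression, you can also combine conditions with && and ||. Here's a rough sketch of that, assuming the same sample log (with the full timestamps shown in the output above) written to a temp file; the variable names and the thresholds in the second command are made up for illustration:

```shell
# Recreate the sample request.log in a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
2021-12-04T01:52:30 /blog 200 2
2021-12-04T02:24:41 /projects 200 5
2021-12-04T03:05:51 /blog 500 4000
2021-12-04T04:21:16 /projects 200 1
2021-12-04T05:49:27 /blog 200 2
EOF

# Slow requests to /blog only: $2 is the path, $NF is the duration.
slow_blog=$(awk '$2 == "/blog" && $NF > 1000 { print }' "$log")
echo "$slow_blog"

# Successful requests that took at least 5 (illustrative threshold).
ok_but_slowish=$(awk '$3 == 200 && $NF >= 5 { print }' "$log")
echo "$ok_but_slowish"

rm -f "$log"
```

The fields that look like numbers get compared numerically, so there's no need to convert anything first.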
Pairing Patterns
Another neat awk feature we didn't touch on last time is that you can match a range of records by combining two patterns with a comma (,). awk will match the records between the two patterns, effectively "turning on" matching with the first and "turning off" with the second.
Think of it like a delicious textual sandwich, with your tasty content sitting between the two patterns. Okay, maybe this analogy isn't exactly the greatest thing since sliced bread.
Anyway, this is handy, for example, to filter by time slices. With the request.log above, we can print out the requests between 2am and 4am by matching on T02 to start and T04 to stop:
awk '$1 ~ "T02", $1 ~ "T04" { print }' request.log
# 2021-12-04T02:24:41 /projects 200 5
# 2021-12-04T03:05:51 /blog 500 4000
# 2021-12-04T04:21:16 /projects 200 1
Note that the closing pattern is inclusive, so we scoop up that first 4am record.
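If you'd rather not include the 4am record, one alternative (not a range pattern, just a plain comparison) is to lean on the fact that these timestamps sort correctly as strings. A minimal sketch, assuming the same sample log and that every record starts with a timestamp in this exact format; the variable names are hypothetical:

```shell
# Recreate the sample request.log in a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
2021-12-04T01:52:30 /blog 200 2
2021-12-04T02:24:41 /projects 200 5
2021-12-04T03:05:51 /blog 500 4000
2021-12-04T04:21:16 /projects 200 1
2021-12-04T05:49:27 /blog 200 2
EOF

# String comparison on the timestamp: from 02:00 up to, but not including, 04:00.
window=$(awk '$1 >= "2021-12-04T02" && $1 < "2021-12-04T04" { print }' "$log")
echo "$window"

rm -f "$log"
```

Unlike the range pattern, this checks every record on its own, so it doesn't depend on the records being in order or on a 4am record showing up to close the range.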
Joining The Separator
By default, awk will split up the fields in a record based on whitespace, but you can customize what it looks for. Let's say our request log fields were separated by -:
2021-12-04T01:52:30 - /blog - 200 - 2
2021-12-04T02:24:41 - /projects - 200 - 5
2021-12-04T03:05:51 - /blog - 500 - 4
2021-12-04T04:21:16 - /projects - 200 - 1
2021-12-04T04:22:16 - /projects - 200 - 1
2021-12-04T05:49:27 - /blog - 200 - 2
You can pass the -F flag with the separator string to change how awk parses the records, so we can print the response statuses like so:
awk -F ' - ' '{ print $(NF-1) }' request.log
# 200
# 200
# 500
# 200
# 200
# 200
Without the -F, $(NF-1) would refer to the last -, which isn't super useful.
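The -F flag is shorthand for setting awk's built-in FS variable, so you can get the same result by assigning FS in a BEGIN block, which can be nicer once the program lives in a script file. A quick sketch, assuming the dash-separated log above; the statuses variable is just for illustration:

```shell
# Recreate the dash-separated request.log in a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
2021-12-04T01:52:30 - /blog - 200 - 2
2021-12-04T02:24:41 - /projects - 200 - 5
2021-12-04T03:05:51 - /blog - 500 - 4
2021-12-04T04:21:16 - /projects - 200 - 1
2021-12-04T04:22:16 - /projects - 200 - 1
2021-12-04T05:49:27 - /blog - 200 - 2
EOF

# BEGIN runs before any input is read, so FS is set before the first record is split.
statuses=$(awk 'BEGIN { FS = " - " } { print $(NF-1) }' "$log")
echo "$statuses"

rm -f "$log"
```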
Even cooler, the separator string is actually a regex! So, if you wanted to split up the timestamp too, you could do:
awk -F 'T|( - )' '{ print $2 " " $(NF-1) }' request.log
# 01:52:30 200
# 02:24:41 200
# 03:05:51 500
# 04:21:16 200
# 04:22:16 200
# 05:49:27 200
(Ignoring how badly this breaks if there's a T somewhere else in the line.)
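One way to sidestep that is to keep ' - ' as the only field separator and break up just the timestamp with awk's split() function. This is a sketch rather than a recommendation; the third record below, with a capital T in the path, is a made-up example added only to show the difference:

```shell
# Two records from the log above plus one hypothetical record with a "T" in the path.
log=$(mktemp)
cat > "$log" <<'EOF'
2021-12-04T01:52:30 - /blog - 200 - 2
2021-12-04T03:05:51 - /blog - 500 - 4
2021-12-04T06:10:00 - /Tags - 200 - 3
EOF

# split() breaks only $1 on "T", so a "T" in any other field is left alone.
# ts[1] is the date, ts[2] is the time of day.
times=$(awk -F ' - ' '{ split($1, ts, "T"); print ts[2], $(NF-1) }' "$log")
echo "$times"

rm -f "$log"
```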
Tools For Fools
As a point of professional pride, I do want to stress that, in my day job, we do actually ingest our logs into a proper tool for querying and visualization, so all this awk stuff is somewhat primitive. However, for various dysfunctional reasons, it's been pretty handy to have alongside the fancier playthings. I guess it's like a toolbox, y'know. Sure, you want all your fancy gadgets, but sometimes all you need is a hammer.