3.3.4. awk
Linux for Programmers and Users, Section 4.8.
Tutorial: http://www.gnu.org/software/gawk/manual/gawk.html (many others also available)
Note
On many Linux systems, awk and gawk are the same program, although some distributions install a different implementation (such as mawk) under the name awk. I refer to it as awk partly out of habit and partly because that is the original, more generic name used on all Unix systems. gawk is the GNU Project’s implementation of awk.
Awk is a programming language specializing in string processing.
An awk program can live in a stand-alone file or take up one or two lines inside a shell script. Like most Unix utilities, awk can open a file named on the command line to find its data, but if no file is listed, it reads from standard input.
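A minimal sketch of these options follows: the same one-line program is given inline and from a program file (with -f), and the data comes once from a named file and once from standard input. The temporary files and their contents are made up for this illustration.

```shell
# Hypothetical sample data and program file, created only for this sketch.
datafile=$(mktemp)
progfile=$(mktemp)
printf 'alpha 1\nbeta 2\n' > "$datafile"
printf '{ print $1 }\n' > "$progfile"

# Program given inline; awk opens the file named on the command line.
from_file=$(awk '{ print $1 }' "$datafile")

# Same program, but no file is named, so awk reads standard input.
from_stdin=$(awk '{ print $1 }' < "$datafile")

# Same program stored in a stand-alone file and loaded with -f.
from_prog=$(awk -f "$progfile" "$datafile")

printf '%s\n' "$from_file" "$from_stdin" "$from_prog"
rm -f "$datafile" "$progfile"
```

All three invocations should print the first column of the sample data.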
3.3.4.1. Vertical Filtering With Awk
The most common one-line use is as a more robust, flexible alternative to cut.
owner=`ls -ld "$file" | awk '{print $3}'`
Note that $0 is the whole line, $1 is the first column, $2 is the second column, and so on.
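One reason awk is more robust than cut for this job is that, by default, it splits on runs of blanks rather than on each single delimiter character. The sketch below illustrates the difference; the input line is invented for the example.

```shell
# Made-up input with several spaces between the columns, as ls -l often produces.
line='tim     users   4096'

# awk splits on runs of blanks, so $2 is the second word.
awk_second=$(printf '%s\n' "$line" | awk '{ print $2 }')

# cut treats every single space as a delimiter, so field 2 here is empty.
cut_second=$(printf '%s\n' "$line" | cut -d' ' -f2)

# Fields can be printed in any order; the comma inserts the output
# field separator (a space by default).
reordered=$(printf '%s\n' "$line" | awk '{ print $3, $1 }')

echo "awk: [$awk_second] cut: [$cut_second] reordered: [$reordered]"
```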
3.3.4.2. Horizontal Filtering With Awk
Grep-like behavior is also possible. Here awk opens a file itself instead of reading from standard input.
awk '/foo/ { print $0 }' file
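As a small sketch of the same idea on made-up input: when the action is omitted, awk prints each matching line by default, and, unlike grep, the match can be restricted to a single column.

```shell
# Made-up input for illustration.
input='foo 1
bar 2
baz foo'

# With no action given, awk prints every matching line, like grep.
any_foo=$(printf '%s\n' "$input" | awk '/foo/')

# The match can also be limited to one column (here, the first).
first_col_foo=$(printf '%s\n' "$input" | awk '$1 ~ /foo/ { print $0 }')

printf '%s\n' "$any_foo"
printf '%s\n' "$first_col_foo"
```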
Awk scripts can have BEGIN and END sections along with instructions for what to do as each line is processed. The BEGIN section typically sets up needed variables or parameters. In the next example, the BEGIN sections change built-in variables. END sections can be used to report data after all the lines have been processed. For example, the body of the script could look for a maximum value, which is then reported in the END section.
timlinux:~/bin> cat unix2dos
#!/usr/bin/awk -f
BEGIN { ORS = "\r\n" }  # Change output record separator to
{ print }               # a carriage-return and line-feed.

timlinux:~/bin> cat msdos2unix
#!/usr/bin/awk -f
BEGIN { RS = "\r\n" }   # Change input record separator to
{ print }               # (CR-LF). Output separator is default ("\n").
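The effect of changing ORS can be checked with a short pipeline. This is only a sketch on made-up two-line input. For the return trip it strips a trailing carriage return with sub() rather than setting a multi-character RS, on the assumption that not every awk implementation accepts a multi-character RS (gawk and mawk do).

```shell
# Two Unix lines ("a\n" and "b\n") rewritten with CR-LF endings.
# Four input bytes become six output bytes.
dos_bytes=$(printf 'a\nb\n' | awk 'BEGIN { ORS = "\r\n" } { print }' | wc -c | tr -d ' ')

# Round trip: convert to CR-LF, then remove the trailing CR from each record.
round_trip=$(printf 'a\nb\n' |
    awk 'BEGIN { ORS = "\r\n" } { print }' |
    awk '{ sub(/\r$/, ""); print }')

echo "bytes after conversion: $dos_bytes"
printf '%s\n' "$round_trip"
```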
Here are two examples with an END section. The first example prints the length of the longest line in the input data. Notice that awk has some built-in functions, such as length, that are related to string processing. The second example prints the total number of bytes in all the files in the current directory that were last modified in November (of any year). Awk can do simple math.
awk '{ if (length($0) > max) max = length($0) } END { print max }'
ls -l | awk '$6 == "Nov" { sum += $5 } END { print sum }'
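The sketch below runs both commands on invented sample data so the results are easy to check by hand. It also adds a variant that remembers the longest line itself, not just its length. The listing is a hand-written stand-in for ls -l output; it assumes the month appears in column 6 and the size in column 5, which can vary with locale and ls options.

```shell
# Made-up text for the line-length example.
sample='short
the longest line here
medium one'

# Length of the longest line (the command from the text).
max_len=$(printf '%s\n' "$sample" |
    awk '{ if (length($0) > max) max = length($0) } END { print max }')

# Variant: keep the line itself and report it in the END section.
longest=$(printf '%s\n' "$sample" |
    awk 'length($0) > max { max = length($0); line = $0 } END { print line }')

# Hand-written stand-in for "ls -l" output (hypothetical files).
listing='-rw-r--r-- 1 tim users 100 Nov  3 10:00 a.txt
-rw-r--r-- 1 tim users 250 Oct 12 09:30 b.txt
-rw-r--r-- 1 tim users  40 Nov 20  2019 c.txt'

# Sum the sizes of the files last modified in November.
nov_bytes=$(printf '%s\n' "$listing" |
    awk '$6 == "Nov" { sum += $5 } END { print sum }')

echo "$max_len"
echo "$longest"
echo "$nov_bytes"
```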
Here is a longer example. It is started from the following simple shell script:
#!/bin/sh
/usr/bin/nawk -f "$HOME/bin/logfilt" pattern="$1"
Here is the logfilt file referenced above. A word of explanation: this is a
program for filtering some log files. The log files held multi-line records
that were wanted, but they were mixed in with reports of all sorts of
other things that were not. Since the records covered multiple lines,
grep would not do the job. The expression $0 ~ /string/ means that the line
contains the pattern searched for. The machine that generated the reports was
named rabbit, so each report began with a line containing the word “RABBIT”,
but the report could not be identified until the second line. The script
therefore saves each “RABBIT” line and prints it only once the following
“REPT” line matches the pattern.
BEGIN {
    rept_start = "RABBIT"
    rept_id = "REPT"
    either = "(RABBIT|REPT)"
    looked_for = 0
}
$0 ~ rept_start {
    lastline = $0
    looked_for = 0
    next
}
$0 ~ rept_id {
    if ( $0 ~ pattern ) {
        print lastline
        print $0
        looked_for = 1
        next
    } else {
        looked_for = 0
        next
    }
}
$0 !~ either {
    if ( looked_for )
        print $0
}
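To see the filter at work, the sketch below feeds it a small invented log. The real log format is not shown here, so the record layout, the report names ALARM and STATUS, and the detail lines are all assumptions made for illustration. The program text is the same as above, passed inline, and the pattern is set with -v rather than as a trailing pattern=... operand.

```shell
# Hypothetical log excerpt: three reports, two of which are ALARM reports.
log='RABBIT 01:00
REPT ALARM 7
detail line A
RABBIT 01:05
REPT STATUS 3
detail line B
RABBIT 01:10
REPT ALARM 9
detail line C'

# Run the logfilt program inline, keeping only reports whose REPT line
# matches the pattern ALARM.
filtered=$(printf '%s\n' "$log" | awk -v pattern=ALARM '
BEGIN {
    rept_start = "RABBIT"
    rept_id = "REPT"
    either = "(RABBIT|REPT)"
    looked_for = 0
}
$0 ~ rept_start { lastline = $0; looked_for = 0; next }
$0 ~ rept_id {
    if ( $0 ~ pattern ) {
        print lastline
        print $0
        looked_for = 1
    } else {
        looked_for = 0
    }
    next
}
$0 !~ either { if ( looked_for ) print $0 }
')

printf '%s\n' "$filtered"
```

With this input, the two ALARM reports should come through whole, including their “RABBIT” header lines, while the STATUS report and its detail line are dropped.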