Using Variables With awk

by mike on November 17, 2010

awk supports user-defined variables as well as predefined (built-in) variables. Variables do not need to be declared before use: an uninitialized variable starts out as 0 in a numeric context or as the empty string in a string context. There are three types of variables:

1. System Variables
2. Scalars
3. Arrays

System or Built-in Variables
Built-in variable names are uppercase, and awk variable names are case sensitive, so NR and nr are two different variables.

NR: number of input lines
NR holds the number of input records read so far. It is incremented by 1 each time a record is read, so in the main body it is the number of the current record (line), as you see in the example.

awk '{print NR}' processes
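Here is a minimal sketch that uses a few hypothetical lines generated with printf in place of the processes file. Printing NR in front of each record shows it rising by one per record. In an END block, NR still holds the total, which makes it a quick line counter.

```shell
# Three hypothetical input lines stand in for the processes file.
printf 'alpha\nbeta\ngamma\n' | awk '{ print NR, $0 }'
# → 1 alpha
# → 2 beta
# → 3 gamma

# After the last record has been read, NR holds the total record count.
printf 'alpha\nbeta\ngamma\n' | awk 'END { print NR }'
# → 3
```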

NF: number of fields
NF holds the number of fields in the current record. By default, fields are separated by runs of spaces or tabs, so the number of fields can vary from one record to the next.

awk '{print NF}' processes

In this example the first and second fields are printed with the total number of fields available for each record.

awk '{print $1, $2, NF}' /var/log/messages.1
Jul 24 14
Jul 24 11
Jul 24 8
Jul 24 12
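NF is a number, so $NF refers to the last field of the record. A pattern such as NF > 0 skips blank lines, whose NF is 0. This sketch runs on hypothetical log-style lines, and one of them is deliberately empty.

```shell
# Hypothetical log-style lines with different field counts (one is blank).
printf 'Jul 24 host kernel: ready\n\nJul 24 host sshd[42]: session opened\n' |
  awk 'NF > 0 { print NF, $NF }'
# → 5 ready
# → 6 opened
```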

FILENAME: name of input file

awk sets FILENAME to the name of the input file currently being processed. You cannot rely on FILENAME in the BEGIN section of an awk script: no input has been read yet, so it is not set until the main body starts processing the first file.
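This sketch uses two small hypothetical files, a.txt and b.txt, created in a temporary directory. It prints FILENAME both in BEGIN and in the main body. The BEGIN line usually prints as empty, but POSIX leaves FILENAME undefined there, so do not depend on that output.

```shell
# Create two small hypothetical input files.
dir=$(mktemp -d)
printf 'one\n' > "$dir/a.txt"
printf 'two\n' > "$dir/b.txt"

# Run in a subshell so the cd does not affect the calling shell.
(
  cd "$dir" &&
    awk 'BEGIN { print "BEGIN: [" FILENAME "]" }
         { print FILENAME ": " $0 }' a.txt b.txt
)
# BEGIN: []      (usually empty; undefined by POSIX)
# → a.txt: one
# → b.txt: two
```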

FNR: used with multiple input files

FNR holds the record number within the current input file. It resets to 1 at the start of each file, while NR keeps counting across all input files.
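To see the difference between FNR and NR, this sketch runs awk over two hypothetical files, first.txt and second.txt, and prints FILENAME, FNR and NR for every record.

```shell
dir=$(mktemp -d)
printf 'a1\na2\n' > "$dir/first.txt"
printf 'b1\nb2\nb3\n' > "$dir/second.txt"

# FNR restarts at 1 for each file; NR keeps counting across both files.
(
  cd "$dir" &&
    awk '{ print FILENAME, FNR, NR }' first.txt second.txt
)
# → first.txt 1 1
# → first.txt 2 2
# → second.txt 1 3
# → second.txt 2 4
# → second.txt 3 5
```

A common idiom builds on this. The pattern FNR == NR is true only while awk is reading the first file.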

FS: field separator character

By default the field separator is whitespace (runs of blanks or tabs), but it can be changed with -F followed by the separator, in this example ":". The fields in /etc/passwd are separated by ":". Here awk searches for the text string jane, prints the first field ($1), and then prints her group ID, which is the fourth field.

tail /etc/passwd | awk -F: '/jane/{print $1, "Group: " $4}'
jane Group: 502
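You can also assign FS inside a BEGIN block instead of using -F. This sketch does that, and the passwd-style entries fed in with printf (including jane and her IDs) are hypothetical sample data.

```shell
# Hypothetical /etc/passwd-style entries.
printf 'root:x:0:0:root:/root:/bin/bash\njane:x:502:502::/home/jane:/bin/bash\n' |
  awk 'BEGIN { FS = ":" } /jane/ { print $1, "Group: " $4 }'
# → jane Group: 502
```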

To allow several field separators at once, you can give -F a regular expression, here a bracket expression that matches a space, a colon, or a tab. Note that it is enclosed in single quotes so the shell passes it to awk unchanged.

tail /etc/passwd | awk -F'[ :\t]' '/jane/{print $1, "Group: " $4}'

jane Group: 502
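The regular-expression separator pays off when one line mixes several separators. In this sketch the input line is hypothetical and contains a colon, a space and a tab, yet it still splits into four fields.

```shell
# A hypothetical line mixing a colon, a space and a tab as separators.
printf 'jane:x 502\t502\n' | awk -F'[ :\t]' '{ print NF; print $1, $4 }'
# → 4
# → jane 502
```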

OFS: output field separator
OFS defaults to a single space. awk places it between the items of a print statement wherever you separate them with commas, as you see in the example below.

tail /etc/passwd | awk -F'[ :\t]' '/jane/{print $1,$2,$3,$4,$6,$7}'
jane x 502 502 /home/jane /bin/bash

If you remove the commas, print concatenates the fields with no separator between them:
tail /etc/passwd | awk -F'[ :\t]' '/jane/{print $1$2$3$4$6$7}'
janex502502/home/jane/bin/bash
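This sketch changes OFS itself, using a hypothetical passwd-style line. It also shows a related detail: assigning to any field (here the common $1 = $1 trick) makes awk rebuild $0 with the current OFS.

```shell
# Hypothetical passwd-style entry.
line='jane:x:502:502::/home/jane:/bin/bash'

# OFS replaces the space that print puts between comma-separated items.
echo "$line" | awk -F: 'BEGIN { OFS = "|" } { print $1, $6, $7 }'
# → jane|/home/jane|/bin/bash

# Assigning a field rebuilds $0 from all fields, joined with OFS.
echo "$line" | awk -F: -v OFS=',' '{ $1 = $1; print }'
# → jane,x,502,502,,/home/jane,/bin/bash
```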

ORS: output record separator
ORS is the string that print appends after each output record. It defaults to a newline, so each print ends its output with a line break.
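Changing ORS is an easy way to join lines. In this sketch the input words are hypothetical, and ORS is set to ";" so every record ends with a semicolon instead of a newline. The END block prints a single newline to finish the output.

```shell
# Each print now ends with ";" instead of a newline.
printf 'red\ngreen\nblue\n' |
  awk 'BEGIN { ORS = ";" } { print $0 } END { printf "\n" }'
# → red;green;blue;
```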

OFMT: format for numeric output
This variable controls how print formats non-integer numbers. The default format is "%.6g", which means six significant digits in total, not six digits to the right of the decimal point.
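A few BEGIN-only sketches make the meaning of "%.6g" concrete. The values are arbitrary: x is a hypothetical variable, and pi is computed with atan2(0, -1). Integer values are printed as integers no matter what OFMT says.

```shell
# Default OFMT "%.6g": six significant digits in total.
awk 'BEGIN { x = 1234.56789; print x }'
# → 1234.57

awk 'BEGIN { pi = atan2(0, -1); print pi }'
# → 3.14159

# A custom OFMT applies to non-integer values only.
awk 'BEGIN { OFMT = "%.2f"; pi = atan2(0, -1); print pi }'
# → 3.14
awk 'BEGIN { OFMT = "%.2f"; print 100 / 4 }'
# → 25
```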

RS: record separator
RS is the input record separator. It defaults to a newline, so each input line is treated as one record.
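This sketch sets RS to other values, using hypothetical input. A single character such as ";" splits a stream at each semicolon. An empty RS switches awk to paragraph mode, where blank lines separate records and newlines also act as field separators.

```shell
# printf adds no trailing newline, so the last record is just "c".
printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }'
# → 1: a
# → 2: b
# → 3: c

# Paragraph mode: each blank-line-separated block is one record.
printf 'name jane\nuid 502\n\nname bob\nuid 503\n' |
  awk 'BEGIN { RS = "" } { print NR ": " $2, $4 }'
# → 1: jane 502
# → 2: bob 503
```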

$0 Variable
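$0 holds the entire current record, which by default is the current line. Because the action is executed once for every line, printing $0 reproduces the whole file.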

awk '{print $0}' processes
root         1  0.0  0.1  10348   720 ?        Ss   22:01   0:01 init [3]
root         2  0.0  0.0      0     0 ?        S<   22:01   0:00 [migration/0]
root         3  0.0  0.0      0     0 ?        SN   22:01   0:00 [ksoftirqd/0]
root         4  0.0  0.0      0     0 ?        S<   22:01   0:00 [watchdog/0]
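Numbering each $0 shows that it is one line at a time and not the whole file. This sketch uses hypothetical input. It also shows that changing a field makes awk rebuild $0 from the fields.

```shell
# Each line gets its own record number, so $0 is one line at a time.
printf 'first line\nsecond line\n' | awk '{ print NR ": " $0 }'
# → 1: first line
# → 2: second line

# Assigning to a field rebuilds $0, joining the fields with OFS.
echo 'root 1 0.0' | awk '{ $3 = "n/a"; print $0 }'
# → root 1 n/a
```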

Scalar Variables
A scalar variable holds a single value, which can be numeric or text.

var = "test_string"
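This sketch shows string and numeric scalars working without declarations. The names var, n, total and names are hypothetical, and so is the "name size" input. On first use, total starts at 0 and names starts as the empty string.

```shell
# A string scalar and a numeric scalar in the same program.
awk 'BEGIN { var = "test_string"; n = 3; print var, n * 2 }'
# → test_string 6

# Uninitialized scalars: total starts at 0, names starts as "".
printf 'a 10\nb 20\nc 12\n' |
  awk '{ total += $2; names = names $1 } END { print names, total }'
# → abc 42
```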

Array Variables
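An array element is referenced by the array name followed by a subscript in square brackets, such as arr[1]. awk arrays are associative, so the subscript can be a number or a string.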
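This sketch shows that awk arrays are associative, using a hypothetical array named count. It counts hypothetical passwd-style entries by login shell, with the shell string as the subscript. The for (key in array) loop visits keys in an unspecified order, so the output is piped through sort. The last line shows that a numeric subscript 1 and the string "1" name the same element.

```shell
# Count hypothetical passwd-style entries by login shell ($7).
printf 'root:x:0:0::/root:/bin/bash\njane:x:502:502::/home/jane:/bin/bash\ndaemon:x:2:2::/sbin:/sbin/nologin\n' |
  awk -F: '{ count[$7]++ } END { for (shell in count) print shell, count[shell] }' |
  sort
# → /bin/bash 2
# → /sbin/nologin 1

# Subscripts are strings: a[1] and a["1"] are the same element.
awk 'BEGIN { a[1] = "one"; print a["1"] }'
# → one
```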



Paul Ferris November 17, 2010 at 2:15 pm

Nice intro to AWK — but the description below can be somewhat misleading:

$0 Variable
The entire file is referenced by $0.
awk '{print $0}' processes

In reality, $0 is the entire line of the input stream (file or data flowing through awk), and '{print $0}' is executed for every line, so the result looks like a dump of the entire file.

In reality, the data flowing through awk in your example is being processed on a line by line basis. To illustrate this:

'{print $0}
{ print NR }'

And you will see every line followed by its record indicator.

Awk is a power-saw for developers and administrators alike. Not a day goes by that I don’t use it in my quest to manage enterprise systems.

X November 17, 2010 at 5:08 pm

It is SCALAR variable NOT scaler variable.

mike November 18, 2010 at 11:50 am

Thanks…it is corrected.

Dan November 17, 2010 at 10:20 pm

A good article, thank you.

I didn’t know about the FILENAME, FNR or OFMT.

Just a few corrections:
It’s scalar, not scaler.

The FNR is the per file line number.

The ORS defaults to a newline, not a carriage return, on U*X. On Windows it depends on the port you are using: with Cygwin in a bash shell it is a newline; with UnxTools it is a carriage return followed by a newline.


mike November 18, 2010 at 11:57 am

Thanks for the input. I have reflected the corrections in the file.
