Beginner

Variables and Types in AWK

Learn how AWK handles variables, data types, built-in variables, and associative arrays with practical Docker-ready examples

AWK has a deceptively simple type system. In languages that require declarations, a variable must be introduced before it is used; AWK variables instead spring into existence the moment you reference them, initialized to zero or the empty string depending on context. This dynamic, context-sensitive typing is one of AWK’s most distinctive characteristics.

As a data-driven, pattern-action language, AWK’s variables are deeply tied to the data being processed. Field variables like $1, $2, and $NF are populated automatically for every input record. Built-in variables like NR (record number) and FS (field separator) control how AWK processes input. User-defined variables accumulate values as rules fire across multiple records. Understanding this trio — field variables, built-in variables, and user variables — unlocks AWK’s full processing model.

In this tutorial you will learn how AWK’s type system works in practice, how to use and configure built-in variables, how to declare and use user variables, and how AWK’s powerful associative arrays act as the language’s primary data structure.

AWK’s Type System: Strings and Numbers

AWK uses dynamic, context-sensitive typing. Every value is simultaneously a string and a number — which interpretation is used depends on the operation performed. There are no type declarations; context determines type.

Create a file named variables_types.awk:

BEGIN {
    # AWK variables need no declaration
    # Uninitialized variables are 0 (numeric) or "" (string)
    print "Uninitialized numeric:", unset_num + 0
    print "Uninitialized string: [" unset_str "]"

    # Assignment - AWK infers context automatically
    x = 42
    print "Integer:", x
    print "Integer as string length:", length(x)

    pi = 3.14159
    print "Float:", pi

    name = "AWK"
    print "String:", name

    # Numeric string: a string that looks like a number
    num_str = "100"
    print "Numeric string + 1:", num_str + 1
    print "Numeric string . \" items\":", num_str " items"

    # String concatenation: adjacent values (no operator needed)
    version = "AWK" " " "1977"
    print "Concatenated:", version

    # A non-numeric string counts as 0 in arithmetic, so this prints 5
    word = "hello"
    print "String in arithmetic:", word + 5
}

Run it:

docker pull alpine:latest
docker run --rm -v "$(pwd)":/app -w /app alpine:latest awk -f variables_types.awk

Expected output:

Uninitialized numeric: 0
Uninitialized string: []
Integer: 42
Integer as string length: 2
Float: 3.14159
String: AWK
Numeric string + 1: 101
Numeric string . " items": 100 items
Concatenated: AWK 1977
String in arithmetic: 5
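
The same context rule applies to comparisons, which the script above does not exercise. The following is a minimal sketch rather than part of the original examples: the file name compare_sketch.awk is hypothetical, and the heredoc only writes the program to disk before running it with whatever awk is on your PATH. It suggests that two numeric constants compare as numbers, two string constants compare character by character, and that adding 0 or appending "" forces one interpretation.

```shell
# Hypothetical file name; the heredoc writes the AWK program to disk
cat > compare_sketch.awk <<'EOF'
BEGIN {
    # Both operands are numeric constants: numeric comparison
    print "numeric constants:", (10 < 9)

    # Both operands are string constants: character-by-character comparison
    print "string constants:", ("10" < "9")

    # Adding 0 forces a numeric value
    a = "10"; b = "9"
    print "forced numeric:", ((a + 0) < (b + 0))

    # Appending "" forces a string value
    x = 10; y = 9
    print "forced string:", ((x "") < (y ""))
}
EOF
awk -f compare_sketch.awk
```

A result of 1 means true and 0 means false. As strings, "10" sorts before "9" because the character "1" precedes "9", which is why the string comparisons print 1 while the numeric ones print 0.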

Built-in Variables

AWK provides a rich set of built-in variables that control how input is parsed and output is formatted. These are the backbone of AWK’s data-driven model.

To demonstrate them, first create a data file named employees.txt:

Alice Engineering 95000
Bob Marketing 72000
Carol Engineering 88000
Dave Sales 65000
Eve Engineering 102000

Now create a file named variables_builtin.awk:

BEGIN {
    # FS: Field Separator (default is whitespace)
    # Set it in BEGIN to affect all records
    FS = " "

    # OFS: Output Field Separator (default is a single space)
    OFS = " | "

    print "=== Employee Report ==="
    print "NR: record number, NF: number of fields per record"
    print ""
}

{
    # NR: current Record Number (line number across all files)
    # NF: Number of Fields in the current record
    # $0: the entire current record (line)
    # $1, $2, $3: individual fields

    printf "Record %d: %d fields -> %s\n", NR, NF, $0

    # OFS is used when print has comma-separated arguments,
    # or when $0 is rebuilt after a field is assigned
    print $1, $2, $3
}

END {
    # NR at END holds the total number of records processed
    print ""
    print "Total records processed:", NR
}

Run it:

docker run --rm -v "$(pwd)":/app -w /app alpine:latest awk -f variables_builtin.awk employees.txt

Expected output:

=== Employee Report ===
NR: record number, NF: number of fields per record

Record 1: 3 fields -> Alice Engineering 95000
Alice | Engineering | 95000
Record 2: 3 fields -> Bob Marketing 72000
Bob | Marketing | 72000
Record 3: 3 fields -> Carol Engineering 88000
Carol | Engineering | 88000
Record 4: 3 fields -> Dave Sales 65000
Dave | Sales | 65000
Record 5: 3 fields -> Eve Engineering 102000
Eve | Engineering | 102000

Total records processed: 5
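
One detail the report above does not show is when OFS actually takes effect. As a small sketch (the file name ofs_sketch.awk is hypothetical, and the single input line is copied from employees.txt), the program below prints $0 before and after a field assignment. Printing an untouched $0 keeps the original spacing; assigning any field, even to itself, makes AWK rebuild $0 with OFS between the fields.

```shell
# Hypothetical file name; the heredoc writes the AWK program to disk
cat > ofs_sketch.awk <<'EOF'
BEGIN { OFS = "," }
{
    print $0          # $0 untouched: the original spacing is kept
    $1 = $1           # assigning any field rebuilds $0 using OFS
    print $0
    print $1, $NF     # commas in print insert OFS; $NF is the last field
}
EOF
printf 'Alice Engineering 95000\n' | awk -f ofs_sketch.awk
```

Expected output:

Alice Engineering 95000
Alice,Engineering,95000
Alice,95000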

User-Defined Variables and Accumulators

User variables in AWK are global by default (only function parameters are local) and persist across all records processed. This makes them ideal for accumulators: totals, counters, and running values built up as AWK processes each line.

Create a file named variables_accumulate.awk:

BEGIN {
    # Variables initialized here are available to all rules
    total_salary = 0
    employee_count = 0
    max_salary = 0
    min_salary = 999999999
    dept_header = "Department Summary"
}

# This rule fires for every record
{
    salary = $3 + 0      # Force numeric interpretation
    name = $1            # String variable
    dept = $2

    # Accumulate totals
    total_salary += salary
    employee_count++

    # Track min and max
    if (salary > max_salary) {
        max_salary = salary
        top_earner = name
    }
    if (salary < min_salary) {
        min_salary = salary
        low_earner = name
    }
}

END {
    avg = total_salary / employee_count

    print dept_header
    print "================="
    printf "Employees:      %d\n", employee_count
    printf "Total salary:   $%d\n", total_salary
    printf "Average salary: $%.2f\n", avg
    printf "Highest paid:   %s ($%d)\n", top_earner, max_salary
    printf "Lowest paid:    %s ($%d)\n", low_earner, min_salary
}

Run it:

docker run --rm -v "$(pwd)":/app -w /app alpine:latest awk -f variables_accumulate.awk employees.txt

Expected output:

Department Summary
=================
Employees:      5
Total salary:   $422000
Average salary: $84400.00
Highest paid:   Eve ($102000)
Lowest paid:    Dave ($65000)
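
The script above relies on a large sentinel (999999999) for the minimum and divides by employee_count without checking it. One possible alternative, sketched here under the assumption that the salary is always in the third field, is to seed both extremes from the first record and to guard the END block against empty input. The file name minmax_sketch.awk is hypothetical.

```shell
# Hypothetical file name; the heredoc writes the AWK program to disk
cat > minmax_sketch.awk <<'EOF'
# Seed min and max from the first record instead of a sentinel value
NR == 1 { min = max = $3 + 0 }

{
    salary = $3 + 0
    if (salary < min) min = salary
    if (salary > max) max = salary
}

END {
    # With empty input NR is 0, so there is nothing to report
    if (NR > 0)
        printf "min=%d max=%d\n", min, max
    else
        print "no records"
}
EOF
printf 'Alice Engineering 95000\nBob Marketing 72000\nDave Sales 65000\n' | awk -f minmax_sketch.awk
```

With the three sample lines this prints min=65000 max=95000. Seeding from the first record avoids having to guess a value larger than any salary that could appear in the data.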

Associative Arrays

AWK’s most powerful data structure is the associative array — a hash map indexed by arbitrary string keys. There is no array declaration; elements are created on first access. Arrays are the primary tool for grouping, counting, and aggregating data across records.
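
"Created on first access" has a consequence worth seeing once: merely reading an element that does not exist creates it. The sketch below (hypothetical file name exists_sketch.awk, hypothetical array name seen) contrasts a membership test using in, which creates nothing, with a plain read, which does.

```shell
# Hypothetical file name; the heredoc writes the AWK program to disk
cat > exists_sketch.awk <<'EOF'
BEGIN {
    seen["a"] = 1

    # "in" tests membership without creating the element
    if (!("b" in seen)) print "b is absent"

    # Reading seen["b"] creates it with an empty value
    if (seen["b"] == "") print "read seen[b]"

    if ("b" in seen) print "b now exists"

    # Count the keys portably with a loop
    n = 0
    for (k in seen) n++
    print "keys:", n
}
EOF
awk -f exists_sketch.awk
```

Expected output:

b is absent
read seen[b]
b now exists
keys: 2

For this reason, existence checks are usually written with in rather than by comparing an element to "".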

Create a file named variables_arrays.awk:

BEGIN {
    # Arrays need no declaration — elements are created on access
    # Uninitialized array elements are 0 or ""
}

{
    dept = $2
    salary = $3 + 0

    # Count employees per department
    dept_count[dept]++

    # Sum salaries per department
    dept_salary[dept] += salary

    # Track employee names per department (string accumulation)
    if (dept_names[dept] == "")
        dept_names[dept] = $1
    else
        dept_names[dept] = dept_names[dept] ", " $1
}

END {
    print "Department Breakdown"
    print "===================="

    # Iterate over all keys with "for (key in array)"
    for (dept in dept_count) {
        avg = dept_salary[dept] / dept_count[dept]
        printf "\n%s\n", dept
        printf "  Employees: %s\n", dept_names[dept]
        printf "  Count:     %d\n", dept_count[dept]
        printf "  Avg Salary: $%.2f\n", avg
    }

    # Test for array element existence with "key in array"
    if ("Engineering" in dept_count)
        print "\nEngineering department exists"

    # Delete an element
    delete dept_count["Sales"]
    print "After deleting Sales, count keys:", length(dept_count)
}

Run it:

docker run --rm -v "$(pwd)":/app -w /app alpine:latest awk -f variables_arrays.awk employees.txt

Expected output:

Department Breakdown
====================

Engineering
  Employees: Alice, Carol, Eve
  Count:     3
  Avg Salary: $95000.00

Marketing
  Employees: Bob
  Count:     1
  Avg Salary: $72000.00

Sales
  Employees: Dave
  Count:     1
  Avg Salary: $65000.00

Engineering department exists
After deleting Sales, count keys: 2

Note: The order of departments in the output may vary since AWK associative arrays are unordered. The grouping and calculations will always be correct.
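
If a stable order matters, one common approach is to pipe the END output through the external sort command. This is only a sketch: it assumes sort is available on the PATH, uses the hypothetical file name sorted_sketch.awk, and feeds in four lines copied from employees.txt.

```shell
# Hypothetical file name; the heredoc writes the AWK program to disk
cat > sorted_sketch.awk <<'EOF'
{ count[$2]++ }

END {
    # for-in order is unspecified, so let sort(1) order the lines
    for (dept in count)
        print dept, count[dept] | "sort"

    # Close the pipe so sort finishes and its output is flushed
    close("sort")
}
EOF
printf 'Alice Engineering 95000\nBob Marketing 72000\nCarol Engineering 88000\nDave Sales 65000\n' | awk -f sorted_sketch.awk
```

Expected output:

Engineering 2
Marketing 1
Sales 1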

Running with Docker

# Pull the official Alpine image (its awk is the BusyBox implementation)
docker pull alpine:latest

# Run any of the examples (substitute the filename as needed)
docker run --rm -v "$(pwd)":/app -w /app alpine:latest awk -f variables_types.awk
docker run --rm -v "$(pwd)":/app -w /app alpine:latest awk -f variables_builtin.awk employees.txt
docker run --rm -v "$(pwd)":/app -w /app alpine:latest awk -f variables_accumulate.awk employees.txt
docker run --rm -v "$(pwd)":/app -w /app alpine:latest awk -f variables_arrays.awk employees.txt

Key Concepts

  • No declarations needed — AWK variables are created on first use, initialized to 0 or "" depending on whether they are used in a numeric or string context
  • Context-sensitive typing — every value is simultaneously a string and a number; the operation performed determines which interpretation is used
  • String concatenation is juxtaposition — placing two values adjacent to each other (with or without spaces in the source) concatenates them; there is no + or . concatenation operator
  • Built-in variables control parsing: FS and RS control how input is split into fields and records; OFS and ORS control how output is assembled
  • NR and NF reflect the current record: NR increments with every record read; NF changes with each record to reflect how many fields are in that specific line
  • User variables are global and persistent — a variable set in one rule is visible in all other rules, including END, making accumulators natural to write
  • Associative arrays are unordered hash maps — indexed by any string key, created on access, iterated with for (key in array), tested with key in array, and elements removed with delete
  • Multi-dimensional arrays — AWK simulates multi-dimensional arrays using concatenated keys: a[row, col] is syntactic sugar for a[row SUBSEP col] where SUBSEP is a special separator character
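
The multi-dimensional point in the list above can be sketched in a few lines. The array name grid and the file name subsep_sketch.awk are hypothetical; the sketch stores one element under a two-part subscript, looks it up both ways, and uses split with SUBSEP to recover the parts.

```shell
# Hypothetical file name; the heredoc writes the AWK program to disk
cat > subsep_sketch.awk <<'EOF'
BEGIN {
    grid[1, 2] = "x"                  # stored under the key 1 SUBSEP 2

    # Membership test with the comma form
    if ((1, 2) in grid) print "found (1,2)"

    # The same element is reachable through an explicitly joined key
    key = 1 SUBSEP 2
    if (key in grid) print "same key via SUBSEP"

    for (k in grid) {
        split(k, part, SUBSEP)        # recover the two subscripts
        print "row=" part[1], "col=" part[2]
    }
}
EOF
awk -f subsep_sketch.awk
```

Expected output:

found (1,2)
same key via SUBSEP
row=1 col=2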
