Variables and Types in AWK
Learn how AWK handles variables, data types, built-in variables, and associative arrays with practical Docker-ready examples
AWK’s type system looks simple but is surprisingly expressive. Unlike languages that require declarations, AWK variables come into existence the moment you reference them, and an uninitialized variable reads as 0 in a numeric context and as the empty string in a string context. This dynamic, context-sensitive typing is one of AWK’s most distinctive characteristics.
As a data-driven, pattern-action language, AWK’s variables are deeply tied to the data being processed. Field variables like $1, $2, and $NF are populated automatically for every input record. Built-in variables like NR (record number) and FS (field separator) control how AWK processes input. User-defined variables accumulate values as rules fire across multiple records. Understanding this trio — field variables, built-in variables, and user variables — unlocks AWK’s full processing model.
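As a small illustration of the three kinds of variables working together, the one-liner below feeds two made-up colon-separated lines to awk. The sample data and the variable name total are assumptions for this example only, not part of the tutorial's files:

```sh
# -F: sets FS to ":"; $NF is the last field; total is a user variable
# that keeps its value from one record to the next.
printf 'alice:eng:10\nbob:ops:32\n' |
awk -F: '{ total += $NF; print NR, NF, $1 } END { print "total", total }'
# prints:
#   1 3 alice
#   2 3 bob
#   total 42
```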
In this tutorial you will learn how AWK’s type system works in practice, how to use and configure built-in variables, how to create and use user-defined variables (no declaration is required), and how AWK’s powerful associative arrays act as the language’s primary data structure.
AWK’s Type System: Strings and Numbers
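AWK uses dynamic, context-sensitive typing. Every value can be read as either a string or a number, and the operation performed decides which reading is used: arithmetic operators force a numeric reading, while concatenation and string functions such as length force a string reading. There are no type declarations; context determines type.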
Create a file named variables_types.awk:
Run it:
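A minimal sketch of both steps, assuming a POSIX awk on the PATH. The script body is one possible version written to print the expected output for this section; its variable names (unset_num, unset_str, n, s, and so on) are illustrative rather than the tutorial's own:

```sh
# Write the script; the quoted 'EOF' keeps the shell from expanding anything.
cat > variables_types.awk <<'EOF'
BEGIN {
    # Uninitialized variables read as 0 in numeric context, "" in string context.
    print "Uninitialized numeric: " (unset_num + 0)
    print "Uninitialized string: [" unset_str "]"

    # Numbers are converted to strings when a string is needed.
    n = 42
    print "Integer: " n
    print "Integer as string length: " length(n)
    pi = 3.14159
    print "Float: " pi

    # Strings that look numeric are converted by arithmetic operators.
    name = "AWK"
    print "String: " name
    s = "100"
    print "Numeric string + 1: " (s + 1)
    print "Numeric string . \" items\": " s " items"

    # Concatenation is juxtaposition; arithmetic uses a leading numeric prefix.
    year = 1977
    print "Concatenated: " name " " year
    print "String in arithmetic: " ("5 apples" + 0)
}
EOF

# The script has only a BEGIN rule, so no input file is needed.
awk -f variables_types.awk
```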
Expected output:
Uninitialized numeric: 0
Uninitialized string: []
Integer: 42
Integer as string length: 2
Float: 3.14159
String: AWK
Numeric string + 1: 101
Numeric string . " items": 100 items
Concatenated: AWK 1977
String in arithmetic: 5
Built-in Variables
AWK provides a rich set of built-in variables that control how input is parsed and output is formatted. These are the backbone of AWK’s data-driven model.
To demonstrate them, first create a data file named employees.txt:
Alice Engineering 95000
Bob Marketing 72000
Carol Engineering 88000
Dave Sales 65000
Eve Engineering 102000
Now create a file named variables_builtin.awk:
Run it:
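A minimal sketch of both steps, assuming a POSIX awk on the PATH. The script is one possible version that prints the expected output for this section, not necessarily the tutorial's own listing, and the guard line only recreates the sample data when it is missing:

```sh
# Recreate the sample data if it is not already present.
[ -f employees.txt ] || printf '%s\n' \
    'Alice Engineering 95000' 'Bob Marketing 72000' 'Carol Engineering 88000' \
    'Dave Sales 65000' 'Eve Engineering 102000' > employees.txt

cat > variables_builtin.awk <<'EOF'
BEGIN {
    OFS = " | "     # output field separator, used by print with commas
    print "=== Employee Report ==="
    print "NR: record number, NF: number of fields per record"
}
{
    # NR is the current record number, NF its field count, $0 the whole line.
    print "Record " NR ": " NF " fields -> " $0
    print $1, $2, $3
}
END {
    # In END, NR still holds the number of the last record read.
    print "Total records processed: " NR
}
EOF

awk -f variables_builtin.awk employees.txt
```

Only the print statement that separates its arguments with commas is affected by OFS; the concatenated "Record ..." line is a single string, so it is printed as written.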
Expected output:
=== Employee Report ===
NR: record number, NF: number of fields per record
Record 1: 3 fields -> Alice Engineering 95000
Alice | Engineering | 95000
Record 2: 3 fields -> Bob Marketing 72000
Bob | Marketing | 72000
Record 3: 3 fields -> Carol Engineering 88000
Carol | Engineering | 88000
Record 4: 3 fields -> Dave Sales 65000
Dave | Sales | 65000
Record 5: 3 fields -> Eve Engineering 102000
Eve | Engineering | 102000
Total records processed: 5
User-Defined Variables and Accumulators
User variables in AWK are global by default and persist across all records processed. This makes them ideal for accumulators — totals, counters, and running values built up as AWK processes each line.
Create a file named variables_accumulate.awk:
Run it:
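A minimal sketch of both steps, assuming a POSIX awk on the PATH. The accumulator names (count, total, max, min, max_name, min_name) are illustrative, and the script is one possible version that prints the expected output for this section:

```sh
# Recreate the sample data if it is not already present.
[ -f employees.txt ] || printf '%s\n' \
    'Alice Engineering 95000' 'Bob Marketing 72000' 'Carol Engineering 88000' \
    'Dave Sales 65000' 'Eve Engineering 102000' > employees.txt

cat > variables_accumulate.awk <<'EOF'
{
    # These variables are global, so they keep their values across records.
    count++
    total += $3
    if (count == 1 || $3 > max) { max = $3; max_name = $1 }
    if (count == 1 || $3 < min) { min = $3; min_name = $1 }
}
END {
    # Avoid dividing by zero when the input is empty.
    if (count == 0) { print "No records"; exit }
    print "Department Summary"
    print "================="
    print "Employees: " count
    print "Total salary: $" total
    printf "Average salary: $%.2f\n", total / count
    print "Highest paid: " max_name " ($" max ")"
    print "Lowest paid: " min_name " ($" min ")"
}
EOF

awk -f variables_accumulate.awk employees.txt
```

Seeding max and min from the first record (count == 1) avoids comparing against an uninitialized 0, which would otherwise make every salary look larger than the minimum.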
Expected output:
Department Summary
=================
Employees: 5
Total salary: $422000
Average salary: $84400.00
Highest paid: Eve ($102000)
Lowest paid: Dave ($65000)
Associative Arrays
AWK’s most powerful data structure is the associative array — a hash map indexed by arbitrary string keys. There is no array declaration; elements are created on first access. Arrays are the primary tool for grouping, counting, and aggregating data across records.
Create a file named variables_arrays.awk:
Run it:
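A minimal sketch of both steps, assuming a POSIX awk on the PATH. The array names (count, total, names) are illustrative, and the script is one possible version that prints the expected output for this section, with the department order left to awk:

```sh
# Recreate the sample data if it is not already present.
[ -f employees.txt ] || printf '%s\n' \
    'Alice Engineering 95000' 'Bob Marketing 72000' 'Carol Engineering 88000' \
    'Dave Sales 65000' 'Eve Engineering 102000' > employees.txt

cat > variables_arrays.awk <<'EOF'
{
    dept = $2
    count[dept]++               # element is created on first access
    total[dept] += $3
    # Build a comma-separated member list per department.
    names[dept] = (count[dept] > 1) ? names[dept] ", " $1 : $1
}
END {
    print "Department Breakdown"
    print "===================="
    for (dept in count) {       # iteration order is unspecified
        print dept
        print "Employees: " names[dept]
        print "Count: " count[dept]
        printf "Avg Salary: $%.2f\n", total[dept] / count[dept]
    }
    # "in" tests membership without creating the element.
    if ("Engineering" in count)
        print "Engineering department exists"
    delete count["Sales"]
    n = 0
    for (dept in count) n++
    print "After deleting Sales, count keys: " n
}
EOF

awk -f variables_arrays.awk employees.txt
```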
Expected output:
Department Breakdown
====================
Engineering
Employees: Alice, Carol, Eve
Count: 3
Avg Salary: $95000.00
Marketing
Employees: Bob
Count: 1
Avg Salary: $72000.00
Sales
Employees: Dave
Count: 1
Avg Salary: $65000.00
Engineering department exists
After deleting Sales, count keys: 2
Note: The order of departments in the output may vary since AWK associative arrays are unordered. The grouping and calculations will always be correct.
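If a stable order is needed, one common approach is to pipe the loop's output through the external sort utility. A minimal sketch, assuming sort is on the PATH; the array name count is illustrative, and the guard line only recreates the sample data when it is missing:

```sh
# Recreate the sample data if it is not already present.
[ -f employees.txt ] || printf '%s\n' \
    'Alice Engineering 95000' 'Bob Marketing 72000' 'Carol Engineering 88000' \
    'Dave Sales 65000' 'Eve Engineering 102000' > employees.txt

awk '
    { count[$2]++ }
    END {
        # The for-in order is unspecified, so let sort impose one.
        for (dept in count)
            print dept, count[dept] | "sort"
        close("sort")
    }
' employees.txt
# prints:
#   Engineering 3
#   Marketing 1
#   Sales 1
```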
Running with Docker
Key Concepts
- No declarations needed: AWK variables are created on first use, initialized to 0 or "" depending on whether they are used in a numeric or string context
- Context-sensitive typing: every value is simultaneously a string and a number; the operation performed determines which interpretation is used
- String concatenation is juxtaposition: placing two values adjacent to each other (with or without spaces in the source) concatenates them; there is no + or . concatenation operator
- Built-in variables control parsing: FS and RS control how input is split into fields and records; OFS and ORS control how output is assembled
- NR and NF reflect the current record: NR increments with every record read; NF changes with each record to reflect how many fields are in that specific line
- User variables are global and persistent: a variable set in one rule is visible in all other rules, including END, making accumulators natural to write
- Associative arrays are unordered hash maps: indexed by any string key, created on access, iterated with for (key in array), tested with key in array, and elements removed with delete
- Multi-dimensional arrays: AWK simulates multi-dimensional arrays using concatenated keys; a[row, col] is syntactic sugar for a[row SUBSEP col], where SUBSEP is a special separator character
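The SUBSEP behaviour described in the last point can be checked directly. A small sketch, assuming a POSIX awk; the array name grid and the subscripts are made up for illustration:

```sh
awk 'BEGIN {
    grid[1, 2] = "x"                    # stored under the key 1 SUBSEP 2
    if ((1, 2) in grid)
        print "found (1,2)"
    if (grid[1 SUBSEP 2] == "x")
        print "same element via SUBSEP"
    for (key in grid) {
        split(key, part, SUBSEP)        # recover the two subscripts
        print "row=" part[1], "col=" part[2], "value=" grid[key]
    }
    delete grid[1, 2]
    n = 0
    for (key in grid) n++
    print "keys left:", n
}'
# prints:
#   found (1,2)
#   same element via SUBSEP
#   row=1 col=2 value=x
#   keys left: 0
```

Because the combined key is an ordinary string, a subscript that itself contains the SUBSEP character would be split incorrectly by this approach.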