Intermediate

Functions in AWK

Learn how to define and use functions in AWK - parameters, return values, variable scope, recursion, and built-in functions with Docker-ready examples

AWK is built around the pattern-action model, but as programs grow you need a way to package reusable logic. AWK gives you two kinds of functions: a rich set of built-in functions for string and math work, and user-defined functions you write yourself. Together they let you keep pattern-action rules small and readable while pushing the real work into named, testable pieces.

User-defined functions arrived in the 1985 revision of AWK and have been part of every implementation since - including the lightweight BusyBox AWK that ships in Alpine Linux. They look familiar if you know C, but AWK adds one famous quirk: there is no local keyword. Instead, extra parameters serve as local variables, a convention that surprises newcomers but quickly becomes second nature.

In this tutorial you’ll learn how to define functions, pass parameters and return values, manage the boundary between local and global scope, write recursive functions, and combine your own functions with AWK’s built-ins inside real record processing.

Defining and Calling Functions

A user-defined function uses the function keyword, a name, a parameter list, and a body in braces. The return statement sends a value back to the caller. Functions can be defined anywhere in the program - before or after the rules that use them.

Create a file named functions.awk:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
# User-defined functions in AWK
function greet(name) {
    return "Hello, " name "!"
}

function max(a, b) {
    return (a > b) ? a : b
}

function rectangle_area(width, height) {
    return width * height
}

BEGIN {
    print greet("AWK")
    print "Max of 7 and 12:", max(7, 12)
    print "Area of 4 x 5:", rectangle_area(4, 5)
}

A few things to notice:

  • greet builds its result with string concatenation - in AWK you just place values adjacent to each other, no + operator.
  • max uses the ternary operator ?: as a compact if/else.
  • There is no return-type declaration. AWK is dynamically and weakly typed, so a function can return a string in one call and a number in another.

Tip: When calling a user-defined function, there must be no space between the function name and the opening parenthesis (max(7, 12), not max (7, 12)). A space can confuse AWK into reading it as string concatenation.

Local vs Global Scope

This is the single most important thing to understand about AWK functions. Any variable that is not in the parameter list is global - shared by every function and every rule. To get a local variable, you list it as an extra parameter that the caller simply never passes. By convention these “local” parameters are separated from the real ones with extra spaces.

Create a file named scope.awk:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
# Variable scope in AWK
# Globals are visible everywhere; parameters are local to the function.
# Convention: extra "parameters" (after the real ones) act as local variables.

function add_tax(amount,    rate, result) {
    rate = 0.08          # local - listed as an extra parameter
    result = amount + amount * rate
    return result
}

function bump_total() {
    total += 10          # global - not a parameter, shared across calls
}

BEGIN {
    total = 0
    bump_total()
    bump_total()
    print "Total (global):", total
    print "Price with tax:", add_tax(100)
    print "rate outside function:", (rate == "" ? "undefined" : rate)
}

Here total is global, so the two calls to bump_total accumulate to 20. Inside add_tax, the variable rate is a local parameter - it never escapes the function, which is why printing rate in BEGIN shows it as undefined (an unset AWK variable is the empty string "").

Convention: The extra spaces before local parameters (amount, rate, result) are purely cosmetic, but they signal intent: everything after the gap is a working variable, not something the caller supplies.

Recursion

Because each call gets its own copies of its parameters, AWK functions can call themselves. Recursion is the natural way to express problems like factorials and Fibonacci numbers.

Create a file named recursion.awk:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
# Recursion in AWK
function factorial(n) {
    if (n <= 1)
        return 1
    return n * factorial(n - 1)
}

function fib(n) {
    if (n < 2)
        return n
    return fib(n - 1) + fib(n - 2)
}

BEGIN {
    print "5! =", factorial(5)
    print "10! =", factorial(10)
    print "First 10 Fibonacci numbers:"
    for (i = 0; i < 10; i++)
        printf "%d ", fib(i)
    print ""
}

factorial multiplies on the way back up the call stack; fib branches into two recursive calls. The final print "" emits a newline after the printf loop, which deliberately leaves no newline of its own.

Built-in Functions

AWK ships with a large library of built-in functions so you rarely need to reinvent common operations. You can freely mix them with your own functions.

Create a file named builtins.awk:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
# Built-in functions in AWK
BEGIN {
    text = "Code Archaeology"

    # String functions
    print "Length:", length(text)
    print "Uppercase:", toupper(text)
    print "Lowercase:", tolower(text)
    print "Substring:", substr(text, 1, 4)

    # split() breaks a string into an array, returning the count
    n = split(text, words, " ")
    print "Word count:", n
    print "First word:", words[1]

    # Math functions
    print "Square root of 144:", sqrt(144)
    print "Truncated 7.9:", int(7.9)

    # sprintf() returns a formatted string (like printf, but as a value)
    label = sprintf("[%05d]", 42)
    print "Formatted ID:", label
}

Key built-ins shown here:

  • length(s), substr(s, start, len), toupper/tolower - core string tools.
  • split(s, arr, sep) - fills arr and returns the number of pieces. Arrays are passed by reference, so words is populated for you.
  • sqrt and int - a sample of the math library (others include sin, cos, exp, log, rand).
  • sprintf - the same formatting as printf, but it returns the string instead of printing it.

Functions in Record Processing

Functions shine when they’re called from pattern-action rules, keeping the rules themselves short. This example reads a file of names and scores, classifies each one, and reports a class average - building an array as it goes and passing it to a function in the END block.

Create a file named scores.txt:

1
2
3
4
Alice 92
Bob 78
Carol 85
Dave 64

Create a file named scores.awk:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
# Combining user functions with AWK's record processing
function grade(score) {
    if (score >= 90) return "A"
    else if (score >= 80) return "B"
    else if (score >= 70) return "C"
    else return "F"
}

function average(arr, n,    i, sum) {
    for (i = 1; i <= n; i++)
        sum += arr[i]
    return sum / n
}

{
    scores[NR] = $2
    printf "%s: %d (%s)\n", $1, $2, grade($2)
}

END {
    printf "Class average: %.1f\n", average(scores, NR)
}

The main rule runs once per input line: it stores the score using the record number NR as the array key, then prints the name and letter grade. After all input is consumed, the END block passes the whole scores array (by reference) and the record count to average. Note how i and sum are local parameters of average, so they don’t pollute the global namespace.

Running with Docker

Run each example with the Alpine image, which includes BusyBox AWK:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
# Pull the official image
docker pull alpine:latest

# Basic functions
docker run --rm -v $(pwd):/app -w /app alpine:latest awk -f functions.awk

# Local vs global scope
docker run --rm -v $(pwd):/app -w /app alpine:latest awk -f scope.awk

# Recursion
docker run --rm -v $(pwd):/app -w /app alpine:latest awk -f recursion.awk

# Built-in functions
docker run --rm -v $(pwd):/app -w /app alpine:latest awk -f builtins.awk

# Functions with record processing (reads scores.txt)
docker run --rm -v $(pwd):/app -w /app alpine:latest awk -f scores.awk scores.txt

Expected Output

Running functions.awk:

Hello, AWK!
Max of 7 and 12: 12
Area of 4 x 5: 20

Running scope.awk:

Total (global): 20
Price with tax: 108
rate outside function: undefined

Running recursion.awk:

5! = 120
10! = 3628800
First 10 Fibonacci numbers:
0 1 1 2 3 5 8 13 21 34 

Running builtins.awk:

Length: 16
Uppercase: CODE ARCHAEOLOGY
Lowercase: code archaeology
Substring: Code
Word count: 2
First word: Code
Square root of 144: 12
Truncated 7.9: 7
Formatted ID: [00042]

Running scores.awk with scores.txt:

Alice: 92 (A)
Bob: 78 (C)
Carol: 85 (B)
Dave: 64 (F)
Class average: 79.8

Key Concepts

  • function name(params) { ... } defines a function; call it with no space before the parenthesis.
  • There is no local keyword - extra parameters act as local variables, conventionally set off with extra spaces.
  • Any variable not in the parameter list is global, shared across all functions and rules.
  • return sends a value back; without it, a function returns the empty string / zero.
  • Recursion works because each call gets its own copies of its parameters.
  • Arrays are passed by reference, so functions like split and your own helpers can fill or read them in place.
  • Built-in functions (length, substr, split, toupper, sqrt, sprintf, …) cover most common string and math tasks - reach for them before writing your own.
  • Functions keep pattern-action rules small, moving complex logic out of the implicit main loop.

Running Today

All examples can be run using Docker:

docker pull alpine:latest
Last updated:

Comments

Loading comments...

Leave a Comment

2000 characters remaining