M4
The general-purpose macro processor from Bell Labs that quietly powers Autoconf, Sendmail, and decades of Unix text generation
Created by Brian Kernighan and Dennis Ritchie
M4 is a general-purpose macro processor: a text-substitution tool that reads input, expands any macros it finds, and writes the result. Created at Bell Labs in 1977 by Brian Kernighan and Dennis Ritchie, it has outlived nearly all of its contemporaries by becoming invisible infrastructure - the engine quietly powering Autoconf, Sendmail configuration, and countless code-generation pipelines across the Unix world.
History & Origins
M4 belongs to a lineage of macro processors that stretches back to the early 1960s. In 1965, Christopher Strachey described GPM (the General Purpose Macrogenerator), a strikingly compact design that fit into around 250 machine instructions and demonstrated that a small, language-independent macro engine could be genuinely powerful. At Bell Labs, Doug McIlroy had earlier explored conditional and recursive macros and the idea of using macros to define other macros - ideas that would become central to m4.
The more immediate predecessors were Bell Labs’ own tools. Around 1972, Andrew D. Hall built M6, a general-purpose macro processor written in roughly 600 Fortran statements. Around this same period (the GNU manual links it to the 1976 Software Tools), Dennis Ritchie wrote M3, a macro processor for the AP-3 minicomputer. The following year, in 1977, Kernighan and Ritchie combined and refined these ideas into M4 - a cleaner, more capable processor that originally shipped with only 21 builtin macros.
M4 found an early home as the macro engine for Ratfor (Rational Fortran), the structured-programming front end for Fortran popularized in Kernighan and Plauger’s Software Tools. From there it spread with Unix itself, eventually becoming a component of the POSIX standard and a fixture on virtually every Unix-like system.
Design Philosophy
M4’s design reflects a few deliberate choices that set it apart from the assembler-style macro facilities that came before it:
- Free-form syntax. Unlike line-oriented assembly macro systems, m4 treats its input as a free-flowing stream of tokens. Macros can appear anywhere text appears.
- Language independence. M4 knows nothing about the language it is processing. The same tool can generate C, Fortran, shell scripts, HTML, or plain prose.
- Rescanning and recursion. When a macro expands, its replacement text is pushed back onto the input and scanned again. This rescanning is what makes m4 recursive - and, in practice, Turing-complete.
This combination makes m4 enormously flexible, but it is also the source of its reputation for being hard to master. Because everything is text and expansion happens in waves, subtle bugs in quoting can produce surprising output that is genuinely difficult to debug.
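How rescanning and quoting interact can be seen in a small sketch. The macro names inner, eager, and lazy are hypothetical and chosen only for illustration. eager is defined with an unquoted body, so inner is expanded once, at definition time; lazy stores the name itself and picks up any later redefinition when its expansion is rescanned.

```m4
dnl Hypothetical macro names, for illustration only.
define(`inner', `first')dnl
dnl Unquoted body: inner expands while the arguments of define are collected.
define(`eager', inner)dnl
dnl Quoted body: the text inner is stored and rescanned at each use.
define(`lazy', `inner')dnl
define(`inner', `second')dnl
eager lazy
```

Run through m4, this should print "first second": the same-looking definitions differ only in one pair of quotes, which is exactly the kind of difference that is easy to miss.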
Core Concepts
Defining and Calling Macros
The foundational builtin is define, which associates a name with replacement text. Once defined, the name is replaced wherever it appears:
define(`NAME', `World')dnl
Hello, NAME!
Output:
Hello, World!
Here dnl (“delete to newline”) discards the rest of the line, a common idiom for keeping definition lines out of the output.
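The effect of dnl is easiest to see by leaving it out once. In this sketch (the macro names A and B are arbitrary), the first definition line keeps its trailing newline and the second does not:

```m4
dnl The first definition line keeps its trailing newline, the second does not.
define(`A', `1')
define(`B', `2')dnl
A B
```

The output should be an empty line followed by "1 2": the define call itself expands to nothing, but the newline after the first one is ordinary text and is copied through.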
Quoting
M4’s most distinctive - and most stumbled-over - feature is its quoting. The default quote characters are the backtick ` to open and the apostrophe ' to close:
define(`greeting', `hello')
greeting # expands to: hello
`greeting' # stays literally: greeting
Quoting tells m4 not to expand something yet: each time quoted text is scanned, one layer of quotes is stripped and the contents are passed along unexpanded. Mastering when to quote - and how many layers of quoting to use - is the central skill in writing correct m4. Because the default quote characters can clash with the target language, they can be replaced with changequote.
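A short sketch of both ideas, layered quotes and changequote. The choice of square brackets as replacement quote characters is only an example; any pair of strings can be used.

```m4
define(`greeting', `hello')dnl
dnl Each scan strips one layer of quotes, so doubled quotes print one pair.
``greeting''
dnl Switch to square brackets, e.g. when generating text full of apostrophes.
changequote(`[', `]')dnl
[greeting]
greeting
Backtick ` and apostrophe ' are ordinary text here.
dnl Restore the default quote characters.
changequote([`], ['])dnl
`greeting'
```

With GNU m4 this should print five lines: the text greeting still wrapped in one backtick/apostrophe pair, then greeting, hello, the sentence with its backtick and apostrophe untouched, and finally greeting again once the default quotes are back in force.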
Macro Arguments
Macros can take arguments, referenced inside the definition as $1, $2, and so on, with $0 as the macro name and $# as the argument count:
define(`greet', `Hello, $1! You are visitor number $2.')dnl
greet(`Ada', `42')
Output:
Hello, Ada! You are visitor number 42.
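The special parameters $# and $0 can be explored with two tiny macros. The names count_args and whoami are hypothetical, used only for this sketch. Note that $0 is wrapped in an extra layer of quotes: without it, the macro's own name would be rescanned and expanded again, recursing without end.

```m4
dnl count_args and whoami are hypothetical names used only for this sketch.
define(`count_args', `$#')dnl
count_args
count_args()
count_args(`a', `b', `c')
dnl $0 is quoted so the macro name is printed instead of being expanded again.
define(`whoami', ``$0' got "$1"')dnl
whoami(`x')
```

This should print 0, 1, and 3 on separate lines, followed by whoami got "x". The second result is a common surprise: empty parentheses count as one (empty) argument.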
Builtin Macros
Beyond define, m4 ships with builtins for the common needs of code generation:
| Builtin | Purpose |
|---|---|
| define / undefine | Create or remove a macro |
| ifdef / ifelse | Conditional expansion |
| incr / decr | Increment / decrement an integer |
| eval | Evaluate an integer arithmetic expression |
| include / sinclude | Insert the contents of a file |
| dnl | Discard text through the next newline |
| divert / undivert | Send output to temporary streams and recombine it |
| len / substr / index | String inspection and manipulation |
| translit / patsubst | Character and pattern-based substitution |
| dumpdef / traceon | Debugging aids |
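A few of these builtins in action, as a minimal sketch. The sample strings and the VERBOSE flag macro are made up for illustration; each dnl comment states what the following line is expected to print.

```m4
dnl Integer arithmetic with the usual precedence: prints 14.
eval(`2 + 3 * 4')
dnl String length: prints 5.
len(`macro')
dnl Nine characters starting at offset 6: prints processor.
substr(`macro processor', `6', `9')
dnl Offset of the first match, counting from 0: prints 6.
index(`macro processor', `pro')
dnl Character-for-character mapping: prints a-b-c.
translit(`a.b.c', `.', `-')
dnl VERBOSE is a hypothetical flag macro; ifdef only tests that it exists.
define(`VERBOSE', `')dnl
ifdef(`VERBOSE', `chatty', `quiet')
dnl Text sent to diversion 1 is held back and flushed at end of input.
divert(`1')dnl
printed last
divert(`0')dnl
printed first
```

The last four lines show why divert is useful for code generation: text can be produced in one order and emitted in another, here with "printed first" appearing before "printed last".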
Conditionals and Recursion
ifelse provides conditional logic, and because expansion is recursive, m4 can express loops. A classic example is a counter that calls itself:
define(`countdown', `ifelse($1, `0', `done', `$1 countdown(decr($1))')')dnl
countdown(`5')
Output:
5 4 3 2 1 done
This recursive style - guarded by ifelse and driven by incr/decr - is how serious m4 programs build tables, generate repetitive code, and implement surprisingly elaborate logic.
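Two further sketches of the same pattern, using hypothetical macros squares and kind. The first is a counting loop bounded by an eval comparison; the second uses the multi-branch form of ifelse, which takes several test pairs followed by a default.

```m4
dnl squares and kind are hypothetical macros written for this sketch.
dnl squares(from, to) prints one line per integer, then recurses on from + 1.
define(`squares', `ifelse(eval($1 > $2), `1', `', `$1 squared is eval($1 * $1)
squares(incr($1), $2)')')dnl
squares(`1', `4')dnl
dnl ifelse also accepts several test pairs followed by a default branch.
define(`kind', `ifelse(`$1', `0', `zero', eval(`$1' % 2), `0', `even', `odd')')dnl
kind(`0') kind(`7') kind(`10')
```

This should print four lines from "1 squared is 1" through "4 squared is 16", followed by "zero odd even". The recursive branch is kept inside quotes so that it is only expanded when ifelse actually selects it; unquoted, the recursion would be expanded during argument collection and never terminate.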
Evolution
For its first decade-plus, m4 lived as a traditional Unix utility, carried along with each vendor’s Unix and constrained by fixed internal limits on buffer sizes and the number of macros.
That changed with the GNU M4 project. In 1990, René Seindal released the first version of GNU M4, removing those arbitrary limits and adding extensions. François Pinard took over maintenance and released GNU M4 1.4 in 1994, a stable release that remained the reference for roughly a decade. Later maintainers - including Paul Eggert, Gary V. Vaughan, and Eric Blake - shepherded the 1.4.x series through a long run of portability and correctness fixes. The most recent stable release, GNU M4 1.4.20, arrived on 10 May 2025, collecting several years of portability improvements along with a couple of minor performance optimizations. A more ambitious GNU M4 2.0, featuring dynamic module loading, has long been under development.
Today multiple independent implementations coexist: the GNU version, the BSD m4 found in FreeBSD, NetBSD, and OpenBSD, and the Heirloom Project’s traditional m4, among others.
Current Relevance
M4 is rarely something developers set out to learn, yet most touch it indirectly every day. Its single most important modern role is as the foundation of GNU Autoconf: the configure.ac files that describe how to build a portable program are written in m4 macros, which Autoconf expands into the familiar ./configure shell scripts. Anyone who has built software from source has run the output of an m4 program.
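As a rough illustration of what such input looks like, here is a minimal configure.ac sketch. The package name hello and version 1.0 are placeholders, and a Makefile.in template is assumed to exist alongside it. AC_INIT, AC_PROG_CC, AC_CONFIG_FILES, and AC_OUTPUT are standard Autoconf macros; Autoconf changes m4's quote characters to square brackets.

```m4
dnl Minimal configure.ac sketch; package name and version are placeholders.
AC_INIT([hello], [1.0])
dnl Locate a working C compiler.
AC_PROG_CC
dnl Generate Makefile from a Makefile.in template (assumed to exist).
AC_CONFIG_FILES([Makefile])
AC_OUTPUT
```

Running autoconf in the same directory expands these few macro calls into a configure shell script that is typically thousands of lines long.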
Beyond Autoconf, m4 remains the configuration engine for Sendmail, the macro layer for the SELinux Reference Policy, a footprint generator in the gEDA electronic design suite, and a handy tool for ad-hoc text templating wherever a dependency-free, ubiquitous preprocessor is wanted. Because it ships with essentially every Unix-like system and is part of POSIX, it is always there.
Why It Matters
M4 is a study in how a small, sharply focused tool can become permanent infrastructure. It embodies the Unix philosophy as fully as grep, sed, or awk: it does one thing - text expansion - and does it in a way general enough to outlast the specific problems it was built for.
Its influence is felt most directly through Autoconf, whose m4sugar and m4sh layers effectively turn m4 into a higher-level configuration language used across the free-software world. More broadly, m4 stands as a canonical example of macro processing and recursive text substitution, the kind of language often studied precisely because its power and its pitfalls are so closely intertwined. Nearly five decades after Kernighan and Ritchie wrote it, m4 remains a quiet, indispensable part of how software gets built.
Learning Resources
- GNU M4 Manual - https://www.gnu.org/software/m4/manual/
- The M4 Macro Processor by Brian W. Kernighan and Dennis M. Ritchie (the original Bell Labs paper)
- POSIX m4 specification - part of the IEEE Std 1003.1 utilities
M4 rewards patience: once its quoting model clicks, it becomes a remarkably capable tool for generating text of any kind - and a window into a foundational idea in the history of programming languages.
Notable Uses & Legacy
GNU Autoconf
M4 is the macro engine behind Autoconf - configure.ac files are written in m4 (via the m4sugar and m4sh layers) and expanded into portable shell configure scripts.
Sendmail
Sendmail's notoriously complex sendmail.cf configuration is generated from concise .mc files using a large library of m4 macros.
Ratfor (Rational Fortran)
M4 served as the original macro engine for Ratfor, the structured-programming preprocessor for Fortran from the Software Tools era.
SELinux Reference Policy
The SELinux Reference Policy uses m4 extensively to expand reusable interface macros into concrete security policy rules.
gEDA / PCB
The gEDA electronic design toolsuite uses m4 to generate parameterized PCB component footprints from macro definitions.
Running Today
Run examples using the official Alpine Linux Docker image, installing m4 inside the container:
docker pull alpine:latest
Example usage:
docker run --rm -v "$(pwd)":/app -w /app alpine:latest sh -c 'apk add --no-cache m4 && m4 hello.m4'
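The command expects a file named hello.m4 in the current directory. A minimal, hypothetical one (the macro names TARGET and greet are arbitrary) might look like this:

```m4
dnl hello.m4: hypothetical input file for the docker command above.
define(`TARGET', `World')dnl
define(`greet', `Hello, $1!')dnl
greet(TARGET)
greet(`Docker')
```

This should print "Hello, World!" and "Hello, Docker!" on separate lines.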