GNU CPP
The GNU C Preprocessor — the macro-expansion and file-inclusion front-end shipped with GCC since 1987, used by C, C++, Objective-C, and as a general-purpose text preprocessor.
Created by Richard Stallman (and subsequent GCC contributors)
GNU CPP is the GNU C Preprocessor, the macro-expansion, file-inclusion, and conditional-compilation front-end that has shipped with the GNU Compiler Collection since GCC’s first public release in March 1987. It implements the preprocessing phase defined by the ISO C standard — handling directives such as #include, #define, #if, #ifdef, #pragma, #error, and #line — and is invoked, usually invisibly, on every C, C++, Objective-C, Objective-C++, and .S assembly file compiled by GCC. The same engine is also available as a standalone program named cpp, which is regularly used as a general-purpose textual preprocessor for files that have nothing to do with C.
The preprocessing language itself — the directives and macro-substitution rules — is defined by the ISO C standard. This page is specifically about the GNU implementation of that language, its history inside GCC, the extensions it adds, and the role it plays in the GNU toolchain.
Origins
When Richard Stallman released the first version of GCC on 22 March 1987, the compiler needed a preprocessor. The traditional Unix approach was to ship /lib/cpp as a separate executable that the compiler driver (cc) would fork/exec for each input file, piping its output into the compiler proper. The early GNU C compiler followed this pattern: a standalone program, originally called cccp (the GNU C-Compatible Compiler Preprocessor), was bundled alongside cc1. Stallman wrote it as a free-software replacement for the AT&T /lib/cpp, since a fully free Unix-like toolchain could not ship a proprietary preprocessor.
For more than a decade GNU CPP remained a separate process. This was easy to reason about — you could literally run cpp foo.c and see exactly what the compiler would see — but it imposed real performance costs: every translation unit paid for an extra process fork, an inter-process pipe, and a second pass of lexical analysis (once by cpp, once again by cc1).
The Move to an Integrated Preprocessor
Starting around the GCC 3.0 timeframe (released 2001-06-18), the GCC developers introduced cpplib (later renamed libcpp), a reusable C library that performed preprocessing in-process. The C and C++ front-ends were modified to call into libcpp directly, so that preprocessing and parsing happened in a single process and tokens flowed from preprocessor to parser without ever being re-serialized as text. By GCC 3.4 the integrated path was the default for all C-family front-ends, and the older cccp / tradcpp code paths were retired from the default build.
The standalone cpp program is still shipped — it is now a thin wrapper that invokes the same libcpp engine — and is what you get when you type cpp foo.c at a shell prompt. The behavior is identical to compiler-internal preprocessing, but the output is written to standard out instead of being passed straight into the parser.
What the Preprocessing Language Looks Like
The preprocessor is a separate language layered on top of C. It operates on a token stream — not raw character text — and is defined in terms of translation phases in the ISO C standard. The major directive categories are:
| Category | Directives | Purpose |
|---|---|---|
| Inclusion | #include | Splice another file’s contents into the current translation unit |
| Macro definition | #define, #undef | Introduce or remove object-like and function-like macros |
| Conditional compilation | #if, #ifdef, #ifndef, #elif, #else, #endif | Select code based on compile-time predicates |
| Diagnostics | #error, #warning | Emit compile-time messages (#warning standardized in C23) |
| Line control | #line | Override file name and line number for diagnostics |
| Pragmas | #pragma, _Pragma(...) | Implementation-defined directives |
Inside macro bodies, two operators give the preprocessor expressive power beyond textual substitution:
- The stringizing operator `#x` converts a macro argument into a string literal.
- The token-pasting operator `a ## b` concatenates two tokens to form a new one.
These two operators, combined with __FILE__, __LINE__, __func__ (technically a compiler construct, not a macro), and __VA_ARGS__ (C99 variadic macros), are the foundation for the heavyweight macro patterns found in projects like the Linux kernel, GLib, and Boost.Preprocessor.
GNU Extensions
Beyond strict ISO C, GNU CPP defines a number of extensions that have become broadly relied-upon:
- Variadic macros with a named rest argument: `#define LOG(fmt, args...) printf(fmt, ##args)`, where the GNU `##args` trick swallows the trailing comma when `args` is empty. (C++20 and C23 later standardized `__VA_OPT__` to do this portably.)
- Computed `#include`: `#include MACRO_NAME`, where `MACRO_NAME` expands to a header path; useful for variations on the “X-macro” technique. (ISO C also permits this form; the GNU CPP manual documents it in detail.)
- `#include_next`: continues the search for the named header in the include-path directories after the one where the current file was found, used heavily by glibc and the fixincludes mechanism to wrap system headers.
- `#warning` directive: GNU CPP shipped `#warning` for decades before C23 made it standard.
- `__has_include` / `__has_attribute` / `__has_builtin`: feature-detection predicates, supported in GNU CPP for portability with Clang. `__has_include` was later standardized in C++17 and C23, and attribute detection in C++20 (`__has_cpp_attribute`) and C23 (`__has_c_attribute`); `__has_builtin` remains an extension.
- Predefined macros: a large set including `__GNUC__`, `__GNUC_MINOR__`, `__OPTIMIZE__`, `__SIZEOF_INT__`, target-architecture macros (`__x86_64__`, `__aarch64__`), and feature-test macros that downstream code uses to gate GCC-specific paths.
- Traditional mode: `-traditional-cpp` selects pre-ANSI K&R-style preprocessing, used by Fortran and a small number of legacy projects that depend on quirks of the old preprocessor (such as macro expansion inside string literals).
Modes and Drivers
The cpp program (and the compiler driver) supports several preprocessing modes that change what is emitted:
| Flag | Effect |
|---|---|
| (default) | Emit preprocessed token stream with line markers |
| `-E` | Stop after preprocessing; emit to stdout |
| `-P` | Suppress #-style line markers in output |
| `-dM` | Dump all defined macros instead of the source |
| `-dD` | Pass #define directives through to the output alongside expansion |
| `-M`, `-MM`, `-MD`, `-MMD` | Generate Makefile dependency rules (the backbone of make-based incremental rebuilds) |
| `-traditional-cpp` | Pre-ANSI preprocessing semantics |
| `-fdirectives-only` | Parse directives but do not expand macros; used by build accelerators |
The -M family of flags is particularly important in practice: virtually every C/C++ Makefile in the GNU ecosystem uses gcc -MMD (or equivalent) to have GCC emit .d dependency files that make then includes. This is the mechanism that makes incremental rebuilds correct in the face of header-file changes.
Beyond C: CPP as a General-Purpose Preprocessor
Because GNU CPP operates on tokens but is forgiving about non-C input, it has long been used as a general-purpose textual preprocessor for files that are not C source at all. Common cases include:
- Linux kernel device-tree source (`.dts`, `.dtsi`): the kernel build system runs these through `cpp` so that authors can use `#include` and `#define` in their hardware descriptions.
- GNU Assembler sources with the `.S` extension: the GCC driver automatically invokes the preprocessor before the assembler, so assembly programmers can `#include "asm-offsets.h"` and use C-style macros for register names.
- Fortran sources with uppercase `.F`/`.F90` extensions: gfortran preprocesses these through GNU CPP (typically in traditional mode) before handing them to the Fortran parser.
- Linker scripts: many embedded projects keep a single `linker.ld.in` that is run through `cpp -P` to produce architecture-specific final linker scripts.
- imake configuration: the X Window System historically used CPP as the macro engine that expanded platform `.cf` files into `Makefile` form.
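This off-label use is widespread enough that the GNU CPP manual addresses it directly: it cautions that the preprocessor is intended for C-family source and can choke on input that does not follow the lexical rules of C, and it points to traditional mode as the more permissive option for other kinds of files.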
Architecture
The modern libcpp is a C library that consumes a stream of source characters and produces a stream of preprocessing tokens. Internally it is organized around:
- Lexer: converts source characters into preprocessing tokens, tracking source locations precisely (a single `location_t` value encodes file, line, column, and macro-expansion context).
- Macro table: a hash table of defined identifiers mapping to macro definitions (object-like, function-like, variadic, or built-in).
- Expansion engine: implements the ISO C “painted-blue” rescanning algorithm, which prevents a macro from re-expanding itself recursively while still allowing other macros in its replacement list to expand.
- Conditional stack: tracks the state of nested `#if`/`#ifdef` groups, including whether a branch has already been taken at a given level.
- Include manager: implements include-path search, `#include_next`, and the include-once optimization that detects `#ifndef GUARD ... #define GUARD ... #endif` patterns and skips repeated includes without re-reading the file.
The preserved precise source-location information is what allows GCC’s diagnostics to point at the exact column inside an expanded macro and to print “in expansion of macro X” backtraces — a feature added incrementally over many releases and now considered essential to debugging template-heavy or macro-heavy C++ code.
Standardization Influence
GNU CPP has been the leading reference implementation for several preprocessor features that later became standard:
- Variadic macros appeared as a GNU extension first and were standardized in C99.
- `#warning` existed in GNU CPP for many years before C23 standardized it.
- `__has_include` and `__has_attribute` were introduced by Clang and subsequently adopted by GNU CPP; `__has_include` was later standardized in C++17 and C23, and attribute detection in C++20 (`__has_cpp_attribute`) and C23 (`__has_c_attribute`).
- `__VA_OPT__` (C++20 / C23) was specified in coordination with the GCC implementation, which had long supported the older `, ##__VA_ARGS__` extension for the same use case.
Conversely, GNU CPP has implemented preprocessor features added by each revision of the C standard — C89, C99, C11, C17/C18, and the major preprocessor additions in C23 (#embed, #elifdef/#elifndef, standardized #warning).
Current Status
GNU CPP is actively maintained as part of GCC. Each GCC release through GCC 15 (2025) brings continued work on C23 conformance, especially around #embed resource-bound diagnostics, additional warnings for fragile macro usage, and source-location refinements for better error messages. Its output also reaches other parts of the GNU toolchain: when code is compiled with -g3, GCC records macro definitions in the debug information, and the gdb debugger can then expand macros during expression evaluation.
In practical terms, GNU CPP is one of the most heavily-exercised pieces of free software in existence: every line of C, C++, or Objective-C compiled by GCC passes through it. A full Linux-kernel build reportedly preprocesses tens of thousands of translation units, each generating its own preprocessor token stream.
Why GNU CPP Matters
- It is the preprocessor for the dominant free C toolchain. Anything compiled by GCC — the Linux kernel, glibc, most of the GNU userspace, large parts of the BSD userspace ported to Linux, countless embedded firmwares — is preprocessed by GNU CPP.
- It made the preprocessor free software. Like the rest of GCC, GNU CPP exists because the GNU Project needed a free replacement for AT&T’s `/lib/cpp`.
- It is a general-purpose macro language by accident. Used standalone, `cpp` is one of the most widely-deployed text preprocessors on Earth, quietly expanding device-tree files, linker scripts, and assembly sources in addition to its day job in C compilation.
- It shaped what “the C preprocessor” means in practice. The behaviors that real C and C++ programmers rely on (`__GNUC__`-gated extensions, `#include_next`, Make-style dependency generation, variadic macro tricks) were defined by GNU CPP’s implementation choices long before any standard caught up.
For a nearly forty-year-old piece of software whose job is to substitute one piece of text for another, GNU CPP remains a surprisingly active and load-bearing component of the global software stack.
Notable Uses & Legacy
Linux Kernel Build System
Every translation unit of the Linux kernel passes through GNU CPP before reaching the compiler proper. The kernel's `Kconfig`-driven `autoconf.h`, configuration and context macros such as `CONFIG_X86_64` and `__KERNEL__`, and the heavily-used `container_of` and `likely`/`unlikely` macros all rely on GNU CPP's macro engine, often together with GNU C extensions such as statement expressions.
GNU Autoconf / `configure` Scripts
Autoconf-generated `configure` scripts and the `config.h` header they emit use CPP-style `#define` macros for feature detection. The generated headers are consumed by GNU CPP at compile time to select code paths via `#ifdef HAVE_*` — a pattern that underpins the portability of essentially every GNU package.
GNU Assembler Source Files
Assembly files named `.S` (capital S) are passed through GNU CPP by the GCC driver before being handed to the assembler, allowing `#include`, `#define`, and conditional compilation in assembly source. This is heavily used by the Linux kernel, glibc, and bootloader projects.
Fortran and Other Non-C Languages
GCC's Fortran front-end (gfortran) preprocesses files with `.F`, `.F90`, `.fpp` and similar extensions through GNU CPP using the `-traditional-cpp` mode; many scientific Fortran codebases rely on CPP `#ifdef` blocks to select MPI, OpenMP, or platform-specific code paths.
imake and X11 Configuration
The historical X Window System build tool `imake` used the C preprocessor (often GNU CPP on Linux) to expand platform-specific `.cf` configuration files into Makefiles, treating CPP as a general-purpose macro processor entirely separate from C compilation.
Standalone Macro Preprocessing
`cpp` is frequently invoked directly on non-C files — linker scripts, device-tree source (`.dts`/`.dtsi` in the Linux kernel and U-Boot), and assorted configuration templates — to take advantage of `#include` and `#define` without needing a separate macro language.
Running Today
Run examples using the official Docker image:
docker pull gcc:latest
Example usage:
docker run --rm -v $(pwd):/app -w /app gcc:latest cpp input.c