Est. 1987 Intermediate

GNU CPP

The GNU C Preprocessor — the macro-expansion and file-inclusion front-end shipped with GCC since 1987, used by C, C++, Objective-C, and as a general-purpose text preprocessor.

Created by Richard Stallman (and subsequent GCC contributors)

Paradigm Macro / Textual preprocessing
Typing Untyped (token-based text substitution)
First Appeared 1987
Latest Version Bundled with GCC 15 (2025)

GNU CPP is the GNU C Preprocessor, the macro-expansion, file-inclusion, and conditional-compilation front-end that has shipped with the GNU Compiler Collection since GCC’s first public release in March 1987. It implements the preprocessing phase defined by the ISO C standard — handling directives such as #include, #define, #if, #ifdef, #pragma, #error, and #line — and is invoked, usually invisibly, on every C, C++, Objective-C, Objective-C++, and .S assembly file compiled by GCC. The same engine is also available as a standalone program named cpp, which is regularly used as a general-purpose textual preprocessor for files that have nothing to do with C.

The preprocessing language itself — the directives and macro-substitution rules — is defined by the ISO C standard. This page is specifically about the GNU implementation of that language, its history inside GCC, the extensions it adds, and the role it plays in the GNU toolchain.

Origins

When Richard Stallman released the first version of GCC on 22 March 1987, the compiler needed a preprocessor. The traditional Unix approach was to ship /lib/cpp as a separate executable that the compiler driver (cc) would fork/exec for each input file, piping its output into the compiler proper. The early GNU C compiler followed this pattern: a standalone program, originally called cccp (the GNU C-Compatible Compiler Preprocessor), was bundled alongside cc1. Stallman wrote it as a free-software replacement for the AT&T /lib/cpp, since a fully free Unix-like toolchain could not ship a proprietary preprocessor.

For more than a decade GNU CPP remained a separate process. This was easy to reason about — you could literally run cpp foo.c and see exactly what the compiler would see — but it imposed real performance costs: every translation unit paid for an extra process fork, an inter-process pipe, and a second pass of lexical analysis (once by cpp, once again by cc1).

The Move to an Integrated Preprocessor

Starting around the GCC 3.0 timeframe (released 2001-06-18), the GCC developers introduced cpplib (later renamed libcpp), a reusable C library that performed preprocessing in-process. The C and C++ front-ends were modified to call into libcpp directly, so that preprocessing and parsing happened in a single process and tokens flowed from preprocessor to parser without ever being re-serialized as text. By GCC 3.4 the integrated path was the default for all C-family front-ends, and the older cccp / tradcpp code paths were retired from the default build.

The standalone cpp program is still shipped (it is now a thin wrapper that invokes the same libcpp engine) and is what you get when you type cpp foo.c at a shell prompt. The behavior is identical to compiler-internal preprocessing, but the output is written to standard output instead of being passed straight into the parser.

What the Preprocessing Language Looks Like

The preprocessor is a separate language layered on top of C. It operates on a token stream — not raw character text — and is defined in terms of translation phases in the ISO C standard. The major directive categories are:

| Category | Directives | Purpose |
|---|---|---|
| Inclusion | #include | Splice another file’s contents into the current translation unit |
| Macro definition | #define, #undef | Introduce or remove object-like and function-like macros |
| Conditional compilation | #if, #ifdef, #ifndef, #elif, #else, #endif | Select code based on compile-time predicates |
| Diagnostics | #error, #warning | Emit compile-time messages (#warning standardized in C23) |
| Line control | #line | Override file name and line number for diagnostics |
| Pragmas | #pragma, _Pragma(...) | Implementation-defined directives |

Inside macro bodies, two operators give the preprocessor expressive power beyond textual substitution:

  • The stringizing operator #x converts a macro argument into a string literal.
  • The token-pasting operator a ## b concatenates two tokens to form a new one.

These two operators, combined with __FILE__, __LINE__, __func__ (technically a compiler construct, not a macro), and __VA_ARGS__ (C99 variadic macros), are the foundation for the heavyweight macro patterns found in projects like the Linux kernel, GLib, and Boost.Preprocessor.

GNU Extensions

Beyond strict ISO C, GNU CPP defines a number of extensions that have become broadly relied-upon:

  • Variadic macros with a named rest argument: #define LOG(fmt, args...) printf(fmt, ##args), where the GNU ##args form deletes the preceding comma when args is empty. (C++20 and C23 later standardized __VA_OPT__ to do this portably.)
  • Computed #include: #include MACRO_NAME, where MACRO_NAME expands to a header path. Strictly, ISO C permits this form as well; it is often used in “X-macro” technique variations.
  • #include_next: includes the next file with the same name found later in the include path, continuing the search after the directory where the current file was found; used heavily by glibc and the fixincludes mechanism to wrap system headers.
  • #warning directive: GNU CPP shipped #warning for decades before C23 made it standard.
  • __has_include / __has_attribute / __has_builtin: feature-detection predicates, introduced by Clang and supported in GNU CPP for portability. __has_include predates its C++17/C23 standardization; the standard attribute tests are spelled __has_cpp_attribute and __has_c_attribute, and __has_builtin remains an extension.
  • Predefined macros: a large set including __GNUC__, __GNUC_MINOR__, __OPTIMIZE__, __SIZEOF_INT__, target-architecture macros (__x86_64__, __aarch64__), and feature-test macros, which downstream code uses to gate GCC-specific paths.
  • Traditional mode: -traditional-cpp selects pre-ANSI K&R-style preprocessing, used by Fortran and a small number of legacy projects that depend on quirks of the old preprocessor (such as substituting macro arguments inside string literals).

Modes and Drivers

The cpp program (and the compiler driver) supports several preprocessing modes that change what is emitted:

| Flag | Effect |
|---|---|
| (default) | Emit preprocessed token stream with line markers |
| -E | Stop after preprocessing; emit to stdout |
| -P | Suppress #-style line markers in output |
| -dM | Dump all defined macros instead of the source |
| -dD | Pass #define directives through to the output alongside expansion |
| -M, -MM, -MD, -MMD | Generate Makefile dependency rules (the backbone of make-based incremental rebuilds) |
| -traditional-cpp | Pre-ANSI preprocessing semantics |
| -fdirectives-only | Parse directives but do not expand macros; used by build accelerators |

The -M family of flags is particularly important in practice: most C/C++ Makefiles in the GNU ecosystem use gcc -MMD (or an equivalent) to have GCC emit .d dependency files that make then includes. This is the mechanism that keeps incremental rebuilds correct when header files change.

Beyond C: CPP as a General-Purpose Preprocessor

Because GNU CPP operates on tokens but is forgiving about non-C input, it has long been used as a general-purpose textual preprocessor for files that are not C source at all. Common cases include:

  • Linux kernel device-tree source (.dts, .dtsi): the kernel build system runs these through cpp so that authors can use #include and #define in their hardware descriptions.
  • GNU Assembler sources with the .S extension: the GCC driver automatically invokes the preprocessor before the assembler, so assembly programmers can #include "asm-offsets.h" and use C-style macros for register names.
  • Fortran sources with uppercase .F / .F90 extensions: gfortran preprocesses these through GNU CPP (typically in traditional mode) before handing them to the Fortran parser.
  • Linker scripts: many embedded projects keep a single linker.ld.in that is run through cpp -P to produce architecture-specific final linker scripts.
  • imake configuration: the X Window System historically used CPP as the macro engine that expanded platform .cf files into Makefile form.

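This off-label use is widespread enough that the GNU CPP manual addresses it directly: it states that the preprocessor is intended only for C, C++, and Objective-C, warns that input which does not obey C's lexical rules (stray apostrophes, Makefile tabs) can break, notes that -traditional-cpp is more permissive, and recommends a language-specific preprocessor or a general text processor such as GNU M4 wherever possible.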

Architecture

The modern libcpp is a C library that consumes a stream of source characters and produces a stream of preprocessing tokens. Internally it is organized around:

  1. Lexer: converts source characters into preprocessing tokens, tracking source locations precisely (a single location_t value encodes file, line, column, and macro-expansion context).
  2. Macro table: a hash table of defined identifiers mapping to macro definitions (object-like, function-like, variadic, or built-in).
  3. Expansion engine: implements the ISO C rescanning rules (informally called “blue paint”), under which a macro name met again inside its own expansion is marked and never re-expanded, while other macros in the replacement list still expand.
  4. Conditional stack: tracks the state of nested #if/#ifdef groups, including whether a branch has already been taken at a given level.
  5. Include manager: implements include-path search, #include_next, and the include-once optimization that detects #ifndef GUARD ... #define GUARD ... #endif patterns and skips repeated includes without re-reading the file.

The preserved precise source-location information is what allows GCC’s diagnostics to point at the exact column inside an expanded macro and to print “in expansion of macro X” backtraces — a feature added incrementally over many releases and now considered essential to debugging template-heavy or macro-heavy C++ code.

Standardization Influence

GNU CPP has been the leading reference implementation for several preprocessor features that later became standard:

  • Variadic macros appeared as a GNU extension first and were standardized in C99.
  • #warning existed in GNU CPP for many years before C23 standardized it.
  • __has_include and __has_attribute were introduced by Clang and later adopted by GNU CPP. __has_include was standardized in C++17 and C23; the attribute test was standardized under the names __has_cpp_attribute (C++20) and __has_c_attribute (C23).
  • __VA_OPT__ (C++20 / C23) standardized a portable answer to the empty-variadic-argument problem that GCC had long handled with the older , ##__VA_ARGS__ extension.

Conversely, GNU CPP has implemented preprocessor features added by each revision of the C standard — C89, C99, C11, C17/C18, and the major preprocessor additions in C23 (#embed, #elifdef/#elifndef, standardized #warning).

Current Status

GNU CPP is actively maintained as part of GCC. Each GCC release through GCC 15 (2025) has brought continued work on C23 conformance, especially around #embed (first implemented in GCC 15), additional warnings for fragile macro usage, and source-location refinements for better error messages. The library form (libcpp) is also used outside the C-family front-ends; for example, gfortran links it to implement Fortran preprocessing. Separately, the gdb debugger can expand macros during expression evaluation when debugging code compiled with -g3, using the macro definitions GCC records in the debug information.

In practical terms, GNU CPP is one of the most heavily exercised pieces of free software in existence: every line of C, C++, or Objective-C compiled by GCC passes through it. A full Linux-kernel build reportedly preprocesses tens of thousands of translation units, each generating its own preprocessor token stream.

Why GNU CPP Matters

  • It is the preprocessor for the dominant free C toolchain. Anything compiled by GCC (the Linux kernel, glibc, most of the GNU userspace, large parts of the BSD userspace ported to Linux, countless embedded firmware images) is preprocessed by GNU CPP.
  • It made the preprocessor free software. Like the rest of GCC, GNU CPP exists because the GNU Project needed a free replacement for AT&T’s /lib/cpp.
  • It is a general-purpose macro language by accident. Used standalone, cpp is one of the most widely deployed text preprocessors on Earth, quietly expanding device-tree files, linker scripts, and assembly sources in addition to its day job in C compilation.
  • It shaped what “the C preprocessor” means in practice. The behaviors that real C and C++ programmers rely on (__GNUC__-gated extensions, #include_next, Make-style dependency generation, variadic macro tricks) were defined by GNU CPP’s implementation choices, in some cases long before a standard caught up and in others with no standard at all.

For a nearly forty-year-old piece of software whose job is to substitute one piece of text for another, GNU CPP remains a surprisingly active and load-bearing component of the global software stack.

Timeline

1987
Richard Stallman makes the first public (beta) release of GCC on 22 March 1987, including a GNU implementation of the C preprocessor as a separate executable (`cpp`) invoked by the compiler driver; GCC 1.0 follows in May 1987
1989
ANSI C (X3.159-1989, later ISO/IEC 9899:1990) standardizes the C preprocessor's directives, macro-expansion rules, stringizing (`#`) and token-pasting (`##`) operators; GNU CPP tracks the standard while retaining GNU extensions
1999
ISO C99 adds variadic macros (`...` and `__VA_ARGS__`), the `_Pragma` operator, and `//` line comments; GNU CPP implements these in subsequent GCC releases
2001
GCC 3.0 (released 2001-06-18) introduces an integrated preprocessor library, `cpplib` (later renamed `libcpp`), so that preprocessing happens in-process inside the compiler rather than in a separate `cpp` executable piped into `cc1`; the standalone `cpp` program is retained as a wrapper for backward compatibility
2004
GCC 3.4 finishes the transition to the integrated preprocessor for all C-family front-ends, eliminating the old `cccp`/`tradcpp` code paths from the default build
2011
ISO C11 published; GNU CPP updates its predefined macros for the new standard, for example defining `__STDC_VERSION__` as `201112L` under `-std=c11`
2018
ISO/IEC 9899:2018 (commonly called C17 or C18) is published as a bug-fix revision of C11; GNU CPP updates `__STDC_VERSION__` accordingly (to `201710L`)
2024
ISO C23 (ISO/IEC 9899:2024) is published, introducing `#embed`, `#elifdef`/`#elifndef`, `__has_include`, `__has_c_attribute`, and `#warning` as a standard directive; GNU CPP implements these features across several GCC releases, with `#embed` arriving in GCC 15
2025
GCC 15 ships with continued refinement of C23 preprocessor support and additional diagnostics around macro redefinition and `#embed` resource limits

Notable Uses & Legacy

Linux Kernel Build System

Every translation unit of the Linux kernel passes through GNU CPP before reaching the compiler proper. The kernel's `Kconfig`-driven `autoconf.h`, configuration macros such as `CONFIG_X86_64`, the `__KERNEL__` build flag, and heavily used helpers like `container_of` and `likely`/`unlikely` all rely on GNU CPP's macro engine. Several of those helpers also expand to GNU C compiler extensions, such as statement expressions and `__builtin_expect`, which are implemented by the compiler proper rather than the preprocessor.

GNU Autoconf / `configure` Scripts

Autoconf-generated `configure` scripts probe the system by running small test programs through the preprocessor and compiler, then record the results as `#define` macros in a generated `config.h` header. GNU CPP consumes that header at compile time to select code paths via `#ifdef HAVE_*`, a pattern that underpins the portability of essentially every GNU package.

GNU Assembler Source Files

Assembly files named `.S` (capital S) are passed through GNU CPP by the GCC driver before being handed to the assembler, allowing `#include`, `#define`, and conditional compilation in assembly source. This is heavily used by the Linux kernel, glibc, and bootloader projects.

Fortran and Other Non-C Languages

GCC's Fortran front-end (gfortran) preprocesses files with `.F`, `.F90`, `.fpp` and similar extensions through GNU CPP using the `-traditional-cpp` mode; many scientific Fortran codebases rely on CPP `#ifdef` blocks to select MPI, OpenMP, or platform-specific code paths.

imake and X11 Configuration

The historical X Window System build tool `imake` used the C preprocessor (often GNU CPP on Linux) to expand platform-specific `.cf` configuration files into Makefiles, treating CPP as a general-purpose macro processor entirely separate from C compilation.

Standalone Macro Preprocessing

`cpp` is frequently invoked directly on non-C files — linker scripts, device-tree source (`.dts`/`.dtsi` in the Linux kernel and U-Boot), and assorted configuration templates — to take advantage of `#include` and `#define` without needing a separate macro language.

Language Influence

Influenced By

  • Original C preprocessor (Bell Labs)
  • M4

Influenced

  • Clang preprocessor
  • MCPP

Running Today

Run examples using the official Docker image:

docker pull gcc:latest

Example usage:

docker run --rm -v $(pwd):/app -w /app gcc:latest cpp input.c