Est. 2006 Intermediate

Grok

A pattern-matching mini-language that layers named, composable regular expressions over unstructured text — most widely used for parsing log lines in Logstash and the broader Elastic Stack.

Created by Jordan Sissel

Paradigm: Pattern matching, declarative (regex composition)
Typing: Untyped text patterns with optional typed field coercion (int, float)
First Appeared: approximately 2006
Latest Version: Maintained as part of Logstash and Elasticsearch ingest pipelines; the original stand-alone `jordansissel/grok` C library is dormant

Grok is a small pattern-matching language for extracting structured fields from unstructured text. A grok pattern looks like a regular expression with friendlier, named building blocks: instead of writing \d{1,3}(?:\.\d{1,3}){3} to match an IPv4 address, you write %{IP:client} and let the runtime substitute the underlying regex and capture the result into a field called client. Grok’s primary home today is the Logstash grok filter and the Elasticsearch / OpenSearch grok ingest processor, where it is the standard tool for turning lines like 127.0.0.1 - alice [10/Oct/2026:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 into JSON documents with typed fields.

Strictly speaking, grok is not a general-purpose programming language: it is a declarative pattern syntax that compiles down to PCRE-style regular expressions. But that small syntax, paired with a curated library of named patterns, has been adopted so widely across the log-processing ecosystem that “grok” is now shorthand for an entire approach to log parsing.
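The substitution step described in the introduction can be sketched in a few lines of Python. This is a minimal illustration, not how any real grok runtime is implemented: the two-entry `PATTERNS` table and the `to_regex` helper are assumptions made for the example, and the regexes are simplified stand-ins for the library definitions.

```python
import re

# Hypothetical two-entry pattern library; real grok libraries define far
# more (and stricter) patterns than these simplified regexes.
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
}

# Matches %{NAME} or %{NAME:field}.
REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def to_regex(grok_pattern: str) -> str:
    """Replace each %{NAME:field} reference with the regex it names."""

    def substitute(ref: re.Match) -> str:
        name, field = ref.group(1), ref.group(2)
        body = PATTERNS[name]
        # A field name turns the reference into a named capture group;
        # without one the text is matched but not captured.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    return REFERENCE.sub(substitute, grok_pattern)


regex = to_regex("%{IP:client} %{WORD:method}")
match = re.match(regex, "127.0.0.1 GET /index.html")
fields = match.groupdict()
```

Here `fields` holds the `client` and `method` values as strings, which is the whole trick: the named reference becomes an ordinary named capture group.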

History and Origins

Grok was created by Jordan Sissel, a long-time systems engineer who at the time was building tools for handling large volumes of operational text data. The original implementation — a C library and a small command-line utility, hosted on Google Code and later mirrored to GitHub as jordansissel/grok — began in approximately 2006. Its goal was modest: provide a way to express the kinds of regular expressions that systems administrators were repeatedly writing for syslog, Apache logs, mail logs, and similar formats, without re-typing the same primitives every time.

The project gained mainstream traction when Sissel embedded the same pattern language into Logstash, the open-source log pipeline he began releasing publicly around 2009–2010. Logstash’s grok filter let users describe the shape of a log line by composing named patterns from a bundled library, and it quickly became the de facto way to “parse anything” in the emerging ELK (Elasticsearch, Logstash, Kibana) stack. After Elastic acquired Logstash in 2013, grok’s pattern library was formalized as part of the Elastic Stack and continued to spread.

In Elasticsearch 5.0 (2016), Elastic introduced a grok ingest processor, allowing nodes in the cluster to apply grok patterns during indexing — no Logstash required. The same idea later appeared in Elastic Agent and Filebeat processors at the edge, and in Datadog’s Grok Parser for its log-management product. When the Elasticsearch community fork OpenSearch appeared, the grok ingest processor was preserved intact, cementing grok as a cross-ecosystem standard.

Design Philosophy

Grok’s design rests on a few pragmatic ideas:

  • Regexes are powerful but unreadable. A working regex for an Apache log line can be dozens of characters of escaped punctuation. Grok hides that behind named pieces (%{IP}, %{HTTPDATE}, %{NUMBER}) that read more like a schema than a regex.
  • Patterns compose. Bigger patterns are built from smaller ones. %{COMMONAPACHELOG} is itself defined in terms of %{IPORHOST}, %{USER}, %{HTTPDATE}, %{WORD}, %{URIPATHPARAM}, %{NUMBER}, and so on. New patterns can be defined in the same syntax.
  • Capture is naming. The act of matching a pattern is, by default, also the act of producing a named field in the output. There is no separate “extract” step.
  • Fail loudly, parse forgivingly. When a line does not match, grok tags it (_grokparsefailure in Logstash) rather than silently dropping data, so operators can iterate on their patterns.
  • A shared pattern library is the real artifact. The value of grok is less the language and more the cumulative library of named patterns — SYSLOGBASE, COMBINEDAPACHELOG, IPORHOST, UUID, MAC, and dozens more — that have been refined by years of community use.
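The "patterns compose" idea can be illustrated with a small recursive expander. The sketch below is written under stated assumptions: the definitions in `PATTERN_FILE` are simplified stand-ins rather than the canonical library regexes, `REQUESTLINE` is a hypothetical composite invented for the example, and `load_patterns` and `expand` are illustrative helpers that do no cycle detection.

```python
import re

# Simplified stand-ins written in the "NAME regex" layout of a grok
# pattern file; these are not the definitions from the canonical library.
PATTERN_FILE = r"""
INT [+-]?\d+
WORD \w+
IPV4 \d{1,3}(?:\.\d{1,3}){3}
HOSTNAME [A-Za-z0-9][A-Za-z0-9.-]*
IPORHOST (?:%{IPV4}|%{HOSTNAME})
REQUESTLINE %{WORD:verb} %{IPORHOST:host} %{INT:status}
"""

REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def load_patterns(text: str) -> dict:
    """Parse 'NAME regex' lines into a name -> definition mapping."""
    patterns = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            name, definition = line.split(None, 1)
            patterns[name] = definition
    return patterns


def expand(pattern: str, library: dict) -> str:
    """Recursively replace references until only plain regex remains."""

    def substitute(ref: re.Match) -> str:
        name, field = ref.group(1), ref.group(2)
        body = expand(library[name], library)  # definitions may nest
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    return REFERENCE.sub(substitute, pattern)


library = load_patterns(PATTERN_FILE)
regex = expand("%{REQUESTLINE}", library)
fields = re.fullmatch(regex, "GET example.org 200").groupdict()
```

A single `%{REQUESTLINE}` reference expands through `IPORHOST` down to `IPV4` and `HOSTNAME`, yet the caller only ever sees the three named fields.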

The Grok Pattern Syntax

A grok pattern is a string that mixes literal text with %{...} references to other named patterns:

%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}

The general form of a reference is:

%{PATTERN_NAME:field_name:type}
  • PATTERN_NAME — the name of a pattern in the active pattern library (e.g. IP, NUMBER, WORD, TIMESTAMP_ISO8601).
  • field_name — optional; if present, the matched substring is captured into a field of this name in the resulting structured document.
  • type — optional; coerces the captured string into a richer type. The most common coercions are int and float.
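To make the three-part reference concrete, here is a hedged Python sketch of how the optional type suffix might be handled. The `PATTERNS` table, the `COERCIONS` mapping, and the `compile_grok` and `parse` helpers are assumptions for illustration only; real implementations differ in detail and in the regexes they ship.

```python
import re

# Simplified stand-ins for library patterns (an assumption, not the
# canonical regexes).
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
}
COERCIONS = {"int": int, "float": float}

# Matches %{NAME}, %{NAME:field} and %{NAME:field:type}.
REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?(?::(\w+))?\}")


def compile_grok(pattern: str):
    """Return a compiled regex plus a field -> converter mapping."""
    converters = {}

    def substitute(ref: re.Match) -> str:
        name, field, type_name = ref.groups()
        if field is None:
            return f"(?:{PATTERNS[name]})"
        if type_name is not None:
            converters[field] = COERCIONS[type_name]
        return f"(?P<{field}>{PATTERNS[name]})"

    return re.compile(REFERENCE.sub(substitute, pattern)), converters


def parse(pattern: str, line: str):
    """Match a line and coerce typed fields; None means no match."""
    regex, converters = compile_grok(pattern)
    match = regex.match(line)
    if match is None:
        return None
    return {
        field: converters.get(field, str)(value)
        for field, value in match.groupdict().items()
    }


event = parse(
    "%{IP:client} %{WORD:method} %{NUMBER:bytes:int} %{NUMBER:duration:float}",
    "127.0.0.1 GET 2326 0.043",
)
```

Fields without a type suffix stay strings, while `bytes` and `duration` come back as an `int` and a `float` respectively.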

The Apache combined log format is so widespread that the library defines a single composite pattern for it:

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}

You can also combine library patterns with literal text to describe a format the library does not cover. For example, a custom application log line:

[2026-05-20T13:55:36.123Z] WARN  AuthService - user=alice action=login result=denied reason="bad password"

might be parsed by:

\[%{TIMESTAMP_ISO8601:ts}\] %{LOGLEVEL:level}\s+%{WORD:service} - %{GREEDYDATA:kv}

and a follow-on kv filter could split the trailing key-value pairs into individual fields.
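The two-stage approach (match the fixed prefix, then split the trailing pairs) can be sketched in Python. This is an analogue for illustration, not the Logstash kv filter itself: the `LINE` regex is a hand-expanded approximation of the grok pattern with simplified stand-ins for the library definitions, and `split_pairs` is a hypothetical helper.

```python
import re
import shlex

# Hand-expanded approximation of
#   \[%{TIMESTAMP_ISO8601:ts}\] %{LOGLEVEL:level}\s+%{WORD:service} - %{GREEDYDATA:kv}
# using simplified regexes in place of the library definitions.
LINE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?)\] "
    r"(?P<level>[A-Z]+)\s+(?P<service>\w+) - (?P<kv>.*)"
)


def split_pairs(text: str) -> dict:
    """Split 'key=value' pairs, keeping quoted values with spaces intact."""
    pairs = {}
    for token in shlex.split(text):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value
            pairs[key] = value
    return pairs


line = ('[2026-05-20T13:55:36.123Z] WARN  AuthService - '
        'user=alice action=login result=denied reason="bad password"')
event = LINE.match(line).groupdict()
# Replace the raw kv string with one field per key.
event.update(split_pairs(event.pop("kv")))
```

After the second stage, `event` carries `ts`, `level`, and `service` from the pattern plus `user`, `action`, `result`, and `reason` from the key-value tail, with the quoted reason kept as one value.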

Key Features

  • A built-in pattern library. The canonical grok-patterns file defines primitives such as WORD, NOTSPACE, INT, NUMBER, BASE10NUM, IP, IPV4, IPV6, HOSTNAME, IPORHOST, MAC, UUID, EMAILADDRESS, URIHOST, URIPATHPARAM, HTTPDATE, TIMESTAMP_ISO8601, and many more.
  • Composite patterns. Higher-level patterns like COMMONAPACHELOG, COMBINEDAPACHELOG, SYSLOGBASE, and SYSLOGTIMESTAMP capture entire industry-standard log formats in a single token.
  • Custom pattern files. Users can point patterns_dir at a directory of their own pattern files, in which each line gives a pattern name followed by its regex definition, and mix those patterns freely with the built-in library.
  • Field-level type coercion. Numeric fields can be coerced to int or float at extraction time so downstream consumers don’t need to re-parse strings.
  • Multiple matches. Most implementations accept an array of candidate patterns and pick the first one that matches, making it easy to fan out a single filter across several related log formats.
  • Diagnostic tagging. When no pattern matches, the line is typically tagged (e.g. _grokparsefailure) so it can be routed for human inspection rather than dropped.
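The "multiple matches" and "diagnostic tagging" behaviors fit together naturally, as the following Python sketch suggests. It assumes the candidate patterns have already been expanded to plain regexes; `CANDIDATES` and `apply_patterns` are hypothetical names, and only the `_grokparsefailure` tag is taken from the text above.

```python
import re

# Hypothetical candidate patterns, already expanded to plain regexes.
CANDIDATES = [
    re.compile(r"(?P<client>\d{1,3}(?:\.\d{1,3}){3}) (?P<method>[A-Z]+) (?P<path>\S+)"),
    re.compile(r"(?P<level>[A-Z]+): (?P<text>.*)"),
]


def apply_patterns(message: str) -> dict:
    """Try candidates in order; keep the first match or tag the failure."""
    event = {"message": message}
    for regex in CANDIDATES:
        match = regex.match(message)
        if match:
            event.update(match.groupdict())
            return event  # first match wins, later candidates are skipped
    # Nothing matched: keep the event and mark it for inspection.
    event["tags"] = ["_grokparsefailure"]
    return event


events = [apply_patterns(m) for m in (
    "10.0.0.7 GET /health",
    "ERROR: disk full",
    "completely unexpected line",
)]
```

The third message matches neither candidate, so it survives with its original text and a failure tag rather than disappearing from the stream.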

Evolution

Although the syntax of grok has been remarkably stable, its implementations have multiplied:

  • The original C library (jordansissel/grok) provided a command-line tool and an embeddable library. It is dormant today but remains the historical reference.
  • Logstash’s grok filter was originally a Ruby reimplementation of the same ideas. It is the implementation most users have actually run.
  • Elasticsearch / OpenSearch ingest processors brought grok into the JVM, running pattern matching inside the search cluster as part of an ingest pipeline. These implementations use Joni, a Java port of the Oniguruma regex engine, and guard against runaway patterns with a watchdog that interrupts matches that run too long.
  • Edge processors in Elastic Agent and Filebeat let small grok expressions run at the source, reducing the load on central pipelines.
  • Vendor-specific grok parsers (notably Datadog’s) extend the syntax with product-specific helpers — for example, more elaborate key-value or duration parsers — while remaining largely compatible with the original pattern conventions.

The result is that “grok” is no longer a single program; it is a small, stable contract — %{PATTERN:field:type} over a shared library — that many tools speak.

Current Relevance

Grok is alive and ubiquitous wherever logs are parsed. Inside the Elastic and OpenSearch ecosystems it is hard to avoid: most guides to ingesting Apache, Nginx, syslog, MySQL slow-query, or PostgreSQL logs into a search cluster pass through at least one grok filter or ingest processor. Outside that ecosystem, the syntax has been adopted or adapted by enough other vendors that knowing the canonical pattern names is a portable skill in observability and SRE work.

There are well-known caveats. Grok patterns are still regular expressions under the hood, and poorly written ones can be catastrophically slow on adversarial input — runaway backtracking is a real operational risk. Newer log-parsing approaches (structured logging in JSON, the Elastic Common Schema, OpenTelemetry log records) try to sidestep grok by emitting already-structured data at the source. But the long tail of legacy and third-party applications keeps emitting text logs, and grok continues to be the bridge between those streams and modern analytics back-ends.
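The backtracking risk comes from the underlying regex engine rather than from grok syntax, so it can be shown with a generic example. The sketch below uses Python's backtracking `re` module and a textbook nested-quantifier pattern; neither regex comes from a grok library, and `time_match` is a hypothetical helper.

```python
import re
import time

# Generic regex illustration (not taken from any grok library): nested
# quantifiers give the engine exponentially many ways to split a run of "a".
RISKY = re.compile(r"(a+)+$")
# Same set of accepted strings with one quantifier: a failed match is
# rejected quickly.
SAFER = re.compile(r"a+$")


def time_match(regex: re.Pattern, text: str):
    """Return (matched, seconds) for a single match attempt."""
    start = time.perf_counter()
    matched = regex.match(text) is not None
    return matched, time.perf_counter() - start


# A run of "a" followed by a character that forces the match to fail.
hostile = "a" * 18 + "!"
risky_matched, risky_seconds = time_match(RISKY, hostile)
safer_matched, safer_seconds = time_match(SAFER, hostile)
```

Both patterns reject the input, but the risky one tries every way of dividing the run of "a" between the inner and outer quantifier before giving up, so each extra character roughly doubles its work. In grok terms, the usual defenses are anchoring patterns and avoiding several adjacent greedy wildcards.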

Why It Matters

Grok is a small but instructive piece of language design. It demonstrates that a domain-specific language does not need a new parser, a new VM, or even much new syntax to have an outsized impact: a thin layer of naming and composition over regular expressions, paired with a curated library of patterns, was enough to define how a generation of operations engineers thinks about parsing logs.

For programmers who have only ever seen grok in a Logstash config, it is worth remembering that the language itself is a few productions long. Its real artifact is the shared vocabulary — COMMONAPACHELOG, SYSLOGBASE, TIMESTAMP_ISO8601, IPORHOST — that thousands of people have refined and reused. That vocabulary, more than any one implementation, is what makes grok worth studying as a piece of code archaeology.

Timeline

2006
Jordan Sissel begins work on `grok`, a C library and command-line tool for matching text with named, composable regular-expression patterns; the project is hosted on Google Code
2010
Jordan Sissel releases Logstash, an open-source log-shipping and processing pipeline; the grok pattern language is bundled as the `grok` filter for parsing unstructured log lines into structured fields
2013
Elasticsearch BV (now Elastic) acquires Logstash, bringing grok into what becomes the official Elastic Stack (ELK) alongside Elasticsearch and Kibana
2016
Elasticsearch 5.0 introduces the `grok` ingest processor, letting Elasticsearch nodes apply grok patterns directly during indexing rather than only inside Logstash
2017
Datadog reportedly ships its Grok Parser around this time as part of its log-management product, adopting a grok-compatible pattern syntax for parsing logs at ingest time
2021
OpenSearch — the community fork of Elasticsearch — preserves the grok ingest processor, ensuring the pattern language continues to be a first-class log-parsing tool in both the Elastic and OpenSearch ecosystems
2023
Elastic continues evolving the grok ecosystem inside Elastic Agent and Ingest Node processors; the canonical pattern library (the `grok-patterns` file with `COMMONAPACHELOG`, `SYSLOGBASE`, `IP`, `NUMBER`, and similar named patterns) remains the de facto standard reference

Notable Uses & Legacy

Logstash (Elastic Stack)

The `grok` filter is one of Logstash's most-used filters and is the canonical way to turn arbitrary log lines into structured documents before they are shipped to Elasticsearch, OpenSearch, or other sinks.

Elasticsearch / OpenSearch ingest pipelines

Both Elasticsearch and OpenSearch ship a built-in `grok` ingest processor so that pattern parsing can happen on the cluster, without requiring a separate Logstash node in the pipeline.

Datadog Log Management

Datadog's Grok Parser uses grok-compatible syntax to extract attributes from logs at ingest, including helpers for parsing IP addresses, durations, and structured key-value pairs.

Elastic Agent and Filebeat processors

Elastic's lightweight log shippers expose grok-based processors so that simple field extraction can happen at the edge, before logs are forwarded to a central pipeline.

SIEM and observability vendors

A range of log-analytics and SIEM products document grok-style pattern syntax for parsing custom application and infrastructure logs, leveraging users' existing familiarity with the Logstash patterns.

Language Influence

Influenced By

Regular expressions, PCRE, Perl, named capture groups

Influenced

Logstash grok filter, Elasticsearch grok ingest processor, Datadog Grok Parser, OpenSearch ingest grok processor, Filebeat / Elastic Agent processors

Running Today

Run examples using the official Docker image:

docker pull