Grok
A pattern-matching mini-language that layers named, composable regular expressions over unstructured text — most widely used for parsing log lines in Logstash and the broader Elastic Stack.
Created by Jordan Sissel
Grok is a small pattern-matching language for extracting structured fields from unstructured text. A grok pattern looks like a regular expression with friendlier, named building blocks: instead of writing \d{1,3}(?:\.\d{1,3}){3} to match an IPv4 address, you write %{IP:client} and let the runtime substitute the underlying regex and capture the result into a field called client. Grok’s primary home today is the Logstash grok filter and the Elasticsearch / OpenSearch grok ingest processor, where it is the standard tool for turning lines like 127.0.0.1 - alice [10/Oct/2026:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 into JSON documents with typed fields.
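As a rough illustration of that transformation, the Python sketch below parses the sample line with a hand-expanded regex. The regex is a simplified stand-in for what a common-log grok pattern compiles to. It is not the real `COMMONAPACHELOG` definition, which is longer and more permissive, and the field names are chosen for illustration. Each `(?P<name>...)` group plays the role of one `%{PATTERN:name}` reference.

```python
import json
import re

# Simplified stand-in for the regex a common-log grok pattern expands to.
# Each named group corresponds to one %{PATTERN:field} reference.
COMMON_LOG = re.compile(
    r"(?P<client>\d{1,3}(?:\.\d{1,3}){3}) "    # like %{IP:client}, IPv4 only
    r"(?P<ident>\S+) (?P<auth>\S+) "           # identity and authenticated user
    r"\[(?P<timestamp>[^\]]+)\] "              # like %{HTTPDATE:timestamp}, loosely
    r'"(?P<verb>\w+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r"(?P<response>\d{3}) (?P<bytes>\d+|-)"
)

def parse(line: str):
    """Return a dict of fields, or None if the line does not match."""
    m = COMMON_LOG.fullmatch(line)
    if m is None:
        return None
    doc = m.groupdict()
    # Numeric fields become ints, as an :int suffix would request.
    doc["response"] = int(doc["response"])
    if doc["bytes"] != "-":
        doc["bytes"] = int(doc["bytes"])
    return doc

line = '127.0.0.1 - alice [10/Oct/2026:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326'
print(json.dumps(parse(line)))
```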
Strictly speaking, grok is not a general-purpose programming language: it is a declarative pattern syntax that compiles down to PCRE-style regular expressions. But that small syntax, paired with a curated library of named patterns, has been adopted so widely across the log-processing ecosystem that “grok” is now shorthand for an entire approach to log parsing.
History and Origins
Grok was created by Jordan Sissel, a long-time systems engineer who at the time was building tools for handling large volumes of operational text data. The original implementation — a C library and a small command-line utility, hosted on Google Code and later mirrored to GitHub as jordansissel/grok — began in approximately 2006. Its goal was modest: provide a way to express the kinds of regular expressions that systems administrators were repeatedly writing for syslog, Apache logs, mail logs, and similar formats, without re-typing the same primitives every time.
The project gained mainstream traction when Sissel embedded the same pattern language into Logstash, the open-source log pipeline he began releasing publicly around 2009–2010. Logstash’s grok filter let users describe the shape of a log line by composing named patterns from a bundled library, and it quickly became the de facto way to “parse anything” in the emerging ELK (Elasticsearch, Logstash, Kibana) stack. After Elastic acquired Logstash in 2013, grok’s pattern library was formalized as part of the Elastic Stack and continued to spread.
Elasticsearch 5.0 (2016) introduced the ingest node and, with it, a grok ingest processor. This let nodes in the cluster apply grok patterns during indexing with no Logstash required. Filebeat modules and Elastic Agent integrations later shipped ready-made ingest pipelines built on that processor. Datadog adopted grok-style syntax for the Grok Parser in its log-management product. When OpenSearch forked from Elasticsearch 7.10 in 2021, it preserved the grok ingest processor intact, cementing grok as a cross-ecosystem standard.
Design Philosophy
Grok’s design rests on a few pragmatic ideas:
- Regexes are powerful but unreadable. A working regex for an Apache log line can run to dozens of characters of escaped punctuation. Grok hides that behind named pieces (`%{IP}`, `%{HTTPDATE}`, `%{NUMBER}`) that read more like a schema than a regex.
- Patterns compose. Bigger patterns are built from smaller ones. `%{COMMONAPACHELOG}` is itself defined in terms of `%{IPORHOST}`, `%{USER}`, `%{HTTPDATE}`, `%{WORD}`, `%{NOTSPACE}`, `%{NUMBER}`, and so on. New patterns can be defined in the same syntax.
- Capture is naming. By default, matching a pattern also produces a named field in the output. There is no separate "extract" step.
- Fail loudly, parse forgivingly. When a line does not match, grok tags it (`_grokparsefailure` in Logstash) rather than silently dropping data, so operators can iterate on their patterns.
- A shared pattern library is the real artifact. The value of grok lies less in the language than in the cumulative library of named patterns (`SYSLOGBASE`, `COMBINEDAPACHELOG`, `IPORHOST`, `UUID`, `MAC`, and dozens more) refined by years of community use.
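The composition, capture, and tagging ideas above fit in a short Python sketch. It is a toy model rather than any real grok implementation. The four-entry `LIBRARY` is a hypothetical, heavily simplified stand-in for the real pattern file, and `expand` and `grok_match` are invented names. The sketch behaves as follows:

- A reference with a field name becomes a named capture group.
- A reference without one becomes a non-capturing group.
- A non-matching line is returned with a `_grokparsefailure` tag instead of being dropped.

```python
import re

# Hypothetical, simplified pattern library. Composite patterns refer to
# smaller ones with the same %{...} syntax.
LIBRARY = {
    "INT": r"[+-]?\d+",
    "WORD": r"\b\w+\b",
    "IPV4": r"%{INT}(?:\.%{INT}){3}",   # deliberately loose
    "CLIENTREQ": r"%{IPV4:client} %{WORD:method}",
}

# Matches %{NAME} or %{NAME:field}.
REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")

def expand(pattern: str) -> str:
    """Recursively replace %{NAME} / %{NAME:field} with plain regex syntax."""
    def repl(m: re.Match) -> str:
        name, field = m.group(1), m.group(2)
        body = expand(LIBRARY[name])
        # Capture is naming: a field name turns the reference into a named group.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return REF.sub(repl, pattern)

def grok_match(pattern: str, line: str) -> dict:
    m = re.fullmatch(expand(pattern), line)
    if m is None:
        # Fail loudly: keep the raw line and tag it for later inspection.
        return {"message": line, "tags": ["_grokparsefailure"]}
    return m.groupdict()

print(grok_match("%{CLIENTREQ}", "10.0.0.7 GET"))
print(grok_match("%{CLIENTREQ}", "not a request"))
```

`IPV4` is itself written with `%{INT}`, so expanding `%{CLIENTREQ}` recurses two levels before anything reaches the regex engine. A real implementation would also need to guard against undefined names and self-referential definitions, which this sketch does not.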
The Grok Pattern Syntax
A grok pattern is a string that mixes literal text with %{...} references to other named patterns:
%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
The general form of a reference is:
%{PATTERN_NAME:field_name:type}
- `PATTERN_NAME`: the name of a pattern in the active pattern library (e.g. `IP`, `NUMBER`, `WORD`, `TIMESTAMP_ISO8601`).
- `field_name`: optional; if present, the matched substring is captured into a field of this name in the resulting structured document.
- `type`: optional; coerces the captured string into a richer type. The most common coercions are `int` and `float`, which are the only two the Logstash filter supports.
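One way to picture how the three parts of a reference are handled is the Python sketch below. It compiles the example pattern from the start of this section with `:int` and `:float` suffixes added, and runs it against an illustrative input line. The primitive definitions are simplified assumptions rather than the library's real regexes, and `compile_grok` is a hypothetical helper. As the `type` suffix describes, coercion is applied after the match, only to the fields that ask for it.

```python
import re

# Hypothetical, simplified primitives; the real library definitions are stricter.
PRIMITIVES = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "URIPATHPARAM": r"/\S*",
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
}
# %{PATTERN_NAME:field_name:type}, where the last two parts are optional.
REF = re.compile(r"%\{(\w+)(?::(\w+))?(?::(int|float))?\}")
CONVERTERS = {"int": int, "float": float}

def compile_grok(pattern: str):
    """Expand references and remember which fields need type coercion."""
    types = {}  # field_name -> converter, filled in while expanding
    def repl(m: re.Match) -> str:
        name, field, typ = m.groups()
        body = PRIMITIVES[name]
        if field is None:
            return f"(?:{body})"
        if typ:
            types[field] = CONVERTERS[typ]
        return f"(?P<{field}>{body})"
    return re.compile(REF.sub(repl, pattern)), types

def parse(pattern: str, line: str):
    regex, types = compile_grok(pattern)
    m = regex.fullmatch(line)
    if m is None:
        return None
    # Coerce only the typed fields; everything else stays a string.
    return {k: types[k](v) if k in types else v for k, v in m.groupdict().items()}

pattern = ("%{IP:client} %{WORD:method} %{URIPATHPARAM:request} "
           "%{NUMBER:bytes:int} %{NUMBER:duration:float}")
print(parse(pattern, "55.3.244.1 GET /index.html 15824 0.043"))
```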
The Apache combined log format is so widespread that the library defines a single composite pattern for it:
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
For formats the library does not cover, you can combine built-in patterns with literal text. For example, a custom application log line:
[2026-05-20T13:55:36.123Z] WARN AuthService - user=alice action=login result=denied reason="bad password"
might be parsed by:
\[%{TIMESTAMP_ISO8601:ts}\] %{LOGLEVEL:level}\s+%{WORD:service} - %{GREEDYDATA:kv}
and a follow-on kv filter could split the trailing key-value pairs into individual fields.
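The two-step approach, with grok for the fixed prefix and a key-value split for the tail, can be sketched in Python as follows. The `LINE` regex is a simplified, hand-expanded approximation of the grok pattern above. The `PAIR` regex is an assumption about how a key-value step might treat quoted values; it is not the Logstash `kv` filter's actual implementation.

```python
import re

# Simplified, hand-expanded approximation of:
#   \[%{TIMESTAMP_ISO8601:ts}\] %{LOGLEVEL:level}\s+%{WORD:service} - %{GREEDYDATA:kv}
LINE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2}T[\d:.]+Z?)\] "
    r"(?P<level>[A-Z]+)\s+(?P<service>\w+) - (?P<kv>.*)"
)
# key=value, where the value is either "quoted text" or a run of non-spaces.
PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')

def parse(line: str) -> dict:
    m = LINE.fullmatch(line)
    if m is None:
        return {"message": line, "tags": ["_grokparsefailure"]}
    doc = m.groupdict()
    tail = doc.pop("kv")
    # Follow-on key-value step: split the GREEDYDATA tail into separate fields.
    for key, value in PAIR.findall(tail):
        doc[key] = value.strip('"')
    return doc

line = ('[2026-05-20T13:55:36.123Z] WARN AuthService - '
        'user=alice action=login result=denied reason="bad password"')
print(parse(line))
```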
Key Features
- A built-in pattern library. The canonical `grok-patterns` file defines primitives such as `WORD`, `NOTSPACE`, `INT`, `NUMBER`, `BASE10NUM`, `IP`, `IPV4`, `IPV6`, `HOSTNAME`, `IPORHOST`, `MAC`, `UUID`, `EMAILADDRESS`, `URIHOST`, `URIPATHPARAM`, `HTTPDATE`, `TIMESTAMP_ISO8601`, and many more.
- Composite patterns. Higher-level patterns like `COMMONAPACHELOG`, `COMBINEDAPACHELOG`, and `SYSLOGBASE` capture entire industry-standard log formats, or large pieces of them such as `SYSLOGTIMESTAMP`, in a single token.
- Custom pattern files. In Logstash, `patterns_dir` can point to a directory of files containing `PATTERN_NAME regex` definitions, which mix freely with the built-in library.
- Field-level type coercion. Numeric fields can be coerced to `int` or `float` at extraction time, so downstream consumers don't need to re-parse strings.
- Multiple matches. Most implementations accept an array of candidate patterns and use the first one that matches. This makes it easy to fan out a single filter across several related log formats.
- Diagnostic tagging. When no pattern matches, the line is typically tagged (e.g. `_grokparsefailure`) so it can be routed for human inspection rather than dropped.
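Custom pattern files and multiple candidate patterns combine naturally, as the Python sketch below suggests. Several pieces are hypothetical: the pattern-file contents, the `QUEUE_ID` and `DURATION_MS` names, and the helpers `load_patterns` and `first_match`. The sketch supports only single-level `%{NAME:field}` references and does not merge in a built-in library.

```python
import re

# Hypothetical custom pattern file: one "NAME regex" definition per line.
CUSTOM_PATTERNS = r"""
# comments and blank lines are ignored
QUEUE_ID [0-9A-F]{10,11}
DURATION_MS \d+ms
"""

def load_patterns(text: str) -> dict:
    """Parse "NAME regex" lines into a name -> regex dictionary."""
    library = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, regex = line.split(None, 1)  # first token is the name, the rest is the regex
        library[name] = regex
    return library

# Only %{NAME:field} references are supported in this sketch.
REF = re.compile(r"%\{(\w+):(\w+)\}")

def first_match(candidates: list, library: dict, line: str) -> dict:
    """Try each candidate pattern in order; the first full match wins."""
    for pattern in candidates:
        regex = REF.sub(lambda m: f"(?P<{m.group(2)}>{library[m.group(1)]})", pattern)
        m = re.fullmatch(regex, line)
        if m:
            return m.groupdict()
    return {"message": line, "tags": ["_grokparsefailure"]}

LIBRARY = load_patterns(CUSTOM_PATTERNS)
CANDIDATES = ["queued %{QUEUE_ID:queue_id}", "done in %{DURATION_MS:elapsed}"]
for sample in ["queued 4B2E1A9C0D", "done in 250ms", "something else"]:
    print(first_match(CANDIDATES, LIBRARY, sample))
```

Ordering matters in this design. If two candidates could both match a line, the earlier one always wins, so more specific patterns belong near the front of the list.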
Evolution
Although the syntax of grok has been remarkably stable, its implementations have multiplied:
- The original C library (`jordansissel/grok`) provided a command-line tool and an embeddable library. It is dormant today but still compiles and remains the historical reference.
- Logstash's `grok` filter was originally a Ruby reimplementation of the same ideas. It is the implementation most users have actually run.
- Elasticsearch / OpenSearch ingest processors brought grok into the JVM, running pattern matching inside the search cluster as part of an ingest pipeline. These implementations use Joni, a Java port of the Oniguruma regex engine, and add safeguards against runaway patterns. Elasticsearch, for example, runs a watchdog that interrupts grok expressions exceeding a configurable execution time.
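- Elastic's shippers, Filebeat and Elastic Agent, bundle modules and integrations whose ingest pipelines rely on grok processors. For simple field extraction at the source, the shippers themselves offer the lighter `dissect` processor.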
- Vendor-specific grok parsers (notably Datadog’s) extend the syntax with product-specific helpers — for example, more elaborate key-value or duration parsers — while remaining largely compatible with the original pattern conventions.
The result is that “grok” is no longer a single program; it is a small, stable contract — %{PATTERN:field:type} over a shared library — that many tools speak.
Current Relevance
Grok is alive and ubiquitous wherever logs are parsed. Inside the Elastic and OpenSearch ecosystems it is essentially unavoidable: every guide to ingesting Apache, Nginx, syslog, MySQL slow-query, or PostgreSQL logs into a search cluster passes through one or two grok filters or ingest processors. Outside that ecosystem, the syntax has been adopted or adapted by enough other vendors that knowing the canonical pattern names is a portable skill in observability and SRE work.
There are well-known caveats. Grok patterns are still regular expressions under the hood, and poorly written ones can be catastrophically slow on adversarial input; runaway backtracking is a real operational risk. Newer practices such as structured JSON logging, the Elastic Common Schema, and OpenTelemetry log records try to sidestep grok by emitting already-structured data at the source. But the long tail of legacy and third-party applications keeps emitting text logs, and grok continues to be the bridge between those streams and modern analytics back-ends.
Why It Matters
Grok is a small but instructive piece of language design. It demonstrates that a domain-specific language does not need a new parser, a new VM, or even much new syntax to have an outsized impact: a thin layer of naming and composition over regular expressions, paired with a curated library of patterns, was enough to define how a generation of operations engineers thinks about parsing logs.
For programmers who have only ever seen grok in a Logstash config, it is worth remembering that the language itself is a few productions long. Its real artifact is the shared vocabulary — COMMONAPACHELOG, SYSLOGBASE, TIMESTAMP_ISO8601, IPORHOST — that thousands of people have refined and reused. That vocabulary, more than any one implementation, is what makes grok worth studying as a piece of code archaeology.
Notable Uses & Legacy
Logstash (Elastic Stack)
The `grok` filter is one of Logstash's most-used filters and is the canonical way to turn arbitrary log lines into structured documents before they are shipped to Elasticsearch, OpenSearch, or other sinks.
Elasticsearch / OpenSearch ingest pipelines
Both Elasticsearch and OpenSearch ship a built-in `grok` ingest processor so that pattern parsing can happen on the cluster, without requiring a separate Logstash node in the pipeline.
Datadog Log Management
Datadog's Grok Parser uses grok-compatible syntax to extract attributes from logs at ingest, including helpers for parsing IP addresses, durations, and structured key-value pairs.
Elastic Agent and Filebeat integrations
Filebeat modules and Elastic Agent integrations ship prebuilt ingest pipelines whose `grok` processors parse common log formats on the cluster. For simple extraction at the edge, the shippers themselves offer the lighter `dissect` processor instead.
SIEM and observability vendors
A range of log-analytics and SIEM products document grok-style pattern syntax for parsing custom application and infrastructure logs, leveraging users' existing familiarity with the Logstash patterns.