Grok is grok parsing library based on re2
regexp.
g := grok.New()
// use custom patterns
patternDefinitions := map[string]string{
// patterns can be nested
"NGINX_HOST": `(?:%{IP:destination.ip}|%{NGINX_NOTSEPARATOR:destination.domain})(:%{NUMBER:destination.port})?`,
// NGINX_NOTSEPARATOR is used in NGINX_HOST. IP and NUMBER are part of default pattern set
"NGINX_NOTSEPARATOR": `"[^\t ,:]+"`,
}
g.AddPatterns(patternDefinitions)
// compile grok before use, this will generate regex.Regex based on pattern and
// subpatterns provided.
// this needs to be performed just once.
err := g.Compile("%{NGINX_HOST}", true)
res, err := g.ParseString("127.0.0.1:1234")
results in:
map[string]string {
"destination.ip": "127.0.0.1",
"destination.port": "1234",
}
In this case we changed
err := g.Compile("%{NGINX_HOST}", false)
to
err := g.Compile("%{NGINX_HOST}", true)
allowing unnamed return matches. In case of unnamed match, definition name is used.
g := grok.New()
// use custom patterns
patternDefinitions := map[string]string{
// patterns can be nested
"NGINX_HOST": `(?:%{IP:destination.ip}|%{NGINX_NOTSEPARATOR:destination.domain})(:%{NUMBER:destination.port})?`,
// NGINX_NOTSEPARATOR is used in NGINX_HOST. IP and NUMBER are part of default pattern set
"NGINX_NOTSEPARATOR": `"[^\t ,:]+"`,
}
g.AddPatterns(patternDefinitions)
// compile grok before use, this will generate regex.Regex based on pattern and
// subpatterns provided
err := g.Compile("%{NGINX_HOST}", false)
res, err := g.ParseString("127.0.0.1:1234")
results in:
map[string]string {
"NGINX_HOST": "127.0.0.1:1234",
"destination.ip": "127.0.0.1",
"IPV4": "127.0.0.1",
"destination.port": "1234",
"BASE10NUM": "1234",
}
In this case we're marking destination.port
as int
using definition %{NUMBER:destination.port:int}
.
g := grok.New()
// use custom patterns
patternDefinitions := map[string]string{
"NGINX_HOST": `(?:%{IP:destination.ip}|%{NGINX_NOTSEPARATOR:destination.domain})(:%{NUMBER:destination.port:int})?`,
"NGINX_NOTSEPARATOR": `"[^\t ,:]+"`,
}
g.AddPatterns(patternDefinitions)
// compile grok before use, this will generate regex.Regex based on pattern and
// subpatterns provided
err := g.Compile("%{NGINX_HOST}", true)
res, err := g.ParseTypedString("127.0.0.1:1234")
See type changed from map[string]string
to map[string]interface{}
and destination.port
is now a number:
map[string]interface {} {
"destination.ip": "127.0.0.1",
"destination.port": 1234,
}
Comparing to github.com/vjeantet/grok and more optimized version based on previous one github.com/trivago/grok
BenchmarkParseString-10 15466 76811 ns/op 4578 B/op 5 allocs/op
BenchmarkParseStringRegexp-10 15351 77109 ns/op 3840 B/op 3 allocs/op
BenchmarkParseStringTrivago-10 15868 76416 ns/op 4593 B/op 5 allocs/op
BenchmarkParseStringVjeanet-10 15548 77111 ns/op 5897 B/op 6 allocs/op
BenchmarkNestedParseString-10 42201 28908 ns/op 3463 B/op 4 allocs/op
BenchmarkNestedParseStringTrivago-10 41937 28836 ns/op 3449 B/op 4 allocs/op
BenchmarkNestedParseStringVjeanet-10 41080 29174 ns/op 4045 B/op 5 allocs/op
BenchmarkTypedParseString-10 39934 29707 ns/op 3851 B/op 9 allocs/op
BenchmarkTypedParseStringTrivago-10 40146 29238 ns/op 3475 B/op 6 allocs/op
BenchmarkTypedParseStringVjeanet-10 39931 30616 ns/op 4196 B/op 14 allocs/op
This library comes with a default set of patterns defined in patterns/default.go
file.
You can include more predefined patterns from patterns/*.go
like so
g := grok.New()
g.AddPatterns(patterns.Rails) // to include whole set
g.AddPattern(patterns.Ruby["RUBY_LOGLEVEL"]) // to include specific one
Default set consists of:
Name | Example |
---|---|
WORD | "hello", "world123", "test_data" |
NOTSPACE | "example", "text-with-dashes", "12345" |
SPACE | " ", "\t", " " |
INT | "123", "-456", "+789" |
NUMBER | "123", "456.789", "-0.123" |
BOOL | "true", "false", "true" |
BASE10NUM | "123", "-123.456", "0.789" |
BASE16NUM | "1a2b", "0x1A2B", "-0x1a2b3c" |
BASE16FLOAT | "0x1.a2b3", "-0x1A2B3C.D" |
POSINT | "123", "456", "789" |
NONNEGINT | "0", "123", "456" |
GREEDYDATA | "anything goes", "literally anything", "123 #@!" |
QUOTEDSTRING | ""This is a quote"", "'single quoted'" |
UUID | "123e4567-e89b-12d3-a456-426614174000" |
URN | "urn:isbn:0451450523", "urn:ietf:rfc:2648" |
Name | Example |
---|---|
IP | "192.168.1.1", "2001:0db8:85a3:0000:0000:8a2e:0370:7334" |
IPV6 | "2001:0db8:85a3:0000:0000:8a2e:0370:7334", " |
IPV4 | "192.168.1.1", "10.0.0.1", "172.16.254.1" |
IPORHOST | "example.com", "192.168.1.1", "fe80::1ff:fe23:4567:890a" |
HOSTNAME | "example.com", "sub.domain.co.uk", "localhost" |
EMAILLOCALPART | "john.doe", "alice123", "bob-smith" |
EMAILADDRESS | "[email protected]", "[email protected]" |
USERNAME | "user1", "john.doe", "alice_123" |
USER | "user1", "john.doe", "alice_123" |
MAC | "00:1A:2B:3C:4D:5E", "001A.2B3C.4D5E" |
CISCOMAC | "001A.2B3C.4D5E", "001B.2C3D.4E5F", "001C.2D3E.4F5A" |
WINDOWSMAC | "00-1A-2B-3C-4D-5E", "00-1B-2C-3D-4E-5F" |
COMMONMAC | "00:1A:2B:3C:4D:5E", "00:1B:2C:3D:4E:5F" |
HOSTPORT | "example.com:80", "192.168.1.1:8080" |
Name | Example |
---|---|
UNIXPATH | "/home/user", "/var/log/syslog", "/tmp/abc_123" |
TTY | "/dev/pts/1", "/dev/tty0", "/dev/ttyS0" |
WINPATH | "C:\Program Files\App", "D:\Work\project\file.txt" |
URIPROTO | "http", "https", "ftp" |
URIHOST | "example.com", "192.168.1.1:8080" |
URIPATH | "/path/to/resource", "/another/path", "/root" |
URIQUERY | "key=value", "search=query&active=true" |
URIPARAM | "?key=value", "?search=query&active=true" |
URIPATHPARAM | "/path?query=1", "/folder/path?valid=true" |
PATH | "/home/user/documents", "C:\Windows\system32", "/var/log/syslog" |
Name | Example |
---|---|
MONTH | "January", "Feb", "March", "Apr", "May", "Jun", "Jul", "August", "September", "October", "Nov", "December" |
MONTHNUM | "01", "02", "03", ... "11", "12" |
DAY | "Monday", "Tuesday", ... "Sunday" |
YEAR | "1999", "2000", "2021" |
HOUR | "00", "12", "23" |
MINUTE | "00", "30", "59" |
SECOND | "00", "30", "60" |
TIME | "14:30", "23:59:59", "12:00:00", "12:00:60" |
DATE_US | "04/21/2022", "12-25-2020", "07/04/1999" |
DATE_EU | "21.04.2022", "25/12/2020", "04-07-1999" |
ISO8601_TIMEZONE | "Z", "+02:00", "-05:00" |
ISO8601_SECOND | "59", "30", "60.123" |
TIMESTAMP_ISO8601 | "2022-04-21T14:30:00Z", "2020-12-25T23:59:59+02:00", "1999-07-04T12:00:00-05:00" |
DATE | "04/21/2022", "21.04.2022", "12-25-2020" |
DATESTAMP | "04/21/2022 14:30", "21.04.2022 23:59", "12-25-2020 12:00" |
TZ | "EST", "CET", "PDT" |
DATESTAMP_RFC822 | "Wed Jan 12 2024 14:33 EST" |
DATESTAMP_RFC2822 | "Tue, 12 Jan 2022 14:30 +0200", "Fri, 25 Dec 2020 23:59 -0500", "Sun, 04 Jul 1999 12:00 Z" |
DATESTAMP_OTHER | "Tue Jan 12 14:30 EST 2022", "Fri Dec 25 23:59 CET 2020", "Sun Jul 04 12:00 PDT 1999" |
DATESTAMP_EVENTLOG | "20220421143000", "20201225235959", "19990704120000" |
Name | Example |
---|---|
SYSLOGTIMESTAMP | "Jan 1 00:00:00", "Mar 15 12:34:56", "Dec 31 23:59:59" |
PROG | "sshd", "kernel", "cron" |
SYSLOGPROG | "sshd[1234]", "kernel", "cron[5678]" |
SYSLOGHOST | "example.com", "192.168.1.1", "localhost" |
SYSLOGFACILITY | "<1.2>", "<12345.13456>" |
HTTPDATE | "25/Dec/2024:14:33 4" |