
XELogstash

xelogstash.exe is deprecated and is only included for backward compatibility. Please use the newer xewriter.exe instead.

xelogstash.exe is a command-line application to pull SQL Server Extended Events and SQL Server Agent job results and push them to Logstash. It supports SQL Server 2012 and higher. It is untested against SQL Server 2008 (R2).

  1. Breaking Changes
  2. Getting started
  3. Sources and Defaults
  4. Controlling the JSON
  5. Adds, Moves, and Copies
  6. Prefixes and keeping your place
  7. Application Settings
  8. Derived Fields
  9. Sinks
  10. Other Notes

Breaking Changes

  • post-0.44 - The AppLog section has been removed. Please remove this from your TOML files. All application log files are now written to JSON.
  • post-0.44 - The application log files are written to the log subdirectory in JSON format. These should be processed with FileBeat.
  • 0.40 - The TOML configuration to write to logstash has moved. It was formerly under app.logstash. Now it is configured in the logstash section. See Sinks below.

Getting Started

The application uses a TOML file for configuration. Two sample files are included.

  1. Extract the ZIP contents to a directory. We'll be starting with "start.toml".
  2. If you have a local SQL Server installed, no changes are necessary.
  3. Otherwise, edit the fqdn under [[source]] to point to a SQL Server (a minimal example follows this list).
  4. From a command-prompt, type "xelogstash.exe". (This doesn't send anything to logstash yet)
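
A minimal configuration with a single source might look like this (the fqdn value here is just an example):

[[source]]
fqdn = "localhost"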

This should generate a samples.xe.json file and an xestate folder. The samples.xe.json file contains one of each type of event that would have been sent to Logstash, which gives you a chance to see how things will look. The xestate folder keeps track of the read position in each XE session.

NOTE: The permissions on the xestate directory are limited. When switching to a service account, be prepared to reset the permissions or delete the directory and allow the service account to recreate it.

Writing events to a file

In xelogstash.toml, uncomment the two lines of the filesink section and rerun the application. This writes events in JSON format to files in the events directory. Each XE session on each source server gets its own file, and the files rotate every hour. These files can be shipped to Elastic Search using FileBeat.
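
Uncommented, the filesink section looks something like this (also shown in the Sinks section below):

[filesink]
retain_hours = 1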

Sending to Logstash

To send events directly to Logstash, specify the logstash section with a host value. The host should be in host:port format. After that, you can run the executable with the same parameters.

[logstash]
host = "localhost:8888"

Command Line Options

There are three command line options. Running xelogstash /? will display the help for the options.

  • /config filename - Tries to load the TOML file from filename. If this isn't specified, xelogstash.toml is used.
  • /log - Captures the application output to a log file INSTEAD of standard out. The log files are located in the log subdirectory and named xelogstash_YYYYMMDD.log based on the start time. Log files are automatically deleted after 7 days and that's not configurable yet.
  • /debug - Enables additional debugging output.
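
For example, a scheduled task might run it like this (the configuration file name here is hypothetical):

xelogstash.exe /config myservers.toml /log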

Sources and Defaults

Each SQL Server you want to extract events from is called a "source". You specify each source using a [[source]] header in the TOML file. (This is an array of sources.) The only required field for a source is the fqdn, which is how you connect to the server.

The [defaults] section applies to all sources (but can be overridden by each source). I find it's easiest to set most of your values in this section and then override them for each source. The "default" section is just a source itself. Note: The source values override the default values except for adds, copies, and moves, which are merged together; those are discussed below.

You can set the following fields on a Source (or in Defaults); a combined example follows the list.

  • fqdn is the name to connect to for the server. It can be a host name, a hostname,port pair, a host\instance name, a DNS alias, or an IP address. This value is dropped directly into the connection string.
  • sessions is a list of sessions to process.
  • ignore_sessions says to not process any sessions for this source. This is mainly useful if you have a list of default sessions but also some old SQL Server 2008 boxes where you want to skip the sessions entirely and only collect the failed agent jobs.
  • rows is how many events to try and process per session. It will read this many events and then continue reading until the offset changes. Omitting this value or setting it to zero will process all rows since it last ran.
  • agentjobs can be "all", "failed" or "none". It tries to map the field names to the extended event field names.
  • excludedEvents is a list of events to ignore. Both sample configuration files exclude some of the system health events like ring buffer recorded and diagnostic component results.
  • adds, moves, and copies are described in their own section below.
  • strip_crlf (boolean) will replace common newline patterns with a space. Some logstash configurations don't handle newlines in their JSON. The downside is that it de-formats SQL and deadlock fields.
  • start_at and stop_at limit the date range of returned events. Be aware that this will almost certainly lead to dropped or duplicated events; it should only be used for testing. The date must be in "2018-01-01T13:14:15Z" or "2018-06-01T12:00:00-05:00" format and must be enclosed in quotes in the TOML file.
  • look_back (duration string) will determine how far back to get events. It's like a relative start_at. look_back is a duration string of a decimal number with a unit suffix, such as "24h", "168h" (1 week), or "60m". Valid time units are "h", "m", or "s". If both start_at and look_back are set, it will use the most recent calculated date between the two.
  • exclude_17830 is a boolean that will exclude 17830 errors. I typically see these from packaged software and can't do much about them.
  • log_bad_xml is a boolean. This will write the last bad XML parse to a file.
  • include_dbghelpdll_msg is a boolean. Some versions of SQL Server emit a message like Using 'dbghelp.dll' version '4.0.5'. These are now excluded by default. This setting adds those back in.
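
Here is a sketch combining several of the fields above; the server names, session list, and values are illustrative:

[defaults]
sessions = ["system_health", "AlwaysOn_health"]
rows = 20000
agentjobs = "failed"
strip_crlf = true
look_back = "168h"

[[source]]
fqdn = "sql01.example.com"

[[source]]
fqdn = "sql02.example.com,1433"
agentjobs = "all"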

Controlling the JSON

The two fields timestamp_field_name and payload_field_name are available in the Source and Default sections. The following examples best illustrate how they work.

Example 1

All the event fields are at the root level.

- - - xelogstash.toml - - - 

timestamp_field_name = "@timestamp"
payload_field_name = ""

- - - generates - - -

{
  "@timestamp": "2018-05-08T01:23:45.691Z",
  "client_app_name": "xecap.exe",
  "client_hostname": "D30",
  ". . . lots of json fields...": "go here",
  "xe_severity_value": 6
}

Example 2

All the event fields are nested inside an "mssql" field.

timestamp_field_name = "event_time"
payload_field_name = "mssql"
----------------------
{
  "event_time": "2018-05-08T01:23:49.928Z",
  "mssql": {
    "client_app_name": "xecap.exe",
    "client_hostname": "D30",
    ". . . lots of json fields...": "go here",
    "xe_severity_value": 6
  }
}

Adds, Moves, and Copies

Adds, moves, and copies give you the opportunity to modify the generated JSON. All three are arrays with a format of "string1:string2".

Note: For these values, any Source overwrites the Default at the individual key level. If both the default and source try to add a key for "environment", it will use the value from the Source. But if the Default adds a key for "datacenter" and the Source adds a key for "environment", it will add both keys. Copies and Moves are handled the same way.

  • adds are "key:value" that will add a key with the specified value
  • copies are "src:dest" that will copy the value at src to dest
  • moves are "src:dest" that will move the value from src to dest

Further, the keys can be nested using a dotted notation. For example, setting a key at global.host.name will nest the value three levels deep.

Consider the following settings:

timestamp_field_name = "@timestamp"
payload_field_name = "event"

adds =  [   "global.log.vendor:Microsoft",
            "global.log.type:Application",
            "global.log.collector.application:xelogstash.exe",
            "global.log.collector.version:'$(VERSION)'",
        ] 

copies = [  "event.mssql_computer:global.host.name",
            "event.mssql_domain:global.host.domain",
            "event.mssql_version:global.log.version"
        ]
----------------------------
{
  "global": {
    "host": {
      "domain": "WORKGROUP",
      "name": "D30"
    },
    "log": {
      "version": "SQL Server 2016 RTM",
      "collector": {
        "version": "0.12",
        "application": "xelogstash.exe"
      },
      "type": "Application",
      "vendor": "Microsoft"
    }
  },
  "@timestamp": "2018-05-08T01:24:47.368Z",
  "event": {
    "client_app_name": "xecap.exe",
    "client_hostname": "D30",
    
    "mssql_computer": "D30",
    "mssql_domain": "WORKGROUP",
    "mssql_server_name": "D30\\SQL2016",
    "mssql_version": "SQL Server 2016 RTM",
    "name": "login",
    
    "server_instance_name": "D30\\SQL2016",
    "server_principal_name": "graz",
    "timestamp": "2018-05-08T01:24:47.368Z",
    "xe_acct_app": "graz - xecap.exe",
    
    "xe_session_name": "Demo-Logins2",
    "xe_severity_keyword": "info",
    "xe_severity_value": 6
  }
}

Note how the copies are able to "lift" values out of the event and into other parts of the JSON document. This helps conform to whatever standards are in your ELK environment.

The values that are added can be strings, integers, floats, booleans, or dates. Putting a value in single ticks forces it to be a string.
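
For example (using a hypothetical key), adds = ["retries:3"] adds a JSON number, while adds = ["retries:'3'"] adds the string "3".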

Replacements

The adds, moves, and copies also support a few "replacement" values, shown in the example after this list.

  • $(VERSION) is the version of xelogstash.exe. Note that in the example above, $(VERSION) is forced to a string by enclosing it in single ticks.
  • $(GITDESCRIBE) is the Git Describe output from the build.
  • $(EXENAMEPATH) is the full path and name of the executable.
  • $(EXENAME) is the name of the executable.
  • $(PID) is the Process ID of xelogstash.exe.
  • $(HOST) is the computer where xelogstash.exe is running.
  • $(NOW) is the time that xelogstash.exe wrote this value to a sink.
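
For example, an adds entry using replacements might look like this (the key names are illustrative):

adds = [
    "global.log.collector.application:$(EXENAME)",
    "global.log.collector.version:'$(VERSION)'",
    "global.log.collector.host:$(HOST)",
]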

See the section below on derived fields for a description of the "mssql_" and "xe_" fields.

Prefixes and keeping your place

NOTE: Starting in v0.20, the /status directory is moved to the /xestate directory. This should be done for you by the application. The new file name format is domain_instance_class_identifier.state. Please leave the Prefix in the TOML file for now. It will be removed in a future release.

The application keeps track of how far it has read into the extended event file target using a state file. This file holds the file name and offset of each read for that session. The file is named Domain_ServerName_Session.state, where "ServerName" is populated from @@SERVERNAME on the instance. There is also a ".0" file that is used while the application is running. You can tell the application to start over from the beginning by deleting the state file.

Application Settings

These are the fields you can set in the [app] section of the configuration file.

[app] section

This controls the overall application. All these fields are optional; an example follows the list.

  • logstash is the address of the Logstash server in host:port format. If empty, it will not send events to Logstash. (As of 0.40, this setting has moved to the logstash section; see Sinks below.)
  • samples set to true will save a JSON file with one of each event type that was processed. This is very helpful for testing your JSON format.
  • summary set to true will print a nice summary of the output including how many of each type of event were processed.
  • workers controls how many concurrent workers will process the sources. It defaults to the number of cores in the computer. A given worker will process all the sessions for a source before moving on to the next source. The application doesn't use much CPU. It spends lots of time waiting on data to return from sources.
  • http_metrics enables a local web server that can provide diagnostic information. This defaults to false. It exposes the following two URLs:
    • http://localhost:8080/debug/vars provides some basic metrics in JSON format including the total number of events processed. This information is real-time.
    • http://localhost:8080/debug/pprof/ exposes the GO PPROF web page for diagnostic information on the executable including memory usage, blocking, and running GO routines.
    • IE is horrible for viewing these. I've found that using PowerShell and running Invoke-RestMethod "http://localhost:8080/debug/vars" works well to view that URL.
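
A sketch of an [app] section using the fields above (the values are illustrative):

[app]
samples = true
summary = true
workers = 4
http_metrics = true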

[applog] section

Note: This section was removed post 0.44.

Derived Fields

Based on a particular event, the application computes a number of derived fields and adds those to the event. Most of them have an "xe_" prefix to separate them. It also returns a few SQL Server level settings with an "mssql_" prefix.

  • mssql_domain: This is the result of DEFAULT_DOMAIN() run on the source server
  • mssql_computer: This is the result of SERVERPROPERTY('MachineName') on the source computer.
  • mssql_server_name: This is the result of @@SERVERNAME on the source computer
  • mssql_version: This is something like "SQL Server 2016 SP1", composed of various server property attributes.
  • mssql_product_version: This is SERVERPROPERTY('ProductVersion'), which is major.minor.build.revision (example: 13.0.5366.1)
  • xe_severity_value: 3, 5, or 6 for ERR, WARN, INFO based on the syslog values
  • xe_severity_keyword: "err", "warning", "info" based on the syslog values
  • xe_description: a text description of the event. The format depends on the type of event and what fields are available. My goal is that seeing the name of the server, the event type (the name field), and this field is enough to know what happened
  • xe_acct_app: a combination of the server_principal_name and client_app_name in "acct - app" format.
  • xe_acct_app_client: a combination of the server_principal_name, client_app_name, and client_hostname in "acct - app (client)" format
  • xe_session_name: name of the extended event session for this event
  • xe_file_name: name of the XE file for this event
  • xe_file_offset: file offset where we found this event
  • server_instance_name: This is normally provided by the extended event. However system_health and AlwaysOn_health don't capture this. If the field isn't provided, I populate it from @@SERVERNAME of the source server.

Sinks

xelogstash can write to multiple targets called "sinks". It can write to files, to Logstash, or directly to Elastic Search. It can write to all three sinks at the same time if they are all specified. They are written serially, so the performance isn't great.

File Sink

This is configured with a filesink section:

[filesink]
retain_hours = 1 

The files are named for the server, instance, session name, date, and hour and are written to an events directory with a .json extension. The files are rotated every hour. Any files older than retain_hours are removed.

These files should be imported into Logstash or Elastic Search using FileBeat. This is probably the simplest way to import.

Logstash Sink

This is configured with a logstash section:

[logstash]
host = "localhost:8888"

This writes the events directly to the specified logstash server.

Elastic Sink

This is configured using the elastic section and is the most complicated to configure.

[elastic]
addresses = ["https://host.domain.com:1234"]
username = "dev-user"
password = "horsebatterysomethingsomething"

auto_create_indexes = true
default_index = "dev-sql"

event_index_map = [
    "login:dev-login"
]

  • addresses specifies one or more addresses for the Elastic servers.
  • username and password provide authentication.
  • auto_create_indexes controls whether the application tries to create indexes. If enabled, it creates the indexes with default settings.
  • default_index is the index where events will be written unless overridden by the event index map.
  • event_index_map allows mapping different events to different indexes. In the example above, all events except login will go to the dev-sql index. The login events will go to the dev-login index. I often split login events into their own index.

Other Notes

  1. I've had issues where manually stopping the SQL Server Agent job ends the job but doesn't stop the executable. The application now sets a lock file so that a second instance will exit with an error. The lock file name is based on the TOML file name (TOML_file_name.lock). Find the first instance in Task Manager and kill it. I usually only see this if I stop it in the middle of a run or Logstash is behaving oddly.

  2. I find that setting rows = 20000 in the [defaults] section works well. It's enough rows that it catches up quickly if I pause the job, but not so many that adding a new server runs for 20 minutes while everything else pauses.

  3. The sources are processed in the order they are listed.

  4. I make some decisions around setting the severity level. Failed jobs and job steps are errors. SQL Server errors are errors. I haven't gone much beyond that yet.

  5. I haven't done much in the way of optimizations yet. It will process between 2,000 and 3,000 events per second on my aging desktop with SQL Server running on the same box. A properly scaled Logstash doesn't slow it down much. I have a few servers that keep 1 GB of login events in 50 MB files. It takes roughly 20 minutes to get through it the first time.

  6. If it gets behind and the offset becomes invalid, it will log an error. It will also set a flag to try and catch up next time. That flag is a third column in the state file that says "reset". If it finds that flag, it will start at the beginning of the extended event file target and read until it gets to an event after the last one it saw. It will also log an error that events were probably skipped.