painless-data-science-examples


Overview

This is a place to prototype interesting examples of using Painless to achieve ad hoc data analysis with Elasticsearch. The idea is to have somewhere we can collaborate on developing examples which showcase what you can do with Painless, or which explore pre-product features as scripts. An example must include a working Painless snippet and a Python test harness. The test harness must be able to create an index against which one can exercise the functionality, and must allow one to run it via the Python client. Ideally, any important implementation details should be discussed in the README of each example. It is fine to include multiple implementations of the same task to showcase different features of Painless. At a minimum, the discussion should cover dangers and their mitigations, such as how to avoid using too much memory in a scripted metric aggregation.
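As an illustration of the kind of mitigation meant here, the following is a minimal sketch of a scripted metric aggregation that caps the size of its per-shard state so it cannot grow without bound. The index name, field name and the cap of 1000 keys are placeholders, not part of any example in this repository:

```json
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "value_counts": {
      "scripted_metric": {
        "init_script": "state.counts = [:]",
        "map_script": "def k = doc['category.keyword'].value; if (state.counts.size() < 1000 || state.counts.containsKey(k)) { state.counts[k] = state.counts.getOrDefault(k, 0L) + 1 }",
        "combine_script": "return state.counts",
        "reduce_script": "def merged = [:]; for (s in states) { for (e in s.entrySet()) { merged[e.getKey()] = merged.getOrDefault(e.getKey(), 0L) + e.getValue() } } return merged"
      }
    }
  }
}
```

The guard in the map script bounds the number of distinct keys held in memory on each shard; a real example would also need to decide what to do with values that arrive after the cap is hit.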

Motivation

This grew out of a request to implement the apriori algorithm within the Elastic stack. It turns out that a scripted metric aggregation is able to do this, which is great. However, it is not straightforward to work out how to do this if (1) your primary programming language is not Java and (2) you rely only on the existing documentation. These examples are intended to provide a reference where data scientist users of Elasticsearch can find pedagogical examples of using scripting to perform ad hoc data analysis tasks. Aside from providing useful out-of-the-box functionality, the hope is to showcase how much one can achieve and to introduce this community to this useful functionality.

Usage

Set up a virtual environment called env

python3 -m venv env

Activate it

source env/bin/activate

Install the required dependencies

pip3 install -r requirements.txt

Once you have started an Elasticsearch instance, each example includes code to generate some sample data. This is typically done using the Demo object from the demo module, for example:

>>> from examples.apriori.demo import Demo
>>> demo = Demo(user_name='my_user', password='my_password')
>>> demo.setup()

where 'my_user' and 'my_password' are the user name and password for the Elasticsearch instance you've started. The Demo object also allows you to run the aggregation using the Elasticsearch Python client to see the result on the demo data set, for example:

>>> demo.run()

For the apriori example you should see output like:

FREQUENT ITEM SETS DEMO...
FREQUENT_SETS(size=1)
   DIAMETER_PEER_GROUP_DOWN / support = 0.163
   DIAMETER_PEER_GROUP_DOWN_RX / support = 0.1385
   NO_PEER_GROUP_MEMBER_AVAILABLE / support = 0.309
   DIAMETER_PEER_GROUP_UP_TX / support = 0.1535
   PAD-Failure / support = 0.175
   NO_PROCESS_STATE / support = 0.1385
   NO_RESPONSE / support = 0.3305
   DIAMETER_PEER_GROUP_UP_RX / support = 0.145
   IP_REACHABLE / support = 0.5105
   RELAY_LINK_STATUS / support = 0.3675
   POM-Failure / support = 0.1765
   MISMATCH_REQUEST_RESPONSE / support = 0.1815
   vPAS-Failure / support = 0.1755
   PROCESS_STATE / support = 0.291
   IP_NOT_REACHABLE / support = 0.351
   DIAMETER_PEER_GROUP_DOWN_GX / support = 0.1405
FREQUENT_SETS(size=2)
   PAD-Failure PROCESS_STATE / support = 0.1445
   DIAMETER_PEER_GROUP_UP_TX POM-Failure / support = 0.1475
   MISMATCH_REQUEST_RESPONSE PAD-Failure / support = 0.1525
   ...
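Here "support" is the fraction of documents (transactions) containing the item set. The counting itself can be sketched in a few lines of plain Python; this is a brute-force illustration of what the numbers above mean, not the pruned apriori algorithm the example implements, and the toy transactions below are made up for illustration:

```python
from collections import Counter
from itertools import combinations

def frequent_sets(transactions, min_support, size):
    """Return item sets of the given size whose support (the fraction of
    transactions containing the set) is at least min_support."""
    counts = Counter()
    for items in transactions:
        # Count each distinct item set of the requested size once per transaction.
        for subset in combinations(sorted(set(items)), size):
            counts[subset] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

# Toy transactions standing in for the demo's alarm events.
transactions = [
    ["IP_REACHABLE", "NO_RESPONSE"],
    ["IP_REACHABLE", "RELAY_LINK_STATUS"],
    ["IP_REACHABLE", "NO_RESPONSE", "RELAY_LINK_STATUS"],
    ["NO_RESPONSE"],
]
print(frequent_sets(transactions, min_support=0.5, size=1))
```

The scripted metric version does this counting shard by shard in the map phase and merges the partial counts in the reduce phase.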

Each example directory also includes the scripted metric request in a text file, for example examples/apriori/scripted_metric_frequent_sets.txt. This can also be pasted and run in the Kibana Dev Console as follows:

GET apriori_demo/_search
{
  "size": 0,
  "query": {
    "function_score": {
      "random_score": {}
    }
  },
  ...
}