elasticsearch-dump

Import and export tools for elasticsearch & opensearch

APACHE-2.0 License

elasticsearch-dump - v3.3.1: Transform modules

Published by evantahler over 7 years ago

Adds the ability to use a JavaScript module for the transform. When specifying the transform option, prefix the value with @ (a curl convention) to load a module. The module's top-level exported function is called with the document and the arguments parsed for the module.

Uses a pseudo-URL format to specify arguments to the module as follows:

elasticdump --transform='@./transforms/my-transform?param1=value&param2=another-value'

With a module at ./transforms/my-transform.js with the following:

module.exports = function (doc, options) {
    // do something to doc
};

This will load the module ./transforms/my-transform.js and execute the function with doc and options = {"param1": "value", "param2": "another-value"}.
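As a rough illustration of the pseudo-URL convention, the sketch below splits a --transform value into a module path and an options object. The helper name parseTransformSpec is hypothetical and this is not elasticdump's own parsing code; it only assumes Node's built-in URLSearchParams.

```javascript
// Hypothetical helper (not part of elasticdump): splits a --transform value
// into the module path and the options parsed from the query string.
function parseTransformSpec(spec) {
  if (!spec.startsWith('@')) {
    return null; // no @ prefix: treat the value as a plain script string
  }
  const withoutAt = spec.slice(1);
  const queryIndex = withoutAt.indexOf('?');
  if (queryIndex === -1) {
    return { modulePath: withoutAt, options: {} };
  }
  const modulePath = withoutAt.slice(0, queryIndex);
  const query = withoutAt.slice(queryIndex + 1);
  // URLSearchParams is a Node.js global; repeated keys keep the last value here.
  const options = Object.fromEntries(new URLSearchParams(query));
  return { modulePath, options };
}

const spec = parseTransformSpec(
  '@./transforms/my-transform?param1=value&param2=another-value'
);
console.log(spec.modulePath); // ./transforms/my-transform
console.log(spec.options);    // { param1: 'value', param2: 'another-value' }
```

Note that every option value arrives as a string, so a transform that needs numbers or booleans would have to convert them itself.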

Also supplied is an example transform for anonymizing data on-the-fly. It works well for our needs, and may suit yours too.

Regular scripts passed as strings to elasticdump are still parsed and used the same as before. This changes nothing about existing behaviour, only adds to it. Notably, module-based transforms bypass the sandboxed vm environment used by string-based transforms, which gives you more flexibility but also removes that layer of protection.
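To give a feel for what an anonymizing transform module might look like, here is a minimal sketch. It is not the example shipped with elasticdump; the fields option and the choice of SHA-256 hashing are assumptions made for illustration.

```javascript
const crypto = require('crypto');

// Hypothetical transform: replaces each field named in options.fields
// (a comma-separated list) with the SHA-256 hex digest of its value.
function anonymize(doc, options) {
  const fields = ((options && options.fields) || '').split(',').filter(Boolean);
  for (const field of fields) {
    if (doc._source && doc._source[field] !== undefined) {
      doc._source[field] = crypto
        .createHash('sha256')
        .update(String(doc._source[field]))
        .digest('hex');
    }
  }
}

module.exports = anonymize;
```

Saved under a hypothetical path such as ./transforms/anonymize.js, it could be selected with --transform='@./transforms/anonymize?fields=email,name'.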

elasticsearch-dump - v3.3.0 Transform and AWS updates

Published by evantahler over 7 years ago

--noRefresh option

run --transform in global context

multiple transform options

AWS signing enhancements

  1. Signs root "/" request for version so that version can be determined correctly when working with older (< version 5) ES clusters

  2. Uses the AWS SDK CredentialProviderChain to let users specify credentials in a number of standard ways, most importantly adding support for automatically obtaining credentials from the EC2 metadata service or from ECS task role credentials. These cannot practically be supplied on the command line or in a file because they are rotated automatically every hour. I believe I didn't break any existing command-line options, but I switched to using aws-sdk instead of the awscred npm module. See:
    https://aws.amazon.com/blogs/security/a-new-and-standardized-way-to-manage-credentials-in-the-aws-sdks/
    or, for JavaScript specifically:
    http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/CredentialProviderChain.html#defaultProviders-property

  3. Small doc enhancements

  4. Adds a docker-compose file as an aid to anyone trying to run tests locally, as the tests expect an elasticsearch container to be running at localhost:9200. Hopefully nobody mistakes this for a way to run elasticdump in Docker.

elasticsearch-dump -

Published by evantahler over 7 years ago

elasticsearch-dump - Custom HTTP Agent and Basic Auth via File

Published by evantahler over 7 years ago

Added User-Agent header to elasticsearch HTTP(S) transport

  • agent is "elasticdump"
  • analysis of ES request logs
  • if a reverse proxy is implemented in front of the ES cluster, allow more intelligent behaviour (e.g. by default redirect to Kibana, but if U-A matches elasticdump then allow access to the REST endpoint)
  • more compliant with RFC7231 (which states that we should send a User-Agent)
  • by @g-a-d via #284

Add HTTP auth by ini file

  • new option httpAuthFile which takes a path to a file which contains:
user=<username>
password=<password>
  • by @tarrow via #276
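The file format above is simple enough to sketch a reader for. The following is an illustration only, not elasticdump's implementation: parseAuthFile and basicAuthHeader are hypothetical names, and the sketch assumes the credentials end up in a standard HTTP Basic Authorization header.

```javascript
// Hypothetical parser for the key=value file format shown above.
function parseAuthFile(contents) {
  const values = {};
  for (const line of contents.split(/\r?\n/)) {
    const idx = line.indexOf('=');
    if (idx === -1) continue; // skip blank or malformed lines
    // Split on the first "=" only, so passwords may contain "=".
    values[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return values;
}

// Builds an HTTP Basic Authorization header value from the parsed file.
function basicAuthHeader({ user, password }) {
  return 'Basic ' + Buffer.from(`${user}:${password}`).toString('base64');
}

const auth = parseAuthFile('user=alice\npassword=s3cret\n');
console.log(auth); // { user: 'alice', password: 's3cret' }
console.log(basicAuthHeader(auth));
```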
elasticsearch-dump - v3.0.2: Fix version logging when printing results to STDOUT

Published by evantahler almost 8 years ago

Suppress version detection messages by moving them into debug.
This bug was introduced in v3.0.0.

elasticsearch-dump - v3.0.1: New option to transform documents by script

Published by evantahler almost 8 years ago

A new option, --transform

--transform
                    A javascript, which will be called to modify documents
                    before writing it to destination. global variable 'doc'
                    is available.
                    Example script for computing a new field 'f2' as doubled
                    value of field 'f1':
                        doc._source["f2"] = doc._source.f1 * 2;
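One way such a script string could be evaluated is sketched below, assuming Node's built-in vm module with doc exposed as a global. The function name applyStringTransform is hypothetical, and elasticdump's actual mechanism may differ.

```javascript
const vm = require('vm');

// Hypothetical helper: runs a transform script with `doc` as its only global.
// The script mutates the document in place.
function applyStringTransform(script, doc) {
  vm.runInNewContext(script, { doc });
  return doc;
}

const sample = { _source: { f1: 21 } };
applyStringTransform('doc._source["f2"] = doc._source.f1 * 2;', sample);
console.log(sample._source.f2); // 42
```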
elasticsearch-dump - v3.0.0: Version Detection for default body query

Published by evantahler almost 8 years ago

We now poll for the ES version when determining the default SearchBody query.
This allows the tool to work with Elasticsearch v5.x.x.
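The idea can be sketched as follows. This is an assumption-laden illustration, not the tool's code: it assumes the root endpoint returns JSON containing version.number, and the two query bodies (fields before v5, stored_fields from v5 on) are illustrative guesses at what a version-dependent default might look like. The function names are hypothetical.

```javascript
// Extracts the major version from a root ("/") response such as
// {"version": {"number": "5.1.1"}}.
function majorVersion(rootResponse) {
  return parseInt(rootResponse.version.number.split('.')[0], 10);
}

// Hypothetical: picks a default search body based on the detected version.
function defaultSearchBody(rootResponse) {
  const body = { query: { match_all: {} }, _source: true };
  if (majorVersion(rootResponse) >= 5) {
    body.stored_fields = ['*']; // assumed form for v5 and later
  } else {
    body.fields = ['*']; // assumed form for older clusters
  }
  return body;
}

console.log(defaultSearchBody({ version: { number: '5.1.1' } }));
console.log(defaultSearchBody({ version: { number: '2.4.0' } }));
```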

elasticsearch-dump -

Published by evantahler about 8 years ago

Fix multielasticdump when response is an object.

  • Solves an issue when exporting index metadata (mapping) and data was returned as an object (rather than array). This was most prevalent with Elasticsearch versions before 2.0.0, but was still seen in some 2.0 deployments.
  • by @Hugodby via https://github.com/taskrabbit/elasticsearch-dump/pull/249

elasticsearch-dump - v2.4.1 --quiet

Published by evantahler about 8 years ago

Adds a --quiet option to suppress all stdout output (stderr will still be output).

elasticsearch-dump - v2.4.0: Added support for Amazon Request Signing 4

Published by evantahler over 8 years ago

If you are using Amazon's hosted Elasticsearch solution, you have probably protected the service using IAM roles. In that case you need to sign every request to Elasticsearch with Signature Version 4. This update makes use of 'aws4' to sign requests correctly when using the REST API.

by @thomasheckmann via https://github.com/taskrabbit/elasticsearch-dump/pull/239

elasticsearch-dump - v2.3.0: Merge pull request #233 from taskrabbit/no_all

Published by evantahler over 8 years ago

  • remove the --all option in favor of multielasticdump. The --all flag hadn't worked in some time.

elasticsearch-dump - v2.2.2

Published by evantahler over 8 years ago

Allows filtering by index name via --match when loading from multielasticdump
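A minimal sketch of this kind of filtering, assuming the --match value is treated as a regular expression applied to each index name. The function name filterIndices and the default pattern are hypothetical.

```javascript
// Hypothetical helper: keeps only the index names matching the pattern.
function filterIndices(names, match = '^.*$') {
  const pattern = new RegExp(match);
  return names.filter((name) => pattern.test(name));
}

console.log(filterIndices(['logs-2016', 'users', 'logs-2017'], '^logs-'));
// [ 'logs-2016', 'logs-2017' ]
```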

elasticsearch-dump - v2.2.1

Published by evantahler over 8 years ago

Fixes offset and reporting strings from multidump

elasticsearch-dump - v2.2.0

Published by evantahler over 8 years ago

Version

Bulk

Remove Skip; fix offset

MultiDump modes (+ direction)

  • Adds the direction option to multielasticdump.
  • If the --direction is dump, which is the default, --input MUST be a URL for the base location of an ElasticSearch server (http://localhost:9200) and --output MUST be a directory. Each index that does match will have a data, mapping, and analyzer file created.
  • For loading files that you have dumped from multielasticdump, --direction should be set to load, --input MUST be a directory of a multielasticdump dump, and --output MUST be an Elasticsearch server URL.
  • by @cggaurav via https://github.com/taskrabbit/elasticsearch-dump/pull/216
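The rules above can be expressed as a small validation routine. The sketch below is hypothetical (validateMultiDumpOptions is not a function in the tool) and treats anything starting with http:// or https:// as a server URL and anything else as a directory path.

```javascript
const isUrl = (value) => /^https?:\/\//.test(String(value));

// Hypothetical check of the --direction / --input / --output combination.
function validateMultiDumpOptions({ direction = 'dump', input, output }) {
  if (direction === 'dump') {
    if (!isUrl(input)) throw new Error('--input must be an Elasticsearch URL when dumping');
    if (isUrl(output)) throw new Error('--output must be a directory when dumping');
  } else if (direction === 'load') {
    if (isUrl(input)) throw new Error('--input must be a directory when loading');
    if (!isUrl(output)) throw new Error('--output must be an Elasticsearch URL when loading');
  } else {
    throw new Error(`unknown --direction: ${direction}`);
  }
  return { direction, input, output };
}

// Dumping is the default direction.
console.log(validateMultiDumpOptions({ input: 'http://localhost:9200', output: '/tmp/es_backup' }));
```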
elasticsearch-dump - fix debug output

Published by evantahler over 8 years ago

fixes a bug wherein using the --debug flag would not show debug output. Fixed by @evantahler via https://github.com/taskrabbit/elasticsearch-dump/pull/208

elasticsearch-dump - remove scan

Published by evantahler over 8 years ago

Finally removes scan, per #202. Also reduces the brittleness of the test suite at the expense of speed.

elasticsearch-dump - v2.1.0 Elasticsearch v2.x compatibility

Published by evantahler over 8 years ago

  • move from scan/scroll when reading ES to using scan with a doc sort of _id
    • while this is the recommended procedure for ES v2.x onward, this is technically slower for ES v1.x, but is backwards compatible
    • we now handle when the getting of a scrollId returns hits (ES v2.x)
  • handle new parent/child mapping data types in ES v2.x
    • support parent/child link keys in both of the ways that ES v1.x and v2.x report parent child mappings.
    • update the crazy parent/child test to one that makes a lot more sense
  • Run test suite (in travis.ci) against multiple versions of Elasticsearch

by @evantahler via https://github.com/taskrabbit/elasticsearch-dump/pull/202

elasticsearch-dump - v2.0.0: Remove Bulk Mode

Published by evantahler over 8 years ago

All of our old "bulk" mode commands have been removed. They were buggy and not maintained properly. This change, while reducing functionality of this tool, will provide a smaller, more stable tool.

If you need to export multiple indexes, look for the multielasticdump section of the tool.

https://github.com/taskrabbit/elasticsearch-dump/pull/191

elasticsearch-dump - v2.0.1

Published by evantahler over 8 years ago

Update JSONStream to the latest version

https://github.com/taskrabbit/elasticsearch-dump/pull/195

elasticsearch-dump - v1.1.4 Escape index mapping path

Published by evantahler over 8 years ago

Fix an error when a mapping name includes a special character

{
  "product_production": {
    "mappings": {
      "user/admin": {
        "properties": {
          "..."
        }
      }
    }
  }
}

The mapping above produces the following error during restore:
Error: failed to parse json (message: "Unexpected token N") - source: "No handler found for uri [/product_production/admin/user/_mapping] and method [PUT]"

The expected URI is /product_production/admin%2Fuser/_mapping for the mapping above to work correctly.
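The escaping itself can be illustrated with JavaScript's built-in encodeURIComponent. The helper name mappingPath is hypothetical and this is a sketch of the idea rather than the actual patch.

```javascript
// Hypothetical helper: builds a mapping URI with each path segment escaped,
// so a "/" inside a mapping name is not read as a path separator.
function mappingPath(index, type) {
  return `/${encodeURIComponent(index)}/${encodeURIComponent(type)}/_mapping`;
}

console.log(mappingPath('product_production', 'admin/user'));
// /product_production/admin%2Fuser/_mapping
```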


by @kahirul via https://github.com/taskrabbit/elasticsearch-dump/pull/190