Import and export tools for elasticsearch & opensearch
APACHE-2.0 License
Published by evantahler over 7 years ago
Adds the ability to use a JavaScript module for the transform. When specifying the transform option, prefix the value with @ (a curl convention) to load the top-level function, which is called with the document and the parsed arguments to the module.
Uses a pseudo-URL format to specify arguments to the module as follows:
elasticdump --transform='@./transforms/my-transform?param1=value&param2=another-value'
With a module at ./transforms/my-transform.js containing the following:
module.exports = function (doc, options) {
// do something to doc
};
this will load the module ./transforms/my-transform.js and execute the function with doc and options = {"param1": "value", "param2": "another-value"}.
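To make the pseudo-URL format concrete, here is a minimal sketch of how such a value could be split into a module path and an options object. It is not elasticdump's actual implementation; the helper name parseTransformSpec is hypothetical and the parsing rules are assumptions based on the description above.

```javascript
// Sketch only: split a --transform value of the form '@<module path>?<query>'
// into the module path and an options object.
// parseTransformSpec is a hypothetical helper name, not part of elasticdump.
function parseTransformSpec(spec) {
  if (spec[0] !== '@') {
    // Without the '@' prefix the value is an inline script string, not a module.
    return { modulePath: null, options: {} };
  }
  const body = spec.slice(1);
  const queryStart = body.indexOf('?');
  const modulePath = queryStart === -1 ? body : body.slice(0, queryStart);
  const query = queryStart === -1 ? '' : body.slice(queryStart + 1);
  // URLSearchParams is a Node.js global; for repeated keys the last value wins here.
  const options = Object.fromEntries(new URLSearchParams(query));
  return { modulePath, options };
}

const parsed = parseTransformSpec('@./transforms/my-transform?param1=value&param2=another-value');
console.log(parsed.modulePath); // ./transforms/my-transform
console.log(parsed.options);    // { param1: 'value', param2: 'another-value' }
```

Note that every option value arrives as a string, so a transform that expects numbers or booleans would need to convert them itself.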
Also supplied is an example transform for anonymizing data on-the-fly. It works well for our needs, and may suit yours too.
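The bundled anonymizing transform is not reproduced here. As a rough illustration of the idea, a transform module along these lines might hash selected fields in place. The fields option name, the choice of SHA-256, and the file path in the usage note are all assumptions made for this sketch.

```javascript
const crypto = require('crypto');

// Hypothetical transform module: replaces selected _source fields with a
// SHA-256 hex digest. The 'fields' option name is an assumption of this sketch.
function anonymize(doc, options) {
  const fields = String((options && options.fields) || '')
    .split(',')
    .filter(Boolean);
  for (const field of fields) {
    if (doc._source && doc._source[field] !== undefined) {
      // Mutate the document in place, as the module form described above does.
      doc._source[field] = crypto
        .createHash('sha256')
        .update(String(doc._source[field]))
        .digest('hex');
    }
  }
}

module.exports = anonymize;
```

Saved as a hypothetical ./transforms/anonymize.js, it could be invoked with something like --transform='@./transforms/anonymize?fields=email,name'.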
Regular scripts passed as strings to elasticdump are still parsed and used the same as before. This changes nothing about existing behaviour, only adds to it. Notably, this bypasses the security of the sandboxed vm environment used by the string-based transform, giving you more flexibility.
Published by evantahler over 7 years ago
--noRefresh option
Signs root "/" request for version so that version can be determined correctly when working with older (< version 5) ES clusters
Uses the AWS SDK CredentialProviderChain to allow users to specify credentials in a number of standard ways, most importantly adding support for automatically getting credentials from the EC2 metadata service or from task role credentials in ECS. These can't practically be supplied on the command line or in a file because they are rotated automatically every hour. I believe I didn't break any existing command line options, but I switched to using aws-sdk instead of the awscred npm module. See:
https://aws.amazon.com/blogs/security/a-new-and-standardized-way-to-manage-credentials-in-the-aws-sdks/
or, for JavaScript specifically:
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/CredentialProviderChain.html#defaultProviders-property
Small doc enhancements
Adds a docker-compose file as an aid to anyone trying to run the tests locally, since the tests expect an elasticsearch container to be running at localhost:9200. Hopefully nobody mistakes this for a way to run elasticdump in Docker.
Published by evantahler over 7 years ago
httpAuthFile, which takes a path to a file which contains:
user=<username>
password=<password>
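For illustration, a file in this format could be read along the following lines. This is a sketch, not elasticdump's own parser; the helper name parseAuthFile and the choice to split on the first "=" only (so that passwords may contain "=") are assumptions.

```javascript
// Sketch: parse 'user=<username>' / 'password=<password>' lines into an object.
// parseAuthFile is a hypothetical helper, not elasticdump's implementation.
function parseAuthFile(contents) {
  const auth = {};
  for (const line of contents.split(/\r?\n/)) {
    const idx = line.indexOf('=');
    if (idx === -1) continue; // skip blank or malformed lines
    // Split on the first '=' only, so values may themselves contain '='.
    auth[line.slice(0, idx).trim()] = line.slice(idx + 1);
  }
  return auth;
}

const auth = parseAuthFile('user=elastic\npassword=s3cret=x\n');
console.log(auth.user);     // elastic
console.log(auth.password); // s3cret=x
```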
Published by evantahler almost 8 years ago
Suppress version detection messages by moving them into debug.
This bug was introduced in v3.0.0
Published by evantahler almost 8 years ago
A new option, --transform
--transform
A JavaScript snippet that will be called to modify each document before it is written to the destination. The global variable 'doc' is available.
Example script for computing a new field 'f2' as the doubled value of field 'f1':
doc._source["f2"] = doc._source.f1 * 2;
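As a rough mental model of how an inline script can see doc as a global, the sketch below runs the example script with Node's built-in vm module. It only illustrates the technique and is not necessarily how elasticdump wires it up; applyInlineTransform is a hypothetical name.

```javascript
const vm = require('vm');

// Sketch: run an inline transform script in a fresh context where 'doc'
// is exposed as a global. applyInlineTransform is a hypothetical helper.
function applyInlineTransform(script, doc) {
  // The context holds a reference to the same object, so changes made by
  // the script are visible to the caller.
  vm.runInNewContext(script, { doc });
  return doc;
}

const doc = { _source: { f1: 21 } };
applyInlineTransform('doc._source["f2"] = doc._source.f1 * 2;', doc);
console.log(doc._source.f2); // 42
```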
Published by evantahler almost 8 years ago
We now poll for the ES version when determining the default SearchBody query.
This allows this tool to now work with ElasticSearch v5.x.x
Fix multielasticdump when response is an object.
Published by evantahler about 8 years ago
Adds a --quiet option to suppress all stdout (stderr will still be output)
Published by evantahler over 8 years ago
If you are using Amazon's hosted Elasticsearch solution, you have probably protected the service using IAM roles. In that case you need to sign every request to Elasticsearch with Signature Version 4. This update makes use of 'aws4' to sign requests correctly when using the REST API.
by @thomasheckmann via https://github.com/taskrabbit/elasticsearch-dump/pull/239
Published by evantahler over 8 years ago
--all option in favor of multielasticdump. The --all flag hadn't worked in some time.
Published by evantahler over 8 years ago
Allows filtering by index name via --match when loading from multielasticdump
Published by evantahler over 8 years ago
Fixes offset and reporting strings from multidump
Published by evantahler over 8 years ago
--help
elasticdump --version
direction option to multielasticdump.
If --direction is dump, which is the default, --input MUST be a URL for the base location of an ElasticSearch server (http://localhost:9200) and --output MUST be a directory. Each index that does match will have a data, mapping, and analyzer file created.
To load a dump, --direction should be set to load, --input MUST be a directory of a multielasticsearch dump, and --output MUST be an Elasticsearch server URL.
Published by evantahler over 8 years ago
Fixes a bug wherein using the --debug flag would not show debug output. Fixed by @evantahler via https://github.com/taskrabbit/elasticsearch-dump/pull/208
Published by evantahler over 8 years ago
Finally removes scan, per #202. Also reduces the brittleness of the test suite at the expense of speed.
Published by evantahler over 8 years ago
_id
by @evantahler via https://github.com/taskrabbit/elasticsearch-dump/pull/202
Published by evantahler over 8 years ago
All of our old "bulk" mode commands have been removed. They were buggy and not maintained properly. This change, while reducing functionality of this tool, will provide a smaller, more stable tool.
If you need to export multiple indexes, look for the multielasticdump section of the tool.
Published by evantahler over 8 years ago
Update JSONStream to the latest version
Published by evantahler over 8 years ago
Fix error when a mapping name includes a special character
{
  "product_production": {
    "mappings": {
      "user/admin": {
        "properties": {
          "..."
        }
      }
    }
  }
}
The mapping above produces the following error during restore:
Error: failed to parse json (message: "Unexpected token N") - source: "No handler found for uri [/product_production/admin/user/_mapping] and method [PUT]"
The expected URI should be /product_production/admin%2Fuser/_mapping for the previous mapping to work correctly.
by @kahirul via https://github.com/taskrabbit/elasticsearch-dump/pull/190
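The kind of encoding this fix relies on can be sketched with JavaScript's built-in encodeURIComponent, which percent-encodes "/" inside a single path segment. The helper name mappingUri is hypothetical, and the snippet is an illustration rather than the code from the pull request.

```javascript
// Sketch: build a mapping URI with each path segment percent-encoded, so a
// '/' inside a mapping name is not mistaken for a path separator.
// mappingUri is a hypothetical helper name.
function mappingUri(index, type) {
  return '/' + encodeURIComponent(index) + '/' + encodeURIComponent(type) + '/_mapping';
}

console.log(mappingUri('product_production', 'admin/user'));
// /product_production/admin%2Fuser/_mapping
```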