Import and export tools for Elasticsearch & OpenSearch
Apache-2.0 License
Published by ferronrsmith over 5 years ago
Special thanks to @admlko
Published by ferronrsmith over 5 years ago
NB: Please remember that types have been deprecated in Elasticsearch 7.
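For illustration (index and type names are hypothetical), a URL that addressed an index/type pair on older clusters should target the index alone on Elasticsearch 7:
# ES 6 and earlier: a type segment could follow the index in the URL
elasticdump \
  --input=http://production.es.com:9200/my_index/my_type \
  --output=/data/my_index.json
# Elasticsearch 7+: address the index only, since types are deprecated
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=/data/my_index.json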
Published by ferronrsmith over 5 years ago
Thanks @ilyaTT
Published by ferronrsmith over 5 years ago
This release contains a breaking change for the s3 transport:
the s3Bucket and s3RecordKey params are no longer supported; please use s3urls instead.
# Import data from S3 into ES (using s3urls)
elasticdump \
  --s3AccessKeyId "${access_key_id}" \
  --s3SecretAccessKey "${access_key_secret}" \
  --input "s3://${bucket_name}/${file_name}.json" \
  --output=http://production.es.com:9200/my_index
# Export ES data to S3 (using s3urls)
elasticdump \
  --s3AccessKeyId "${access_key_id}" \
  --s3SecretAccessKey "${access_key_secret}" \
  --input=http://production.es.com:9200/my_index \
  --output "s3://${bucket_name}/${file_name}.json"
Thanks @suppenkelch for your contribution
Published by ferronrsmith over 5 years ago
+ from positive numbers
Published by ferronrsmith over 5 years ago
New s3Compress flag that GZIPs the stream being sent to s3.
Published by ferronrsmith over 5 years ago
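A sketch of how the flag might be used, reusing the s3urls-style output shown elsewhere in these notes (the bucket, key, and .gz suffix are hypothetical):
# Export ES data to S3, GZIPping the stream on the way out
elasticdump \
  --s3AccessKeyId "${access_key_id}" \
  --s3SecretAccessKey "${access_key_secret}" \
  --input=http://production.es.com:9200/my_index \
  --output "s3://${bucket_name}/${file_name}.json.gz" \
  --s3Compress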
Added support-big-int to multielasticdump.
Published by ferronrsmith over 5 years ago
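A sketch of passing the flag through multielasticdump (the match pattern and output directory are hypothetical, and this assumes multielasticdump forwards the flag to each child elasticdump):
# Dump every matching index, preserving big integers in _source
multielasticdump \
  --direction=dump \
  --match='^my_.*$' \
  --input=http://production.es.com:9200 \
  --output=/data/dumps \
  --support-big-int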
Added transform to multielasticdump.
Published by ferronrsmith almost 6 years ago
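A sketch along the same lines (the inline script and paths are hypothetical, and this assumes the same doc-mutating transform syntax as single elasticdump):
# Apply a transform to every document of every matching index while dumping
multielasticdump \
  --direction=dump \
  --match='^my_.*$' \
  --input=http://production.es.com:9200 \
  --output=/data/dumps \
  --transform="doc._source['dumped_at']=new Date().toISOString()"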
Fixed #487 - s3 doesn't write newlines between events
Published by evantahler almost 6 years ago
https://github.com/taskrabbit/elasticsearch-dump/pull/28
Now that we are using the scan/scroll API to load data from Elasticsearch, we need to modify how the --limit flag is treated in reads.
In most Elasticsearch APIs, limit is literal: if you say {size: 100}, you get 100 results. However, the scan/scroll API is special, in that it tries to minimize load on each shard and does not pre-collect results before transmitting. The size in this API is actually results per shard. So if you have 5 shards and say {size: 100}, you will actually get ~500 results back (assuming each shard has unsent data to return).
This PR attempts to look up how many shards an index has, and will modify the effective {size} to be limit / shards.
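A worked example (index name, paths, and shard count are hypothetical): dumping a 5-shard index with --limit=500 now issues scroll requests sized at 500 / 5 = 100 per shard, so each page still returns roughly 500 documents in total.
# my_index has 5 shards
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=/data/my_index.json \
  --limit=500
# effective per-shard scroll size: 500 / 5 = 100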
Published by ferronrsmith almost 6 years ago
Added s3 transport support.
NB: Only the output (set) side has been implemented.
Thanks to @hilt86 for testing and providing an s3 bucket implementation!
Published by ferronrsmith about 6 years ago
The package size has been reduced to less than a third of what it was.
Published by ferronrsmith about 6 years ago
The work done on the stream splitter should drastically improve the efficiency of dumping to files while helping to mitigate the pesky out-of-memory exception.
That alone is worth a MAJOR bump.
A new --fileSize flag was added that allows users to specify the file size of each outputted chunk.
Under the covers, elasticsearch-dump uses bytes to convert the abbreviated string representation into the byte count used by the new splitter class.
--fileSize=10mb // split the file every 10 megabytes
--fileSize=1gb // split the file every 1 gigabyte
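A complete invocation might look like this (paths are hypothetical; the exact naming of the split files is not covered here):
# Dump an index to local files, splitting every ~10 megabytes
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=/data/my_index.json \
  --fileSize=10mb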
Remember: the higher the fileSize, the higher the risk of hitting an out-of-memory issue. Perform your own testing to see what your system is able to handle.
Thanks for using elasticsearch-dump
Published by ferronrsmith about 6 years ago
Added the retryAttempts flag to control the number of times to retry, and the retryDelay flag to set the back-off time.
Added the parseExtraFields flag.
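A sketch of the retry flags in use (paths are hypothetical, and the delay is assumed to be in milliseconds):
# Retry each failed request up to 5 times, backing off 5000 ms between attempts
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=/data/my_index.json \
  --retryAttempts=5 \
  --retryDelay=5000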