Imports raw JSON into Elasticsearch using multiple threads
GPL-3.0 License
There are 5 states here
Install the elasticsearch package with pip:
pip install elasticsearch
Read more about versions here
--data : The data file
--check : Validate data file
--bulk : Elasticsearch endpoint ( e.g. http://localhost:9200 )
--index : Index name
--type : Index type
--import : Import data to ES
--thread : Number of threads, default = 1
--help : Display help message
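As an illustration only, the options above could be declared with Python's standard argparse module roughly as follows. This is a sketch, not the actual code of import.py; the function name build_parser and the destination name do_import are hypothetical (import is a reserved word in Python, so the flag needs a different attribute name). argparse adds --help on its own.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Import raw JSON into Elasticsearch"
    )
    parser.add_argument("--data", help="The data file")
    parser.add_argument("--check", action="store_true", help="Validate data file")
    parser.add_argument("--bulk", help="Elasticsearch endpoint, e.g. http://localhost:9200")
    parser.add_argument("--index", help="Index name")
    parser.add_argument("--type", help="Index type")
    # "import" is a Python keyword, so store the flag under another name.
    parser.add_argument("--import", dest="do_import", action="store_true",
                        help="Import data to ES")
    parser.add_argument("--thread", type=int, default=1,
                        help="Number of threads, default = 1")
    return parser
```

Calling build_parser().parse_args() then yields an object whose attributes (data, check, bulk, index, type, do_import, thread) match the flags listed above.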
I suggest you check your data before (or during) the import process
python import.py --data test_data.json --check
python import.py --data test_data.json --import --bulk http://localhost:9200 --index index_name --type type_name
python import.py --data test_data.json --import --bulk http://localhost:9200 --index index_name --type type_name --check
python import.py --data test_data.json --import --bulk http://localhost:9200 --index index_name --type type_name --thread 16
python import.py --data test_data.json --import --bulk http://localhost:9200 --index index_name --type type_name --check --thread 16
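To give an idea of what a validation pass like --check might involve, here is a minimal sketch. It assumes the data file holds one JSON document per line, which this README does not state explicitly, and the function name check_file is hypothetical. The real script may validate differently.

```python
import json


def check_file(path):
    """Return (line_number, error_message) pairs for lines that are not valid JSON.

    Assumes one JSON document per line; blank lines are skipped.
    """
    errors = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore empty lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                errors.append((number, exc.msg))
    return errors
```

An empty result means every non-blank line parsed as JSON; otherwise the line numbers point to the records that need fixing before import.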
The import is much faster with multiple threads, though the speedup depends on your computer or server resources. The script uses linecache
to load the data into RAM, so you also need enough memory
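The combination of linecache and threads can be sketched as below. This is an assumption-laden illustration, not the script's actual code: the helper names (split_ranges, build_bulk_body, build_all) are hypothetical, the file is assumed to contain one JSON document per line, and the sketch only builds request bodies in the Elasticsearch bulk format (an action line followed by the document) without sending them. The index type is left out of the action line here.

```python
import json
import linecache
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_lines, threads):
    """Split line numbers 1..total_lines into contiguous (start, end) ranges."""
    size, extra = divmod(total_lines, threads)
    ranges, start = [], 1
    for i in range(threads):
        length = size + (1 if i < extra else 0)
        if length == 0:
            continue  # more threads than lines
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def build_bulk_body(path, start, end, index):
    """Build a bulk body (NDJSON) for lines start..end, inclusive."""
    parts = []
    for number in range(start, end + 1):
        line = linecache.getline(path, number).strip()
        if not line:
            continue
        parts.append(json.dumps({"index": {"_index": index}}))  # action line
        parts.append(line)                                      # document line
    return "\n".join(parts) + "\n" if parts else ""


def build_all(path, threads, index):
    """Load the file into linecache once, then build one body per thread."""
    total = len(linecache.getlines(path))  # the whole file now sits in RAM
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(build_bulk_body, path, s, e, index)
                   for s, e in split_ranges(total, threads)]
        return [f.result() for f in futures]
```

Because linecache keeps every line of the file in memory, each thread can read its own range without reopening the file, which is also why memory use grows with the size of the data file.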
The whole process took about 30 minutes, and resource usage was efficient
git checkout -b my-new-feature
git commit -am 'Add some feature'
git push origin my-new-feature
Every project may have problems. You can contribute to the development of this project by reporting them