CLI for running files through AWS Textract
APACHE-2.0 License
Install this tool using pip:
pip install textract-cli
Any of the methods for configuring boto3 credentials will work with this tool. Environment variables or a ~/.aws/config file are good options here.
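As a minimal sketch of the environment variable route, the following sets the standard variables that boto3 reads. The values are placeholders, and the region is only an assumption; substitute your own credentials and a region where Textract is available to your account.

```shell
# Placeholder credentials: replace with your own values.
export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"
# Assumed region for illustration only.
export AWS_DEFAULT_REGION="us-east-1"
```

Variables exported this way apply only to the current shell session.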
To run Textract OCR against a JPEG or PNG file (must be smaller than 5MB):
textract-cli image.jpeg
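Since files must be smaller than 5MB, it can help to check the size before sending anything. The helper below is a hypothetical sketch, not part of textract-cli, and it assumes "5MB" means 5 * 1024 * 1024 bytes.

```shell
# Hypothetical helper: succeeds only if the file exists and is under 5 MB.
# Assumes the limit is 5 * 1024 * 1024 = 5242880 bytes.
is_under_5mb() {
    [ -f "$1" ] || return 1
    # Arithmetic expansion strips any padding that wc adds on some systems.
    size=$(( $(wc -c < "$1") ))
    [ "$size" -lt 5242880 ]
}

# Usage: is_under_5mb image.jpeg && textract-cli image.jpeg
```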
This will output to standard out. To save to a file use this:
textract-cli image.jpeg > output.txt
Or use the -o/--output option like this:
textract-cli image.jpeg -o output.txt
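The -o option also makes it straightforward to process several files in a loop. This is a rough sketch: ocr_dir and the TEXTRACT_CMD override are hypothetical names introduced here for illustration, not features of textract-cli.

```shell
# Hypothetical helper: run OCR on every .jpeg and .png file in a directory,
# writing each result next to its source as a .txt file.
# TEXTRACT_CMD is an illustrative override (handy for dry runs).
ocr_dir() {
    dir=${1:-.}
    cmd=${TEXTRACT_CMD:-textract-cli}
    for f in "$dir"/*.jpeg "$dir"/*.png; do
        [ -e "$f" ] || continue   # skip patterns that matched nothing
        "$cmd" "$f" -o "${f%.*}.txt"
    done
}

# Usage: ocr_dir ./scans
```

Setting TEXTRACT_CMD=echo prints the commands that would run without calling AWS.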
For help, run:
textract-cli --help
You can also use:
python -m textract_cli --help
See also amazon-textract-textractor, an Amazon project offering a similar but much more comprehensive CLI.
To contribute to this tool, first check out the code. Then create a new virtual environment:
cd textract-cli
python -m venv venv
source venv/bin/activate
Now install the dependencies and test dependencies:
pip install -e '.[test]'
To run the tests:
pytest