textract-cli

CLI for running files through AWS Textract

APACHE-2.0 License

Installation

Install this tool using pip:

pip install textract-cli

Configuration

Any of the methods for configuring boto3 will work with this tool. Environment variables or a ~/.aws/config file are good options here.
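Before running the tool, it can help to see which of these configuration sources are actually present. The following is a minimal sketch using only the standard library. The helper name `find_aws_config_sources` is hypothetical and is not part of textract-cli or boto3, and the check covers only a few common environment variables and the two default files, not boto3's full lookup chain.

```python
import os
from pathlib import Path

# A few of the environment variables boto3 reads for credentials and region.
# This is a partial list chosen for illustration, not the complete set.
ENV_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_PROFILE",
    "AWS_DEFAULT_REGION",
)


def find_aws_config_sources(env=None, home=None):
    """Return labels for the configuration sources that appear to be present.

    Hypothetical helper: it only checks that variables are set and files
    exist; it does not validate the credentials themselves.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    found = [f"env:{name}" for name in ENV_VARS if env.get(name)]
    for filename in ("config", "credentials"):
        path = home / ".aws" / filename
        if path.is_file():
            found.append(f"file:{path}")
    return found


if __name__ == "__main__":
    sources = find_aws_config_sources()
    print("\n".join(sources) if sources else "No AWS configuration found")
```

An empty result suggests that neither environment variables nor the default files are set up, although other boto3 mechanisms (for example an instance role) may still apply.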

Usage

To run Textract OCR against a JPEG or PNG file (must be smaller than 5MB):

textract-cli image.jpeg

This writes the extracted text to standard output. To save it to a file, redirect the output:

textract-cli image.jpeg > output.txt

Or use the -o/--output option like this:

textract-cli image.jpeg -o output.txt
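Because the input must be a JPEG or PNG smaller than 5MB, a quick local check can catch unsuitable files before any request is made. This is a rough sketch, not part of textract-cli: the function name `check_image` is hypothetical, the file type is judged by extension only, and "5MB" is assumed to mean 5 * 1024 * 1024 bytes.

```python
from pathlib import Path

# Assumption: "5MB" is read as 5 * 1024 * 1024 bytes; the real cutoff may differ.
MAX_BYTES = 5 * 1024 * 1024
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png"}


def check_image(path):
    """Return a list of problems; an empty list means the file looks acceptable."""
    path = Path(path)
    problems = []

    # Extension check only; the file contents are not inspected.
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported extension: {path.suffix or '(none)'}")

    if not path.is_file():
        problems.append("file does not exist")
    elif path.stat().st_size >= MAX_BYTES:
        problems.append(
            f"file is {path.stat().st_size} bytes, limit is {MAX_BYTES}"
        )
    return problems


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        issues = check_image(name)
        print(f"{name}: {'ok' if not issues else '; '.join(issues)}")
```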

For help, run:

textract-cli --help

You can also use:

python -m textract_cli --help
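The tool takes one image per invocation in the examples above, so processing a folder of images means calling it repeatedly. One possible approach, sketched here with hypothetical helper names (`build_command`, `ocr_directory`), is to drive the documented `-o` option from Python with `subprocess`. It assumes `textract-cli` is on the PATH and that AWS credentials are already configured.

```python
import subprocess
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_command(image, output):
    """Build the argument list for one textract-cli call using the -o option."""
    return ["textract-cli", str(image), "-o", str(output)]


def ocr_directory(directory, runner=subprocess.run):
    """Run textract-cli on every JPEG/PNG in a directory.

    Each result is written next to the image as <image name>.txt, for
    example scan.jpeg -> scan.jpeg.txt, so a.jpg and a.png do not collide.
    The runner parameter exists so the function can be exercised without
    calling AWS.
    """
    outputs = []
    for image in sorted(Path(directory).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        output = image.with_name(image.name + ".txt")
        # check=True raises CalledProcessError if the command exits non-zero.
        runner(build_command(image, output), check=True)
        outputs.append(output)
    return outputs


if __name__ == "__main__":
    import sys

    for written in ocr_directory(sys.argv[1]):
        print(written)
```

Stopping at the first failure is a deliberate choice in this sketch; catching `subprocess.CalledProcessError` inside the loop would let the remaining files continue.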

Alternatives

amazon-textract-textractor is an Amazon project offering a similar but much more comprehensive CLI.

Development

To contribute to this tool, first check out the code. Then create a new virtual environment:

cd textract-cli
python -m venv venv
source venv/bin/activate

Now install the dependencies and test dependencies:

pip install -e '.[test]'

To run the tests:

pytest