Tools for running OCR against files stored in S3
Apache-2.0 License
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc...
Images of Text to Text: Call Tesseract from Python and OCR a directory of PDFs
Freeing data processing from scripting madness by providing a set of platform-agnostic customizab...
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
OCR-D wrapper for detectron2 based segmentation models
img2table is a table identification and extraction Python Library for PDF and images, based on Op...
Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compa...
Scrape files for sensitive information, and generate an interactive HTML report. Based on Rabin2.
Utility scripts / apps
Open Access PDF harvester
OCR, layout analysis, reading order, line detection in 90+ languages
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) ...
Training data generator for text detection
A security toolkit for Amazon S3
🔎📝 A module for running specialized OCR on food products and nutritional tables.