Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip.
MIT License
Parse vision is an open source tool to visualise what OCR is parsing in a PDF document to help de...
OCR-powered screen-capture tool that captures information as text rather than as images
🔎📝 This is a module for performing specialised OCR on food products and nutritional tables.
Improved file parsing for LLMs
OCR, layout analysis, reading order, and line detection in 90+ languages
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) ...
Images of Text to Text: Call Tesseract from Python and OCR a directory of PDFs
Easy-to-use text extractor, from PDF, DOC, DOCX and other documents, including if necessary using...
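A minimal sketch of batch-OCRing a directory of PDFs, assuming the third-party `pdf2image` (needs Poppler) and `pytesseract` (needs the Tesseract binary) packages are installed. The names `ocr_pdf` and `ocr_directory` are hypothetical and are not the API of the tool described above. The OCR step is passed in as a callable so the directory-walking logic can be exercised without either dependency.

```python
from pathlib import Path
from typing import Callable, Dict


def ocr_pdf(pdf_path: Path, lang: str = "eng", dpi: int = 300) -> str:
    """Hypothetical helper: rasterise each page, then OCR it with Tesseract."""
    # Imported lazily so the rest of this sketch works without these packages.
    from pdf2image import convert_from_path  # requires Poppler
    import pytesseract  # requires the tesseract binary on PATH

    pages = convert_from_path(str(pdf_path), dpi=dpi)
    return "\n".join(pytesseract.image_to_string(page, lang=lang) for page in pages)


def ocr_directory(folder: str, ocr: Callable[[Path], str] = ocr_pdf) -> Dict[str, str]:
    """Hypothetical helper: OCR every PDF file directly inside `folder`."""
    results: Dict[str, str] = {}
    for path in sorted(Path(folder).iterdir()):
        # Match the extension case-insensitively so "SCAN.PDF" is included too.
        if path.is_file() and path.suffix.lower() == ".pdf":
            results[path.name] = ocr(path)
    return results
```

With both dependencies installed, `ocr_directory("scans/")` returns a mapping from file name to extracted text; in tests, any `Path -> str` function can stand in for `ocr_pdf`.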
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Ch...
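EasyOCR's `Reader.readtext` returns, by default, a list of `(bounding_box, text, confidence)` tuples, where the box is a list of four `[x, y]` corner points. A minimal post-processing sketch, assuming that output shape, keeps only confident detections and orders them top-to-bottom, then left-to-right. `keep_confident` and its 0.5 threshold are hypothetical choices, not part of EasyOCR.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Detection = Tuple[Sequence[Point], str, float]


def keep_confident(detections: List[Detection], min_conf: float = 0.5) -> List[str]:
    """Hypothetical helper: drop low-confidence hits, return text in reading order."""
    kept = [d for d in detections if d[2] >= min_conf]
    # Sort by the box's top edge, then its left edge (a crude reading order
    # that can misorder words on slightly skewed lines).
    kept.sort(key=lambda d: (min(p[1] for p in d[0]), min(p[0] for p in d[0])))
    return [text for _box, text, _conf in kept]
```

With EasyOCR installed, this could be applied as `keep_confident(easyocr.Reader(['en']).readtext('scan.png'))`.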
pix2tex: Using a ViT to convert images of equations into LaTeX code.
A Python wrapper for Google Tesseract
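pytesseract's `image_to_data(image, output_type=pytesseract.Output.DICT)` returns Tesseract's TSV output as a dict of parallel lists (`page_num`, `block_num`, `text`, `conf`, among others). A minimal sketch, assuming that dict shape, rebuilds each detected text block as a single string. `group_blocks` is a hypothetical helper, and the synthetic dict in the test stands in for real Tesseract output.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_blocks(data: Dict[str, list], min_conf: float = 0.0) -> Dict[Tuple[int, int], str]:
    """Hypothetical helper: join recognised words per (page_num, block_num)."""
    blocks: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for page, block, word, conf in zip(
        data["page_num"], data["block_num"], data["text"], data["conf"]
    ):
        # Non-word rows (page/block/line levels) carry empty text and conf -1;
        # float() accepts conf given either as a number or as a string.
        if word.strip() and float(conf) >= min_conf:
            blocks[(int(page), int(block))].append(word)
    # Line breaks inside a block are flattened to single spaces.
    return {key: " ".join(words) for key, words in blocks.items()}
```

With Tesseract and Pillow installed, this could be called as `group_blocks(pytesseract.image_to_data(Image.open("page.png"), output_type=pytesseract.Output.DICT))`.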
Text preprocessing, representation and visualization from zero to hero.
Convert PDF pages to images
An easy way to extract information from documents