A set of handy scripts to make the Tesseract training process a bit easier.
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Convert a PDF into per-page images
Tutorial on running a Keras model in C++ and Python TensorFlow
Single Shot MultiBox Detector implemented with TensorFlow
A Python wrapper for the tesseract-ocr API
A Python wrapper for Google Tesseract
OCR, layout analysis, reading order, line detection in 90+ languages
Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip.
Training data generator for text detection
A synthetic data generator for text recognition
A set of data files that can be used to train tesseract-ocr to read Georgian script (ქართული ენა)
My modifications to the Tesseract box file editing tool.
pip-installable versions of tesseract-ocr data
🔎📝 A module for specialized OCR of food product labels and nutrition tables.
Images of Text to Text: Call Tesseract from Python and OCR a directory of PDFs