A set of data files that can be used to train tesseract-ocr to read Georgian script (ქართული ენა)
Official implementation of the paper "AnyText: Multilingual Visual Text Generation And Editing"
My modifications to a Tesseract box file editing tool.
OTFeatureFreezer GUI app and pyftfeatfreeze command-line tool in Python to permanently "apply" Ope...
ktrain is a Python library that makes deep learning and AI more accessible and easier to apply
This repository contains scripts for Cross-lingual Annotation Projection of Sequence Labelling da...
Translation-over-Diacritization technique implementation
Data repository for pretrained NLP models and NLP corpora.
Conditional random field (CRF) model to detect named entities (primarily names of people)
Pipeline for Analyzing Text Data: Acquire, Preprocess, Analyze
NeatText: a simple NLP package for cleaning textual data and text preprocessing
A collection of font engineering utilities
A synthetic data generator for text recognition
Python implementations of selected Princeton Java Algorithms and Clients by Robert Sedgewick and ...
Training data generator for text detection
Train and evaluate neural network language models for POS tagging, tag input sentences according ...