Fast n-Gram Tokenization
OTHER License
Tools for cleaning and normalizing text data
Detect text reuse and document similarity