English word segmentation, written in pure Python and based on a trillion-word corpus.
Simple Solution for Multi-Criteria Chinese Word Segmentation
PyThaiNLP for spaCy
Python large file utilities inspired by GNU Coreutils and functional programming.
A Reddit bot that summarizes news articles written in Spanish or English. It uses a custom built ...
Most common sentences and words for all languages in the OpenSubtitles2018 corpus with Python code
Commonly Consumed Code Commodities
SciKit Sequitur is an Apache2 licensed Python module for inferring compositional hierarchies from...
Data repository for pretrained NLP models and NLP corpora.
Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extrac...
A comprehensive Data and Text Mining workflow for submissions and comments from any given public ...
Package for detecting statistically significant linguistic change
A little word cloud generator in Python
Chinese word segmentation based on statistical methods (for Python)
Word/n-gram frequency lists for the Google Books Ngram Corpus (v3, all languages) with Python code
Probabilistically split concatenated words using NLP, based on English Wikipedia unigram frequencies.
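
The first and last entries above describe the same underlying idea: choose the split of a run-together string that maximizes the product of unigram word probabilities. The snippet below is a minimal, self-contained sketch of that technique, not any listed package's actual API; the COUNTS table, the word_cost penalty, and the segment helper are illustrative stand-ins for real corpus frequency data.

```python
# Minimal sketch: probabilistic word segmentation via dynamic programming
# over unigram frequencies. The tiny COUNTS table is illustrative only;
# the libraries listed above ship real frequency data from large corpora.
import math
from functools import lru_cache

COUNTS = {"this": 500, "is": 900, "a": 1200, "test": 300, "t": 5, "his": 40}
TOTAL = sum(COUNTS.values())

def word_cost(word: str) -> float:
    # Negative log probability; unseen words get a length-based penalty.
    count = COUNTS.get(word)
    if count is None:
        return 10.0 * len(word)
    return -math.log(count / TOTAL)

@lru_cache(maxsize=None)
def segment(text: str) -> tuple:
    # Try every split point and keep the lowest-cost segmentation.
    if not text:
        return ((), 0.0)
    best = None
    for i in range(1, len(text) + 1):
        head, tail = text[:i], text[i:]
        rest, rest_cost = segment(tail)
        cost = word_cost(head) + rest_cost
        if best is None or cost < best[1]:
            best = ((head,) + rest, cost)
    return best

if __name__ == "__main__":
    words, _ = segment("thisisatest")
    print(words)  # ('this', 'is', 'a', 'test')
```

Real implementations apply the same recurrence but memoize over hundreds of thousands of unigrams (and often bigrams), which is why the listed packages bundle large frequency tables rather than a toy dictionary like the one used here.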