python-ucto

This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial as it appears. This binding makes the power of the Ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).
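
To illustrate what "regular-expression based" tokenisation means in general, here is a minimal pure-Python sketch. It does not use Ucto or this binding, and the rule set and the names TOKEN_PATTERN and tokenize are hypothetical, chosen only for illustration. Ucto's real rules come from its language-specific configuration files and are considerably more thorough.

```python
import re

# Hypothetical, simplified rule set for illustration only.
# Alternatives are tried left to right, so more specific rules come first.
TOKEN_PATTERN = re.compile(
    r"""
    (?P<URL>https?://\S+[^\s.,;:!?)])    # URL, excluding trailing punctuation
  | (?P<NUMBER>\d+(?:[.,]\d+)*)          # 42, 3.50 or 1,000
  | (?P<WORD>\w+(?:['-]\w+)*)            # words, including don't and well-known
  | (?P<PUNCTUATION>[^\w\s])             # any single remaining symbol
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return a list of (token, token_type) pairs for the given text."""
    return [(m.group(), m.lastgroup) for m in TOKEN_PATTERN.finditer(text)]


if __name__ == "__main__":
    for token, token_type in tokenize("Don't panic, it costs 3.50!"):
        print(token, token_type)
```

Even this small sketch shows why tokenisation is less trivial than splitting on whitespace: the apostrophe in "Don't" and the full stop in "3.50" must stay inside their tokens, while the comma and exclamation mark must be split off.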

Downloads: 1.3K
Stars: 29
Committers: 1

Commit Statistics

Past Year

All Time

Total Commits: 106
Total Committers: 1
Avg. Commits Per Committer: 106.0
Bot Commits: 0

Issue Statistics

                         Past Year   All Time
Total Pull Requests      0           0
Merged Pull Requests     0           0
Total Issues             0           16
Time to Close Issues     N/A         10 months