This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial as it appears. This binding makes the power of the Ucto tokeniser available to Python. Ucto itself is a regular-expression based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).
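To illustrate why tokenisation is less trivial than it looks, here is a minimal sketch of a regular-expression based tokeniser using only the Python standard library. It is not Ucto's rule set and does not use this binding; the pattern and the `tokenize` helper are hypothetical names chosen for illustration.

```python
import re

# Illustrative pattern only; this is NOT Ucto's actual rule set.
# Alternatives are tried left to right, so more specific rules come first.
TOKEN_PATTERN = re.compile(
    r"""
    https?://\S+[\w/]        # URLs kept as a single token (trailing punctuation excluded)
  | \d+(?:[.,]\d+)*          # numbers such as 3.50 or 1,000
  | \w+(?:[-']\w+)*          # words, including hyphenated forms and contractions
  | [^\w\s]                  # any other single non-space symbol
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return the list of tokens found in text (hypothetical helper)."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(tokenize("Mr. Smith paid 3.50 euros, didn't he?"))
    print(tokenize("See http://ilk.uvt.nl/ucto."))
```

Even this small pattern needs rule ordering to keep URLs and decimal numbers intact, and it still splits the abbreviation "Mr." into two tokens. Handling such cases per language is the kind of work a dedicated tokeniser is meant to take over.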
Built against ucto 0.30
Published by proycon about 1 year ago
Published by proycon over 1 year ago
Updated for Ucto v0.29, fixes https://github.com/proycon/python-ucto/issues/16
Published by proycon over 1 year ago
Updated for ucto v0.28
Published by proycon over 1 year ago
Published by proycon almost 2 years ago
Updated for ucto 0.26
Published by proycon almost 2 years ago
Fixes for Python package index and python wheels, no functional changes
Published by proycon about 2 years ago
installdata() function to automatically download and install uctodata locally
Published by proycon almost 3 years ago
Compatibility with ucto v0.24.1 (breaks compatibility with earlier versions)
Published by proycon about 4 years ago
Published by proycon about 5 years ago
Fix release for compatibility with moved libxml2 on (some?) macs. (proycon/LaMachine#154)
Published by proycon over 5 years ago
Published by proycon over 6 years ago
Published by proycon over 6 years ago
Workaround for libicu 61 compatibility
Published by proycon over 6 years ago
Published by proycon over 6 years ago
Published by proycon about 8 years ago
Published by proycon over 8 years ago
Release FoLiA v1.0, compiles against new libfolia and ucto
Published by proycon over 8 years ago
Stable release