python-ucto

This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial as it appears. This binding makes the power of the Ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).
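To see why tokenisation is less trivial than it looks, and what "regular-expression based" means in practice, here is a minimal sketch in plain Python using only the standard `re` module. The pattern and the `tokenize` helper are hypothetical and written for illustration; they are not Ucto's rules and not part of the python-ucto API.

```python
import re

# Illustrative pattern only: these are NOT Ucto's actual rules.
# Alternatives are tried left to right, so the more specific ones come first.
TOKEN_PATTERN = re.compile(
    r"""
    https?://\S+[\w/]        # URLs, without trailing punctuation
  | (?:[A-Za-z]\.){2,}       # dotted abbreviations such as U.S.A.
  | \w+(?:[-']\w+)*          # words, allowing inner hyphens and apostrophes
  | [^\w\s]                  # any other single non-space character
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return the list of tokens found by the illustrative pattern."""
    return TOKEN_PATTERN.findall(text)


text = "Mr. Smith isn't in the U.S.A., is he?"

# Naive whitespace splitting leaves punctuation glued to words.
naive = text.split()

# The rule-based version separates punctuation but keeps "U.S.A." whole.
tokens = tokenize(text)

print(naive)
print(tokens)
```

Even this small rule set gets some cases wrong: it splits "Mr." into "Mr" and ".", which is the kind of case a full tokeniser with language-specific configuration is meant to handle.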

Downloads: 1.3K
Stars: 29
Committers: 1

python-ucto - v0.6.7 Latest Release

Published by proycon 12 months ago

Built against ucto 0.30

python-ucto - v0.6.6

Published by proycon about 1 year ago

  • Fixed wheels for macos arm64
  • Fixed compilation with latest cython
  • No functional changes

python-ucto - v0.6.5

Published by proycon over 1 year ago

Updated for Ucto v0.29, fixes https://github.com/proycon/python-ucto/issues/16

python-ucto - v0.6.4

Published by proycon over 1 year ago

Updated for ucto v0.28

python-ucto - v0.6.3

Published by proycon over 1 year ago

  • No functional changes
  • Require latest ucto
  • Removed libtar dependency
  • Fixed wheel building; wheels are now backward-compatible with older systems (manylinux2014)

python-ucto - v0.6.2

Published by proycon almost 2 years ago

Updated for ucto 0.26

python-ucto - v0.6.1

Published by proycon almost 2 years ago

Fixes for Python package index and python wheels, no functional changes

python-ucto - v0.6.0

Published by proycon about 2 years ago

  • Adds support for locally installed uctodata (in $XDG_CONFIG_HOME/ucto, aka ~/.config/ucto)
  • Adds an installdata() function to automatically download and install uctodata locally
  • Allow distribution as python wheels

python-ucto - v0.5.3

Published by proycon almost 3 years ago

Compatibility with ucto v0.24.1 (breaks compatibility with earlier versions)

python-ucto - v0.5.2

Published by proycon about 4 years ago

  • Fixed lowercasing/uppercasing #8
  • Removed Python 2.7 support
  • Added a notice that sentencedetection is a deprecated parameter, rather than silently ignoring it

python-ucto - v0.5.1

Published by proycon about 5 years ago

Fix release for compatibility with moved libxml2 on (some?) macs. (proycon/LaMachine#154)

python-ucto - v0.5.0

Published by proycon over 5 years ago

  • Compatibility release for ucto v0.15

python-ucto - v0.4.7

Published by proycon over 6 years ago

  • Fixed issue #4 and compatibility release for ucto v0.13

python-ucto - v0.4.5

Published by proycon over 6 years ago

Workaround for libicu 61 compatibility

python-ucto - v0.4.4

Published by proycon over 6 years ago

  • More fixes for Mac OS X compilation (with homebrew)

python-ucto - v0.4.3

Published by proycon over 6 years ago

  • Fix for Mac OS X compilation

python-ucto - v0.4.0

Published by proycon about 8 years ago

  • No need for absolute path to configuration file anymore
  • Fix for Python 2.7 compatibility
  • Configurable include/library paths on setup
  • Better documentation

python-ucto - v0.3.0

Published by proycon over 8 years ago

Release FoLiA v1.0, compiles against new libfolia and ucto

python-ucto - v0.2.4

Published by proycon over 8 years ago

Stable release