Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
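To make the terms concrete, here is a minimal pure-Python sketch of n-gram and fixed-size skipgram extraction. It does not use the Colibri Core API; the function names, the ``GAP`` placeholder and the ``max_gaps`` parameter are assumptions chosen purely for illustration.

```python
from itertools import combinations

GAP = "{*}"  # illustrative placeholder for one skipped token (an assumption)


def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(ngram, max_gaps=1):
    """Return variants of an n-gram with up to max_gaps inner tokens replaced by GAP.

    The first and last token are always kept, so every gap is enclosed
    by real tokens on both sides.
    """
    inner = range(1, len(ngram) - 1)
    result = []
    for k in range(1, max_gaps + 1):
        for positions in combinations(inner, k):
            result.append(
                tuple(GAP if i in positions else tok for i, tok in enumerate(ngram))
            )
    return result


tokens = "to be or not to be".split()
trigrams = ngrams(tokens, 3)
skips = skipgrams(("to", "be", "or", "not"), max_gaps=2)

print(trigrams)
print(skips)
```

The sketch only covers gaps of fixed size; gaps of dynamic size would need a separate representation and are left out here.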
GPL-3.0 License
[Ko van der Sloot]
[Maarten van Gompel]
This release does not provide a shared library; use static linking instead.
Published by proycon over 1 year ago
[Maarten van Gompel]
[Ko van der Sloot]
Published by proycon about 2 years ago
[Maarten van Gompel]
[Ko van der Sloot]
Published by proycon over 4 years ago
Thanks to @kosloot, various warnings on clang were fixed in this minor release.
Published by proycon over 4 years ago
Implemented the ability to prune subsumed n-grams (retaining only the longer non-subsumed versions). Introduces a new PRUNESUBSUMED variable for PatternModelOptions.
Note: This is an aggressive form of pruning that should also work for unordered models. Matching is based on types rather than individual tokens (all subsumed types are pruned).
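The idea of type-based pruning can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's implementation: ``is_subsumed`` and ``prune_subsumed`` are hypothetical helpers, the model is represented as a simple dictionary of counts, and the exact semantics of PRUNESUBSUMED may differ in detail.

```python
def is_subsumed(short, long):
    """True if `short` occurs as a contiguous sub-sequence of the strictly longer `long`."""
    n = len(short)
    return n < len(long) and any(
        long[i:i + n] == short for i in range(len(long) - n + 1)
    )


def prune_subsumed(counts):
    """Drop every n-gram type that is contained in a longer n-gram type of the model.

    Matching is done on types only: a shorter n-gram is removed as a whole,
    regardless of how many of its occurrences fall inside the longer one.
    """
    patterns = list(counts)
    return {
        p: c
        for p, c in counts.items()
        if not any(is_subsumed(p, q) for q in patterns)
    }


# Hypothetical toy model: n-gram type -> occurrence count
counts = {
    ("to", "be"): 2,
    ("to", "be", "or"): 1,
    ("be", "or"): 1,
    ("not", "to"): 1,
}
pruned = prune_subsumed(counts)
print(pruned)
```

In this toy model ``("to", "be")`` is removed even though it occurs twice and the longer ``("to", "be", "or")`` only once, which shows why this form of pruning is described as aggressive.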
Published by proycon over 4 years ago
Bugfix release: Certain options from PatternModelOptions were not yet available in the Python binding.
Published by proycon over 4 years ago
Bugfix release: Pattern size and category constraints were not working for several methods (getcooc/getleftcooc/getrightcooc/getleftneighbours/getrightneighbours) #44
Published by proycon about 5 years ago
Very minor update release:
Published by proycon almost 6 years ago
Better handling of large patterns: the PatternPointer size descriptor is now 64 bits (fixes #42), at the cost of a small increase in memory consumption in various computations.
(The experimental and relatively unused PatternPointerModels are not backwards compatible, contact me if this is a problem)
Published by proycon almost 6 years ago
Important bugfix release:
(All users are urged to upgrade!)
Published by proycon over 6 years ago
``setup.py`` is more robust in manual installation mode (without compiling the C++ library). (v2.4.7 was skipped.)
Published by proycon about 7 years ago
``-t`` (threshold) behaviour was wrong (was interpreted as +1)
Published by proycon over 7 years ago
Published by proycon almost 8 years ago
``--simplereport`` (``-r``) option that generates a report without coverage information (more limited but a lot faster)
Published by proycon about 8 years ago
v2.4.2 was released prematurely; one minor test was corrupt. This is fixed in this release.
Published by proycon about 8 years ago
Bugfix release, fixes issue #25
Published by proycon over 8 years ago
Minor fix release prior to paper publication:
Published by proycon over 8 years ago
Various fixes:
``Pattern.instanceof()`` should be faster and is now available from Python too
New features:
``ignorenewlines`` option in class encoding. Useful if your source text is split into units such as sentences (one per line), but you want a model that crosses sentence boundaries.
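The effect of ignoring newlines can be illustrated with a small pure-Python sketch. It does not call the class encoder; ``corpus_ngrams`` and its ``ignore_newlines`` parameter are hypothetical names that only mimic the behaviour described above.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def corpus_ngrams(text, n, ignore_newlines=False):
    """Extract n-grams from whitespace-tokenised text.

    By default every line is a separate unit, so no n-gram crosses a line
    boundary. With ignore_newlines=True the whole text is one token stream.
    """
    if ignore_newlines:
        units = [text.split()]
    else:
        units = [line.split() for line in text.splitlines()]
    return [gram for unit in units for gram in ngrams(unit, n)]


text = "the cat sat\non the mat"  # two sentences, one per line
per_line = corpus_ngrams(text, 2)
crossing = corpus_ngrams(text, 2, ignore_newlines=True)

print(per_line)
print(crossing)
```

Only the second call yields the bigram ``("sat", "on")``, which spans the boundary between the two lines.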