colibri-core

Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e., patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At its core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
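To make the two central notions concrete, here is a minimal pure-Python sketch, independent of colibri-core itself, that extracts n-grams and fixed-gap skipgrams from a tokenised sentence. The function names and the `{*}` gap marker are chosen for illustration only. Dynamic (variable-size) gaps are not covered, and the library's own encoded, memory-efficient representation works very differently.

```python
from itertools import combinations

GAP = "{*}"  # illustrative marker for a single-token gap (not the library API)

def ngrams(tokens, n):
    """Return all contiguous n-grams of `tokens` as tuples, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def skipgrams(tokens, n):
    """Return fixed-gap skipgrams of length n.

    Each n-gram yields one skipgram per non-empty set of interior positions
    replaced by GAP; the first and last tokens are never gapped."""
    result = []
    interior = range(1, n - 1)
    for gram in ngrams(tokens, n):
        for k in range(1, n - 1):  # how many interior positions to gap
            for positions in combinations(interior, k):
                result.append(tuple(GAP if i in positions else tok
                                    for i, tok in enumerate(gram)))
    return result
```

For example, `skipgrams("to be or".split(), 3)` yields the single skipgram `("to", "{*}", "or")`, while two adjacent `{*}` markers stand for a gap of two tokens.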

GPL-3.0 License

colibri-core - v2.5.9 (latest release)

Published by proycon over 1 year ago

[Ko van der Sloot]

  • Major code cleanup: range-based for loops, override specifiers, most stream pointers converted to references, more const parameters

[Maarten van Gompel]

  • Cleanup: removed the last traces of Python 2 support, refactored exceptions
  • Set up wheel building

This release does not provide a shared library; use static linking instead.

colibri-core - v2.5.8

Published by proycon over 1 year ago

  • Python setup.py no longer attempts to build the colibri-core C++ library; this must now be done in a separate manual step beforehand.
  • Set up continuous integration and wheel building
  • Python test fix

colibri-core - v2.5.7

Published by proycon over 1 year ago

[Maarten van Gompel]

  • Fixed long option parsing
  • Fixed column length mismatch in TSV header/data output
  • Fixed a build problem
  • Updated installation instructions

[Ko van der Sloot]

  • Significant cleanup of the code
  • Updated for newer autoconf versions

colibri-core - v2.5.6

Published by proycon about 2 years ago

[Maarten van Gompel]

  • codemeta.json: updated according to the (proposed) CLARIAH requirements (CLARIAH/clariah-plus#38)
  • Dockerfile: added

[Ko van der Sloot]

  • Code cleanup
    • added some exceptions for unwanted cases detected by scan-build
    • commented out the DOFLEXFROMCOOC and cached_DOFLEXFROMCOOC variables, as they appear to serve no purpose
    • removed unused assignments

colibri-core - v2.5.5

Published by proycon over 4 years ago

Thanks to @kosloot, various clang warnings were fixed in this minor release.

colibri-core - v2.5.4

Published by proycon over 4 years ago

Implemented the ability to prune subsumed n-grams (retaining only the longer non-subsumed versions). Introduces a new PRUNESUBSUMED variable for PatternModelOptions.
Note: This is an aggressive form of pruning that should also work for unordered models; matching is based on types rather than individual tokens (all subsumed types are pruned).
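The type-based pruning described above can be illustrated with a small pure-Python sketch. It is not the library's implementation: the model is assumed to be a plain dict mapping n-gram tuples to counts, and `prune_subsumed` is a hypothetical name. It uses a naive quadratic comparison, which is fine for illustration but not for real models.

```python
def is_contained(short, long):
    """True if tuple `short` occurs contiguously inside tuple `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))

def prune_subsumed(patterns):
    """Keep only pattern types not contained in a longer type in the model.

    `patterns` maps n-gram tuples to counts. Matching is on types, not on
    individual token positions, so a subsumed type is dropped entirely."""
    kept = {}
    for pattern, count in patterns.items():
        subsumed = any(len(other) > len(pattern) and is_contained(pattern, other)
                       for other in patterns)
        if not subsumed:
            kept[pattern] = count
    return kept

model = {("to", "be"): 2, ("be", "or"): 1, ("to", "be", "or"): 1, ("not",): 1}
pruned = prune_subsumed(model)
```

Here `pruned` keeps only `("to", "be", "or")` and `("not",)`. The bigram `("to", "be")` is removed as a type regardless of its own count, which is what makes this form of pruning aggressive.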

colibri-core - v2.5.3

Published by proycon over 4 years ago

Bugfix release: certain options from PatternModelOptions were not yet available in the Python binding.

colibri-core - v2.5.2

Published by proycon over 4 years ago

Bugfix release: Pattern size and category constraints were not working for several methods (getcooc/getleftcooc/getrightcooc/getleftneighbours/getrightneighbours) #44

colibri-core - v2.5.1

Published by proycon about 5 years ago

Very minor update release:

  • Updated codemeta metadata
  • Added ClassEncoder.find()

colibri-core - v2.5.0

Published by proycon almost 6 years ago

Better handling of large patterns: the PatternPointer size descriptor is now 64 bits (fixes #42), at the cost of a small increase in memory consumption in various computations.

(The experimental and relatively unused PatternPointerModels are not backwards compatible, contact me if this is a problem)

colibri-core - v2.4.10

Published by proycon almost 6 years ago

Important bugfix release:

  • Fixes data-clipping bug on loading large corpora in memory (used by indexed patternmodels) #41

(All users are urged to upgrade!)

colibri-core - v2.4.9

Published by proycon over 6 years ago

  • Added metadata
  • macOS fix

colibri-core - v2.4.8

Published by proycon over 6 years ago

  • Minor update: made setup.py more robust for the manual installation mode (without compiling the C++ library); v2.4.7 was skipped

colibri-core - v2.4.6

Published by proycon about 7 years ago

  • Fix: the threshold option of colibri-classencode (-t) was off by one (the given value was interpreted as value + 1)
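The off-by-one fix concerns the occurrence threshold applied when building a class encoding. Below is a minimal sketch, assuming the intended meaning of `-t N` is "keep words occurring at least N times". The function name and the class numbering are hypothetical and do not mirror colibri-classencode's internals.

```python
from collections import Counter

def build_vocabulary(tokens, threshold=1):
    """Map each word occurring at least `threshold` times to an integer class.

    In this sketch frequent words get lower class numbers (ties broken
    alphabetically); the numbering starts at 1 purely for illustration."""
    counts = Counter(tokens)
    kept = [word for word, count in counts.items() if count >= threshold]
    kept.sort(key=lambda word: (-counts[word], word))
    return {word: cls for cls, word in enumerate(kept, start=1)}
```

With `build_vocabulary("a b a c a b".split(), threshold=2)` the result is `{"a": 1, "b": 2}`. The old behaviour would correspond to testing `count >= threshold + 1`, which would wrongly drop "b".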
colibri-core - v2.4.5

Published by proycon over 7 years ago

  • Refactored alignment model
  • Added BasicPatternAlignmentModel
  • Major cleanup of warnings and possible issues (thanks to @kosloot)

colibri-core - v2.4.4

Published by proycon almost 8 years ago

  • Bugfix: fixes covered token count per category/n (issue #26)
  • New feature: colibri-patternmodeller has a --simplereport (-r) option that generates a report without coverage information (more limited, but a lot faster)

colibri-core - v2.4.3

Published by proycon about 8 years ago

v2.4.2 was released prematurely: one minor test was corrupt. This release fixes it.

colibri-core - v2.4.2

Published by proycon about 8 years ago

Bugfix release, fixes issue #25

colibri-core - v2.4.1

Published by proycon over 8 years ago

Minor fix release prior to paper publication:

  • Python 2.7 compatibility fix
  • Updated python tutorial
  • Added benchmarks

colibri-core - v2.4.0

Published by proycon over 8 years ago

Various fixes:

  • Speed up in ngrams() computation (issue #21)
  • Performance fix for processing long lines
  • Pattern.instanceof() should be faster and is now available from Python too
  • Attempt to fix compilation issue on certain platforms (issue #22), unconfirmed
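The idea behind an instance check such as Pattern.instanceof() can be sketched in pure Python: an n-gram is an instance of a skipgram if both have the same length and every non-gap token matches. This is an illustration of the concept, not the library's implementation; it assumes tuple patterns with an illustrative `{*}` marker and handles fixed-size gaps only. The helper names are hypothetical.

```python
GAP = "{*}"  # illustrative single-token gap marker (not the library API)

def instance_of(ngram, skipgram):
    """Return True if `ngram` is an instance of `skipgram`.

    Both are tuples of tokens. They must have the same length, and every
    non-gap position of the skipgram must match the n-gram exactly;
    gap positions match any token."""
    if len(ngram) != len(skipgram):
        return False
    return all(s == GAP or s == t for t, s in zip(ngram, skipgram))

def filter_instances(patterns, skipgrams):
    """Keep only the patterns that are an instance of at least one skipgram."""
    return [p for p in patterns if any(instance_of(p, s) for s in skipgrams)]
```

For instance, `("to", "be", "or")` is an instance of `("to", "{*}", "or")`, whereas `("to", "be", "and")` and the shorter `("to", "be")` are not.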

New features:

  • Implemented a new filtering mechanism that actively checks whether patterns are instances of a limited set of specified skipgrams, or supersets of specified n-grams.
  • Implemented the ignorenewlines option in class encoding. This is useful if your source text is split into, for instance, sentences (one per line), but you want a model that crosses sentence boundaries.
  • Implemented vocabulary import for the class encoding stage (issue #2)
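The effect of an option like ignorenewlines can be illustrated with a small pure-Python sketch that extracts n-grams from one-sentence-per-line text, either per line or across line boundaries. The function and its `ignore_newlines` parameter are hypothetical and only model the behaviour described above; they are not the library's API.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of `tokens` as tuples, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def corpus_ngrams(text, n, ignore_newlines=False):
    """Extract n-grams from text with one sentence per line.

    By default each line is a separate unit, so no n-gram spans a line break.
    With ignore_newlines=True all lines are concatenated into one token
    stream, so n-grams may cross sentence boundaries."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    if ignore_newlines:
        return ngrams([tok for line in lines for tok in line], n)
    return [gram for line in lines for gram in ngrams(line, n)]

text = "the cat sat\nit slept"
per_line = corpus_ngrams(text, 2)
crossing = corpus_ngrams(text, 2, ignore_newlines=True)
```

Only the second call produces the boundary-crossing bigram `("sat", "it")`.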