analiticcl

an approximate string matching or fuzzy-matching system for spelling correction, normalisation or post-OCR correction

GPL-3.0 License

Downloads: 17.6K
Stars: 31
Committers: 3


analiticcl - v0.4.7

Published by proycon 4 days ago

  • Updated various dependencies; the Python binding now also runs on Python 3.13

analiticcl - v0.4.6

Published by proycon 6 months ago

Minor update: dependency upgrades, plus a new VariantModel.set_confusables_before_pruning() method in the Python binding, which sets the --early-confusables parameter (#19).

analiticcl - v0.4.5

Published by proycon over 1 year ago

Bugfix release:

  • Fixed bug in handling (hashing/normalizing) multibyte characters.
  • Added a testinput mode to test input against the alphabet.
  • Debug level two or higher now outputs the entire alphabet.
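
The multibyte fix and the new testinput mode both depend on treating input as a sequence of Unicode characters rather than raw UTF-8 bytes. The Python sketch below illustrates that idea. It assumes a simplified alphabet in which each entry groups characters that count as equivalent. `ALPHABET`, `build_lookup` and `check_against_alphabet` are hypothetical names and do not reflect analiticcl's API or alphabet file format:

```python
# Hypothetical, simplified alphabet: each entry groups characters treated as equivalent.
ALPHABET = [["a", "A", "á"], ["c", "C"], ["e", "E", "é"], ["f", "F"], ["n", "N"]]


def build_lookup(alphabet):
    """Map every character to the index of the alphabet entry it belongs to."""
    lookup = {}
    for index, chars in enumerate(alphabet):
        for ch in chars:
            lookup[ch] = index
    return lookup


def check_against_alphabet(text, lookup):
    """Return (position, character) pairs for characters not covered by the alphabet.

    Iterating over a Python str yields Unicode characters, not UTF-8 bytes,
    so a multibyte character such as "é" is looked up as a single unit.
    """
    return [(pos, ch) for pos, ch in enumerate(text) if ch not in lookup]


lookup = build_lookup(ALPHABET)
missing = check_against_alphabet("café naïve", lookup)
```

With this alphabet, "café" passes cleanly. Checking "café naïve" reports the space at position 4, "ï" at position 7 and "v" at position 8. A byte-level loop would instead see the two bytes of "é" as separate, unknown symbols.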
analiticcl - v0.4.4

Published by proycon about 2 years ago

Bugfix release:

  • Fixed a bug in reading context rules

analiticcl - v0.4.3

Published by proycon about 2 years ago

Bugfix release:

  • Fixed a bug in reading context rules
  • Improved error feedback when parsing context rules
  • Added the missing --contextrules parameter

analiticcl - v0.4.2

Published by proycon about 2 years ago

  • A single context rule may now output multiple tags (and corresponding sequence numbers) (knaw-huc/golden-agents-htr#22)
  • Updated dependencies (e.g. rustfst 0.11.5)

analiticcl - v0.4.1

Published by proycon over 2 years ago

Bugfixes:

  • Fixed non-deterministic behaviour (in ties where scores were equal and in the ordering of anagram instances)
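
A common way to make a ranking deterministic when scores tie is to add a secondary sort key. The Python sketch below shows that general technique. `rank_candidates` is a hypothetical helper and not analiticcl's (Rust) implementation:

```python
from typing import List, Tuple

Candidate = Tuple[str, float]  # (variant, score)


def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Sort by descending score, breaking ties on the variant string.

    Python's sort is stable, so without the secondary key, tied candidates
    would keep their input order. That order may come from a hash-based
    container and can differ between runs.
    """
    return sorted(candidates, key=lambda item: (-item[1], item[0]))


ranked = rank_candidates([("huis", 0.8), ("huys", 0.9), ("hues", 0.8)])
```

Here "hues" and "huis" tie at 0.8. The secondary key always places "hues" first, regardless of the order in which the candidates arrived.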
analiticcl - v0.4.0

Published by proycon over 2 years ago

New:

  • Context rules and tagging (https://github.com/knaw-huc/golden-agents-htr#7): allows specifying regular-expression-like patterns to match entities spanning multiple 'words'
  • Allow choosing Unicode codepoints for offsets instead of UTF-8 byte offsets (#15)
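
Byte offsets and codepoint offsets diverge as soon as a text contains multibyte characters. The Python illustration below shows the difference. Both helper functions are hypothetical and only demonstrate the conversion; they are not part of analiticcl:

```python
def char_to_byte_offset(text: str, char_offset: int) -> int:
    """Convert a Unicode codepoint offset into a UTF-8 byte offset."""
    return len(text[:char_offset].encode("utf-8"))


def byte_to_char_offset(text: str, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset into a Unicode codepoint offset.

    Raises UnicodeDecodeError if the offset falls inside a multibyte character.
    """
    return len(text.encode("utf-8")[:byte_offset].decode("utf-8"))


text = "café noir"
start = text.index("noir")  # codepoint offset of "noir"
```

In "café noir", the word "noir" starts at codepoint offset 5 but at byte offset 6, because "é" takes two bytes in UTF-8.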

Bugfixes:

analiticcl - v0.3.3

Published by proycon over 2 years ago

Important bugfix release:

  • Fixed use of frequency information in the score() function
  • Fixed parsing of DistanceThreshold (edit distance threshold, anagram distance threshold) when it consists of both a relative and an absolute component
  • Better parameter validation in Python binding
  • More verbose feedback on chosen parameters
  • Fixed version information
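
One way to read a threshold with both a relative and an absolute component is "a fraction of the input length, but never more than a fixed maximum". The Python sketch below assumes exactly that interpretation, with floor rounding. `resolve_threshold` is a hypothetical function, and analiticcl's actual DistanceThreshold semantics and rounding may differ:

```python
import math
from typing import Optional


def resolve_threshold(input_length: int, ratio: Optional[float] = None,
                      absolute: Optional[int] = None) -> int:
    """Resolve a distance threshold for an input of the given length.

    Assumed (hypothetical) semantics: the relative component scales with the
    input length and the absolute component caps it. When only one component
    is given, that component alone applies.
    """
    if ratio is None and absolute is None:
        raise ValueError("need a relative and/or an absolute component")
    if ratio is None:
        return absolute
    relative = math.floor(ratio * input_length)
    return relative if absolute is None else min(relative, absolute)
```

Under these assumptions, a ratio of 0.25 allows 2 edits for an 8-character input. Adding an absolute cap of 3 keeps a 20-character input at 3 edits instead of 5.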
analiticcl - v0.3.2

Published by proycon over 2 years ago

  • Fixed auto-detection of frequency information when parsing variant lists
  • Fixed building of the Python wheels

analiticcl - v0.3.1

Published by proycon about 3 years ago

Minor bugfix release: fixes an issue with invalid JSON serialisation (#13)

analiticcl - v0.3.0

Published by proycon about 3 years ago

Major development updates:

  • Initial implementation of finding matches in running text (error detection), i.e. search mode (#2)
    • Support for language models to consider context
    • Support for n-grams; decoding using finite state transducers
    • Strict separation between lexicon and language model
    • Still experimental
  • Removed frequency as a component of the score and added it as a separate score
    • Frequency ranking is now an opt-in feature; the frequency score and the distance score are explicitly propagated to the output as separate values
  • Removed lexicon weights
  • Made distance score computations relative to input length
  • Changed the default weights so that Damerau-Levenshtein distance carries the most weight
  • Implemented a Python binding (#1)
  • Fixed insertions after deletion (#6); removed premature bound-check optimisations
  • Implemented a learning mode that collects variants for a given lexicon, either in running text or matched against another test lexicon
  • Implemented a cut-off threshold
  • Allow frequency information in variant lists
  • Strictly adhere to the lexicon/variant list loading order as specified on the command line
  • Return all matching lexicons rather than just one (in case an entry exists in multiple lexicons)
  • More debug levels
  • Anagram/edit distance thresholds can now be set to an absolute value or a ratio (relative to input length)
  • Significant documentation updates
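
Two of the scoring changes above, making Damerau-Levenshtein the main distance and normalising the distance by input length, can be illustrated with a short Python sketch. The sketch assumes the optimal string alignment (OSA) variant of Damerau-Levenshtein and a simple `1 - distance / input_length` normalisation. Both are illustrative choices; analiticcl's own scoring combines several weighted components:

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein):
    insertions, deletions, substitutions and adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def distance_score(input_word: str, candidate: str) -> float:
    """Hypothetical distance score relative to input length: 1.0 means
    identical, lower means more edits per input character (floored at 0.0)."""
    if not input_word:
        return 1.0 if not candidate else 0.0
    return max(0.0, 1.0 - osa_distance(input_word, candidate) / len(input_word))
```

A transposition such as "teh" to "the" costs a single edit, giving a score of about 0.67 for a 3-character input. OSA differs from unrestricted Damerau-Levenshtein on some inputs: OSA gives 3 for "ca" versus "abc", where the unrestricted distance is 2.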
analiticcl - v0.2.0

Published by proycon over 3 years ago

  • Replaced the underlying big integer library with ibig 0.3.2, which gives a significant performance increase thanks to fewer heap allocations
  • Implemented explicit variant ingestion and matching (still requires proper testing)
  • Fixed benchmarks
  • Allow some escape sequences in alphabet files
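
A big integer library becomes necessary when anagram values are computed as products of per-character primes, because such products grow quickly with word length. The Python sketch below shows that general technique and assumes analiticcl's anagram values work this way. It also assumes a plain lowercase a-z alphabet with the i-th letter mapped to the i-th prime. `ALPHABET`, `PRIME_OF` and `anagram_hash` are hypothetical; analiticcl derives its values from its own alphabet file:

```python
from math import prod

# Hypothetical alphabet; analiticcl reads its alphabet from a file instead.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def first_primes(n: int) -> list:
    """Return the first n primes by trial division against earlier primes."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


PRIME_OF = dict(zip(ALPHABET, first_primes(len(ALPHABET))))


def anagram_hash(word: str) -> int:
    """Product of the primes assigned to the word's characters.

    Multiplication is order-independent, so anagrams share a value, and unique
    prime factorisation keeps different letter multisets apart. Python ints are
    arbitrary precision, so long words do not overflow. Characters outside
    ALPHABET raise KeyError.
    """
    return prod(PRIME_OF[ch] for ch in word.lower())
```

"listen" and "silent" receive the same value, while a 20-letter word of "z" characters already exceeds 2**64. This growth is what makes arbitrary-precision integers, and the cost of their heap allocations, relevant.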
analiticcl - v0.1.1

Published by proycon over 3 years ago

Bugfix release

analiticcl - v0.1.0

Published by proycon over 3 years ago

Initial experimental release of analiticcl

Package rankings
Top 12.87% on PyPI (pypi.org)
Top 28.04% on crates.io
Project status (from the project README): Active. The project has reached a stable, usable state and is being actively developed.