A lightning fast Finite State machine and REgular expression manipulation library.
MIT License
Published by SergeiAlonichau over 3 years ago
Published by SergeiAlonichau over 4 years ago
Four tokenization algorithms are now supported: patterns, WordPiece, Unigram LM, and BPE. Added a space normalization API, a few more popular models, and Unigram LM tokenization models trained on ~84 uniformly represented languages from the WikiMatrix set. Also includes bug fixes and parity fixes.
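To give a feel for one of the algorithms named above, here is a minimal sketch of WordPiece-style greedy longest-match-first splitting. It is an illustration only: the function name `wordpiece_tokenize`, the `##` continuation prefix, the `[UNK]` token, and the toy vocabulary are all assumptions made for this example, and it does not use or reproduce this library's API or its finite-state implementation.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]", prefix="##"):
    """Split one word into sub-word pieces by greedy longest-match-first lookup.

    Hypothetical helper for illustration; not part of any library API.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                # Non-initial pieces carry the continuation prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matches at this position: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary, invented for this example.
TOY_VOCAB = {"token", "##ization", "##ize", "un", "##igram"}

print(wordpiece_tokenize("tokenization", TOY_VOCAB))  # ['token', '##ization']
print(wordpiece_tokenize("unigram", TOY_VOCAB))       # ['un', '##igram']
print(wordpiece_tokenize("xyz", TOY_VOCAB))           # ['[UNK]']
```

This naive version rescans the word for every piece, so it is quadratic in the word length in the worst case; it is meant to show the matching rule, not how a fast tokenizer would implement it.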