BlingFire

A lightning fast Finite State machine and REgular expression manipulation library.

MIT License

Downloads: 43.4K
Stars: 1.8K
Committers: 26


BlingFire - Bling Fire v0.1.8 Latest Release

Published by SergeiAlonichau about 3 years ago

  1. Added an IdsToText API for all models that return IDs; for an example, see https://github.com/microsoft/BlingFire/blob/master/scripts/blingfire_example.py
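To make the idea of an IDs-to-text call concrete, here is a minimal pure-Python sketch of the inverse of word-piece tokenization. It does not call BlingFire: the vocabulary, the special-token set, and the function name `toy_ids_to_text` are all hypothetical and exist only for illustration. For the library's real usage, see the example script linked above.

```python
# Hypothetical toy vocabulary; real models ship their own id-to-token tables.
VOCAB = ["[UNK]", "[CLS]", "[SEP]", "bling", "##fire", "is", "fast"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}
SPECIAL = {"[CLS]", "[SEP]"}


def toy_ids_to_text(ids, skip_special_tokens=True):
    """Map ids back to tokens and glue word-piece continuations ("##") together."""
    words = []
    for i in ids:
        # Ids outside the table fall back to the unknown token.
        tok = VOCAB[i] if 0 <= i < len(VOCAB) else "[UNK]"
        if skip_special_tokens and tok in SPECIAL:
            continue
        if tok.startswith("##") and words:
            # Continuation piece: append to the previous word without a space.
            words[-1] += tok[2:]
        else:
            words.append(tok)
    return " ".join(words)


ids = [TOKEN_TO_ID[t] for t in ["[CLS]", "bling", "##fire", "is", "fast", "[SEP]"]]
print(toy_ids_to_text(ids))
```

Skipping special tokens by default is a design choice of this sketch; pass `skip_special_tokens=False` to keep markers such as `[CLS]` in the output.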
BlingFire - Bling Fire v0.1.7

Published by SergeiAlonichau over 3 years ago

  1. Added a no_dummy_prefix configuration option and an API to change the configuration of an existing model
  2. Fixed offsets for the dummy prefix: its offset is now always -1, so a start/end offset of -1 on the first token means that the dummy prefix is included in that token
  3. Changed the compilation options for Windows code
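The offset convention in item 2 can be illustrated with a small pure-Python sketch. This is one plausible reading of the note, not BlingFire's implementation: it assumes a SentencePiece-style scheme in which spaces become "▁" and one extra "▁" is prepended as the dummy prefix, and it assumes each token reports the original offsets of its first and last characters. The function name `toy_offsets` and its `add_dummy_prefix` parameter are hypothetical.

```python
SPACE_MARK = "\u2581"  # the "▁" character used by SentencePiece-style models


def toy_offsets(text, pieces, add_dummy_prefix=True):
    """Return (start, end) offsets into `text` for each piece.

    The dummy prefix does not exist in the original text, so it maps to -1.
    Assumes `pieces` concatenate exactly to the normalized text.
    """
    norm = text.replace(" ", SPACE_MARK)
    char_to_orig = list(range(len(norm)))
    if add_dummy_prefix:
        norm = SPACE_MARK + norm
        char_to_orig = [-1] + char_to_orig
    if "".join(pieces) != norm:
        raise ValueError("pieces do not cover the normalized text")
    spans, pos = [], 0
    for piece in pieces:
        # Offsets of the first and last character covered by this piece.
        spans.append((char_to_orig[pos], char_to_orig[pos + len(piece) - 1]))
        pos += len(piece)
    return spans


print(toy_offsets("hello world", [SPACE_MARK + "hello", SPACE_MARK + "world"]))
print(toy_offsets("hello world", ["hello", SPACE_MARK + "world"], add_dummy_prefix=False))
```

In this sketch, a first token that consists only of the dummy prefix gets the span (-1, -1), while a first token that merges the prefix with real text gets a start of -1 and a regular end offset.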
BlingFire - Bling Fire v0.1.5

Published by SergeiAlonichau over 3 years ago

  • Added byte BPE algorithm support
  • Added GPT2, Roberta tokenization models
  • Added hyphenation / syllabification APIs and a sample model: syllab
  • Added URL tokenization models: uri100k, uri250k, uri500k
  • Made small changes to the C# interface, which should remain backwards compatible: it now uses Span instead of byte[] so that input and output buffers can be allocated on the stack
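For readers unfamiliar with byte-level BPE, the following is a minimal sketch of the general technique in pure Python. It is not BlingFire's implementation, which compiles models into finite-state machines, and it omits the pre-tokenization and byte-to-character mapping that GPT-2 style tokenizers add. The function name `byte_bpe` and the merge table are hypothetical.

```python
def byte_bpe(text, merges):
    """Segment `text` with byte-level BPE.

    `merges` is a list of (bytes, bytes) pairs in priority order,
    earliest first. Working on UTF-8 bytes means any input can be
    encoded without an unknown token.
    """
    rank = {pair: i for i, pair in enumerate(merges)}
    symbols = [bytes([b]) for b in text.encode("utf-8")]
    while len(symbols) > 1:
        # Collect every adjacent pair that has a merge rule, with its position.
        candidates = [
            (rank[(a, b)], i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
            if (a, b) in rank
        ]
        if not candidates:
            break
        # Apply the highest-priority merge (lowest rank, leftmost on ties).
        _, i = min(candidates)
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


toy_merges = [(b"l", b"l"), (b"h", b"e"), (b"he", b"ll"), (b"\xc3", b"\xa9")]
print(byte_bpe("hello", toy_merges))
```

Joining the returned symbols always reproduces the UTF-8 encoding of the input, which is the property that makes the byte-level variant lossless.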
BlingFire - blingfire pypi package v0.1.3

Published by SergeiAlonichau over 4 years ago

  • Four tokenization algorithms are now supported: patterns, word-piece, unigram LM, and BPE
  • Added a space normalization API
  • Added a few more popular models
  • Added unigram LM tokenization models trained on roughly 84 uniformly represented languages from the WikiMatrix set
  • Bug fixes and parity fixes
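Of the algorithms named in this release, unigram LM is perhaps the least self-explanatory, so here is a minimal sketch of the general technique: choose the segmentation whose pieces have the highest total log-probability, found with a Viterbi-style dynamic program. This is an illustration in pure Python, not BlingFire's implementation; the function name `unigram_segment`, the `max_len` parameter, and the toy scores are hypothetical.

```python
import math


def unigram_segment(text, logp, max_len=8):
    """Return the highest-scoring segmentation of `text`.

    `logp` maps each vocabulary piece to its log-probability.
    best[i] holds the best score for text[:i]; back[i] holds where
    the last piece of that best segmentation starts.
    """
    n = len(text)
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in logp and best[start] + logp[piece] > best[end]:
                best[end] = best[start] + logp[piece]
                back[end] = start
    if best[n] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")
    # Walk the back-pointers from the end to recover the pieces.
    pieces, end = [], n
    while end > 0:
        start = back[end]
        pieces.append(text[start:end])
        end = start
    return pieces[::-1]


toy_logp = {"un": -2.0, "igram": -3.0, "unigram": -6.0}
toy_logp.update({ch: -4.0 for ch in "unigram"})  # single-character fallbacks
print(unigram_segment("unigram", toy_logp))
```

With these toy scores, splitting into "un" and "igram" (total -5.0) beats the single piece "unigram" (-6.0), which shows that the highest-scoring segmentation is not always the one with the fewest pieces.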