tokenizers

Elixir bindings for 🤗 Tokenizers

APACHE-2.0 License

Downloads: 302.8K
Stars: 85
Committers: 8

tokenizers - v0.4.0 Latest Release

Published by github-actions[bot] about 1 year ago

Added

  • Support for training a tokenizer from scratch. See Tokenizers.Tokenizer.train_from_files/3
    and Tokenizers.Model for available models.

  • Support for changing tokenizer configuration, such as Tokenizers.Tokenizer.set_padding/2
    and Tokenizers.Tokenizer.set_truncation/2. See the "Configuration" functions group in
    Tokenizers.Tokenizer.

  • Support for applying multiple encoding transformations without additional data copies,
    see Tokenizers.Encoding.Transformation. Transformations can be passed to
    Tokenizers.Tokenizer.encode/3 via :encoding_transformations or applied via
    Tokenizers.Encoding.transform/2.
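
The two configuration routes described above can be sketched as follows. This is a minimal illustration rather than the library's reference usage: it assumes network access, the :tokenizers Hex package at v0.4.x, and an illustrative pretrained model name ("bert-base-cased") that does not come from these notes. The option names (:max_length, :strategy) reflect the author's understanding of the v0.4.0 API and should be checked against the Tokenizers.Tokenizer documentation.

```elixir
# Assumption: runs as a standalone script with network access.
Mix.install([{:tokenizers, "~> 0.4.0"}])

alias Tokenizers.{Encoding, Tokenizer}

# "bert-base-cased" is an illustrative model name, not taken from the release notes.
{:ok, tokenizer} = Tokenizer.from_pretrained("bert-base-cased")

# Route 1: store truncation and padding settings on the tokenizer itself.
configured =
  tokenizer
  |> Tokenizer.set_truncation(max_length: 8)
  |> Tokenizer.set_padding(strategy: {:fixed, 8})

{:ok, fixed} = Tokenizer.encode(configured, "Hello there!")
fixed_ids = Encoding.get_ids(fixed)

# Route 2: leave the tokenizer unchanged and pass transformations per call.
transformations = [
  Encoding.Transformation.truncate(8),
  Encoding.Transformation.pad(8)
]

{:ok, transformed} =
  Tokenizer.encode(tokenizer, "Hello there!", encoding_transformations: transformations)

transformed_ids = Encoding.get_ids(transformed)
```

Both routes should yield encodings of the same fixed length. The per-call route is convenient when different callers need different lengths from one shared tokenizer.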

Changed

  • (Breaking) Tokenizers.Tokenizer.encode/3 no longer accepts a batch of inputs.
    To encode a batch, use Tokenizers.Tokenizer.encode_batch/3 instead.

  • (Breaking) Tokenizers.Tokenizer.decode/3 no longer accepts a batch of inputs.
    To decode a batch, use Tokenizers.Tokenizer.decode_batch/3 instead.
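
A rough migration sketch for the two breaking changes above: single inputs stay on encode/3 and decode/3, while lists of inputs move to encode_batch/3 and decode_batch/3. The model name and sample sentences are illustrative assumptions, and the script assumes network access plus the :tokenizers Hex package at v0.4.x.

```elixir
# Assumption: runs as a standalone script with network access.
Mix.install([{:tokenizers, "~> 0.4.0"}])

alias Tokenizers.{Encoding, Tokenizer}

# "bert-base-cased" is an illustrative model name, not taken from the release notes.
{:ok, tokenizer} = Tokenizer.from_pretrained("bert-base-cased")

# Single input: encode/3 and decode/3 each handle exactly one item.
{:ok, encoding} = Tokenizer.encode(tokenizer, "Hello there!")
{:ok, text} = Tokenizer.decode(tokenizer, Encoding.get_ids(encoding))

# Batch of inputs: use the *_batch variants instead of passing a list to encode/3.
inputs = ["Hello there!", "This is a second sentence."]
{:ok, encodings} = Tokenizer.encode_batch(tokenizer, inputs)

# decode_batch/3 takes one list of ids per sequence.
batch_ids = Enum.map(encodings, &Encoding.get_ids/1)
{:ok, texts} = Tokenizer.decode_batch(tokenizer, batch_ids)
```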

Full Changelog: https://github.com/elixir-nx/tokenizers/compare/v0.3.2...v0.4.0

Checksums

SHA256 list:

2d04e0b62b8d23b515ad33bf8a6fec4c8a01aa87f79516fee1b57ac39e96ec20  ex_tokenizers-v0.4.0-nif-2.15-x86_64-pc-windows-gnu.dll.tar.gz
dee3196c4908b8f56cb3080af38a7ed955873c685a2d8cbddde8fbc96e466220  ex_tokenizers-v0.4.0-nif-2.15-x86_64-pc-windows-msvc.dll.tar.gz
3615b939766fe439ff0d3fa939ad37b6a6059a8347f06a05657185eb2e0e5b51  ex_tokenizers-v0.4.0-nif-2.16-x86_64-pc-windows-gnu.dll.tar.gz
310c032a19d088520bd4d4644ae9f19039099a2d4143523771a7a5571f2f7117  ex_tokenizers-v0.4.0-nif-2.16-x86_64-pc-windows-msvc.dll.tar.gz
bbf8f8324804c40346cb5cd7a5addc31a931fc710ca4b42e01dd07d23b8eca10  libex_tokenizers-v0.4.0-nif-2.15-aarch64-apple-darwin.so.tar.gz
3bbf7a63ed4bda9a4390b5acba2d60e2ebd744f1d3f1d754f8ec4cc4f5eca6ff  libex_tokenizers-v0.4.0-nif-2.15-aarch64-unknown-linux-gnu.so.tar.gz
7dd6b547c95482518a15fb2a4bacdd751537a75ba7d8285ff5145904d55c4449  libex_tokenizers-v0.4.0-nif-2.15-aarch64-unknown-linux-musl.so.tar.gz
e1014519eb978cbc20649bb122a6c1f834629579ea0650a79bf8e816f58da6dd  libex_tokenizers-v0.4.0-nif-2.15-arm-unknown-linux-gnueabihf.so.tar.gz
a6456c8719c5b5914068378198a43d080a87221dd37ff2ccf1da05371ff028d7  libex_tokenizers-v0.4.0-nif-2.15-riscv64gc-unknown-linux-gnu.so.tar.gz
9007181e446c00a9113993b43b6522b6b28a098ae1154f3f964ba8fa3cbe0b8c  libex_tokenizers-v0.4.0-nif-2.15-x86_64-apple-darwin.so.tar.gz
061cef149b7e91f4556090c1df2672827c4e9d59e30befacd5167e4fd28c7098  libex_tokenizers-v0.4.0-nif-2.15-x86_64-unknown-linux-gnu.so.tar.gz
b2d9e65ccb6aeaaa4bbcf43d034c7af13e257770eaef55a0eb70f58b5e28ebc1  libex_tokenizers-v0.4.0-nif-2.15-x86_64-unknown-linux-musl.so.tar.gz
dfe7755fbad8a3409f3b5258e810220f2b0f179d8569dc7abcc489e65804d26e  libex_tokenizers-v0.4.0-nif-2.16-aarch64-apple-darwin.so.tar.gz
3bc9e5e23afaa2ed15680afd8f8e843434de804e8e708a5ca00c6a413a3f6be2  libex_tokenizers-v0.4.0-nif-2.16-aarch64-unknown-linux-gnu.so.tar.gz
987a81d6babb13b2baa9bfb8c602691766a6319548f4f129e0c99d56c6a155ae  libex_tokenizers-v0.4.0-nif-2.16-aarch64-unknown-linux-musl.so.tar.gz
b973a15035c495c27aa74e525342b2aff3be66c8f1f75f23776ea5f8dce59ba3  libex_tokenizers-v0.4.0-nif-2.16-arm-unknown-linux-gnueabihf.so.tar.gz
3a1cef66172ab7c82a255933a4d957d21d6b5fb26f6deea968de2598fd477f48  libex_tokenizers-v0.4.0-nif-2.16-riscv64gc-unknown-linux-gnu.so.tar.gz
e4a57f3e7a1dd29933fedab12b7c57d9a9cb1454d8fd6415bd82938d8cf64e3b  libex_tokenizers-v0.4.0-nif-2.16-x86_64-apple-darwin.so.tar.gz
6578bcaf43c24c449997354ac0439b0be5e4a27bf18de13e3c33929d514fe129  libex_tokenizers-v0.4.0-nif-2.16-x86_64-unknown-linux-gnu.so.tar.gz
043a3b36e463ea354cae656347b01b1988cae416fa46014c7c39b118fa9b36d0  libex_tokenizers-v0.4.0-nif-2.16-x86_64-unknown-linux-musl.so.tar.gz
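
A downloaded archive can be checked against the list above by hashing it locally. The sketch below uses only the Erlang/Elixir standard library. The module name ChecksumSketch and its functions are hypothetical helpers written for illustration and are not part of the tokenizers package.

```elixir
defmodule ChecksumSketch do
  # Hypothetical helper module, not part of the tokenizers package.

  @doc "Returns the lowercase hex SHA-256 digest of a binary."
  def sha256_hex(data) when is_binary(data) do
    :crypto.hash(:sha256, data) |> Base.encode16(case: :lower)
  end

  @doc "Hashes the file at `path` and compares the result with `expected_hex`."
  def verify_file(path, expected_hex) do
    actual = path |> File.read!() |> sha256_hex()

    if actual == String.downcase(expected_hex) do
      :ok
    else
      {:error, {:mismatch, actual}}
    end
  end
end
```

To use it, pass the local path of a downloaded archive and the matching digest from the list to ChecksumSketch.verify_file/2. The function reads the whole file into memory, which is acceptable for archives of this size.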
tokenizers - v0.3.2

Published by github-actions[bot] over 1 year ago

Full Changelog: https://github.com/elixir-nx/tokenizers/compare/v0.3.1...v0.3.2

Checksums

SHA256 list:

0beb6a4514b33bd830ffb323b43aaca52adace65c3cb41d0101b9ea7c97fbe92  ex_tokenizers-v0.3.2-nif-2.15-x86_64-pc-windows-gnu.dll.tar.gz
b34c34dba1b1531f88d3e9ab2538b525724cf4e337f34c8a412f4e8e2c662079  ex_tokenizers-v0.3.2-nif-2.15-x86_64-pc-windows-msvc.dll.tar.gz
16b2c9b49d4e07da1c0563b44a491e759dbccb869ad019e43f41743604740f37  ex_tokenizers-v0.3.2-nif-2.16-x86_64-pc-windows-gnu.dll.tar.gz
b9f0a888f930d19022849fc67087fa34888f4e1add4eb6d7927a78b31af0ab33  ex_tokenizers-v0.3.2-nif-2.16-x86_64-pc-windows-msvc.dll.tar.gz
1f24f3c83ff4c80b4ba359c856a5cf399713576da3a7358f322d653c728511c5  libex_tokenizers-v0.3.2-nif-2.15-aarch64-apple-darwin.so.tar.gz
9ac8d641873c7effe2b004c2714769bb1084816b8ca785155ec91c7d8bcdb414  libex_tokenizers-v0.3.2-nif-2.15-aarch64-unknown-linux-gnu.so.tar.gz
2ee6be226626c39a86aa14857256984b0ee04a8ea8d26ea044db381373bf25ab  libex_tokenizers-v0.3.2-nif-2.15-aarch64-unknown-linux-musl.so.tar.gz
407cab208bd66b1aa244359daf90d76c06571e17707708fab3324614ca7cddb3  libex_tokenizers-v0.3.2-nif-2.15-arm-unknown-linux-gnueabihf.so.tar.gz
3cbae917a12934828d5c7b2f33a2e96cb49c01bc8d43dd95b1c8173efd615d44  libex_tokenizers-v0.3.2-nif-2.15-riscv64gc-unknown-linux-gnu.so.tar.gz
23246e6f4e40c3b456450c6ea9c8eaf3c2b4854e8e384c3d38dc387e7cb2490a  libex_tokenizers-v0.3.2-nif-2.15-x86_64-apple-darwin.so.tar.gz
237573833e93a49bc0ed1ba88bec9eaee91e5090b1db218bfbc0272670fd6502  libex_tokenizers-v0.3.2-nif-2.15-x86_64-unknown-linux-gnu.so.tar.gz
a9d07c4c2b6336dfaeda7ff915d4ba1500ac0fc7e30bfc6a18cf46f1e660e83b  libex_tokenizers-v0.3.2-nif-2.15-x86_64-unknown-linux-musl.so.tar.gz
ad4cf83c29299bd045898ddb86f0cdc1e39d7ca70a320d93e0b91179da707251  libex_tokenizers-v0.3.2-nif-2.16-aarch64-apple-darwin.so.tar.gz
c144c7e9bef8427bffac2b2744cebe7ca293454edcad003a6e37ce051e0778b7  libex_tokenizers-v0.3.2-nif-2.16-aarch64-unknown-linux-gnu.so.tar.gz
f8c6a0ffa01f6873649d01f71a17ad771934a3ee36249f11cdfef57ac564b4ee  libex_tokenizers-v0.3.2-nif-2.16-aarch64-unknown-linux-musl.so.tar.gz
c3e4156c7b7007807064ecac3392dba696252c1838658f3fbe7c453d943d4c3b  libex_tokenizers-v0.3.2-nif-2.16-arm-unknown-linux-gnueabihf.so.tar.gz
21d70308a4906397243f50eaefa2b0da2035047f82e5a973916d4fff7a38cacc  libex_tokenizers-v0.3.2-nif-2.16-riscv64gc-unknown-linux-gnu.so.tar.gz
c458f5d75509edcb1822796777a1c679a9dcb45228d55b92436bed81bffe89f7  libex_tokenizers-v0.3.2-nif-2.16-x86_64-apple-darwin.so.tar.gz
6eaf8d44a29bc8674e236e3812c40a9682896e021c7fc5cc50b99535846f8864  libex_tokenizers-v0.3.2-nif-2.16-x86_64-unknown-linux-gnu.so.tar.gz
aad218c6de546475ab51767802357c9d50bbc9b8ff4197006b47e44b01f405b4  libex_tokenizers-v0.3.2-nif-2.16-x86_64-unknown-linux-musl.so.tar.gz
tokenizers - v0.3.1

Published by philss over 1 year ago

Added

  • Add binary variants for accessing encoding data. This way we can convert encoding
    data to tensors without additional allocations. The following functions were added:

    • get_u32_ids/1
    • get_u32_attention_mask/1
    • get_u32_type_ids/1
    • get_u32_special_tokens_mask/1
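
To illustrate what such a binary looks like, the sketch below decodes a packed binary back into a list of integers using plain Elixir. It assumes the binaries hold unsigned 32-bit integers in native byte order, which is the author's reading of the "u32" naming and should be checked against the Tokenizers.Encoding documentation. U32Sketch is a hypothetical module, and the sample values are made up.

```elixir
defmodule U32Sketch do
  # Hypothetical helper, not part of the tokenizers package.

  @doc """
  Decodes a binary of packed unsigned 32-bit integers (native byte order,
  an assumption here) into a list of integers.
  """
  def to_integers(binary) when is_binary(binary) do
    for <<value::unsigned-native-32 <- binary>>, do: value
  end
end

# Made-up ids packed the same way, standing in for the result of get_u32_ids/1.
sample = <<101::unsigned-native-32, 7592::unsigned-native-32, 102::unsigned-native-32>>
ids = U32Sketch.to_integers(sample)
```

In practice the point of the binary variants is to skip this decoding step: a tensor library can wrap the binary directly, so the integer list is never materialized.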

Full Changelog: https://github.com/elixir-nx/tokenizers/compare/v0.3.0...v0.3.1

Checksums

Here are the SHA256 checksums of the precompiled files:

d27eed0e8395f4065721f970a7a6f109d3f54d1b36c850d0a14844082ab90c8d  ex_tokenizers-v0.3.1-nif-2.15-x86_64-pc-windows-gnu.dll.tar.gz
0f5981b8affa087be4384f0fc9de444271556662b289333748e7a4d25f2d00c7  ex_tokenizers-v0.3.1-nif-2.15-x86_64-pc-windows-msvc.dll.tar.gz
d110ebf8a1c7e43487eb9c224dae456e72596618c0c614eb617990a47f203804  ex_tokenizers-v0.3.1-nif-2.16-x86_64-pc-windows-gnu.dll.tar.gz
7d140750c9b927434b179ddf33efd8bb77428f17556ebd8d1d8d07212c19c76a  ex_tokenizers-v0.3.1-nif-2.16-x86_64-pc-windows-msvc.dll.tar.gz
6f15a001864f3ad5911c9a75b178ead798ed66c2325fb825588bde3b36ea5c72  libex_tokenizers-v0.3.1-nif-2.15-aarch64-apple-darwin.so.tar.gz
5ba97d7fa56763b5d5235a77bfc26169baabb5aa3c7a04e2583f9b1ec92e84a3  libex_tokenizers-v0.3.1-nif-2.15-aarch64-unknown-linux-gnu.so.tar.gz
c86df80018eef4fd74174bc484401a71a06a50bacdd26478129a51d1750555f4  libex_tokenizers-v0.3.1-nif-2.15-aarch64-unknown-linux-musl.so.tar.gz
e5b7f21328f5c203fa9da3c140f10e34e034cc623f1d6c681193fd5b7724738a  libex_tokenizers-v0.3.1-nif-2.15-arm-unknown-linux-gnueabihf.so.tar.gz
9de202cf8ee208ab6f43f7474e8614918fd2a434b725444ec1741ded4a8e78bc  libex_tokenizers-v0.3.1-nif-2.15-riscv64gc-unknown-linux-gnu.so.tar.gz
58288eb84284a6c692df5789e3ca238f89a3f7858ddce33095c0a3cee47e2060  libex_tokenizers-v0.3.1-nif-2.15-x86_64-apple-darwin.so.tar.gz
64ce2cda9b2631968ba3aeee86574e62f2e33077cd8aa65137a700d6a03de6de  libex_tokenizers-v0.3.1-nif-2.15-x86_64-unknown-linux-gnu.so.tar.gz
32334ac268196e708c79dea0d5ee4c4adbc59716693454fe5d88fb302b5bfd30  libex_tokenizers-v0.3.1-nif-2.15-x86_64-unknown-linux-musl.so.tar.gz
2fefb0ef7cc98438332a8effd7788b258f245f7f2e7e9dfe13c4f61b5ce23e01  libex_tokenizers-v0.3.1-nif-2.16-aarch64-apple-darwin.so.tar.gz
a0761c7e6dc035180ea42fb15d4a6ad957bd33e5ae363a4ef04ed444bc523bb8  libex_tokenizers-v0.3.1-nif-2.16-aarch64-unknown-linux-gnu.so.tar.gz
6eb0686cf9dc77df45dbd284d7b63be4b62798b2dd930b53cc7ca0723da883aa  libex_tokenizers-v0.3.1-nif-2.16-aarch64-unknown-linux-musl.so.tar.gz
4aebed75ee0ad016877d8c64f1c841b25d9e473132dc7fa80842b01a3a42b8fc  libex_tokenizers-v0.3.1-nif-2.16-arm-unknown-linux-gnueabihf.so.tar.gz
615099e92676e1470f9b88aff8aa00f21fee1c568c80df57c26a934b55d042db  libex_tokenizers-v0.3.1-nif-2.16-riscv64gc-unknown-linux-gnu.so.tar.gz
0b3b366e06b87cc1252d90f66af73482d89e2fa4c1a7faad235ac342d83490e3  libex_tokenizers-v0.3.1-nif-2.16-x86_64-apple-darwin.so.tar.gz
b1dd8be7b57f1929ada728b1049e652b2c1ab5c100f766ae53dcd30093926a3b  libex_tokenizers-v0.3.1-nif-2.16-x86_64-unknown-linux-gnu.so.tar.gz
a0d3ed11d8287b674279695dd9ea61f10f61e9661eaa5d5837b90bce6efed612  libex_tokenizers-v0.3.1-nif-2.16-x86_64-unknown-linux-musl.so.tar.gz
tokenizers - v0.3.0

Published by github-actions[bot] over 1 year ago

Added

  • Add option to use cache when downloading pretrained files. We check the ETAG of
    the file before trying to download it. This introduces the :use_cache and :cache_dir
    options to the Tokenizers.from_pretrained/2 function.

  • Support adding special tokens when creating a tokenizer. This allows a pretrained
    tokenizer to be loaded with additional special tokens.

    This change adds the :additional_special_tokens option to the Tokenizers.from_pretrained/2
    function.

  • Add support for the riscv64gc-unknown-linux-gnu target, which is useful for Nerves
    projects running on 64-bit RISC-V computers. The project is now precompiled for
    those machines.
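
The cache options above might be used as in the following sketch. It assumes network access and an illustrative model name. The call is written as Tokenizers.Tokenizer.from_pretrained/2, the module path used for tokenizer functions in the v0.4.0 notes, which is an assumption relative to the shorter name in the text above. The cache directory is a made-up path.

```elixir
# Assumption: runs as a standalone script with network access.
Mix.install([{:tokenizers, "~> 0.4.0"}])

# Hypothetical cache location, created up front so the sketch does not rely
# on the library creating it.
cache_dir = Path.join(System.tmp_dir!(), "tokenizers_cache_sketch")
File.mkdir_p!(cache_dir)

# "bert-base-cased" is an illustrative model name, not taken from the release notes.
{:ok, tokenizer} =
  Tokenizers.Tokenizer.from_pretrained("bert-base-cased",
    use_cache: true,
    cache_dir: cache_dir
  )
```

With caching enabled, a second call with the same identifier should be able to reuse the stored file after the ETag check instead of downloading it again.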

Changed

  • Change the minimum required version of Rustler Precompiled to ~> 0.6. With this,
    aarch64-unknown-linux-musl and riscv64gc-unknown-linux-gnu become default targets,
    and support for NIF version 2.14 is dropped.

Full Changelog: https://github.com/elixir-nx/tokenizers/compare/v0.2.0...v0.3.0
Official changelog: https://github.com/elixir-nx/tokenizers/blob/main/CHANGELOG.md

Checksums

Here is the list of SHA256 checksums of the precompiled files:

e73178ccbea2e63b7b86afcbcff1a01a10e2b69901f424e703e8a38ff74c1dcf  ex_tokenizers-v0.3.0-nif-2.15-x86_64-pc-windows-gnu.dll.tar.gz
a562cac8feb8b3964860a461897be64d27a2a999f8cb237334f552ad1e14ff8a  ex_tokenizers-v0.3.0-nif-2.15-x86_64-pc-windows-msvc.dll.tar.gz
da32956b0346021376fd14e2c484c1150b1ed197577d5c8f99193b3b1c815ae2  ex_tokenizers-v0.3.0-nif-2.16-x86_64-pc-windows-gnu.dll.tar.gz
aaf9fbd3ffbfced33e7871f3f036067156433fd935f0b42778f47982aeee4717  ex_tokenizers-v0.3.0-nif-2.16-x86_64-pc-windows-msvc.dll.tar.gz
9937b4a50fbd03e48483484b09aa95446a4b4a67eb07e74f480193eaac73087a  libex_tokenizers-v0.3.0-nif-2.15-aarch64-apple-darwin.so.tar.gz
56038e1045c674c3a321a5bade5467c0cb599ede1a58df4fd1428f9eefe979b9  libex_tokenizers-v0.3.0-nif-2.15-aarch64-unknown-linux-gnu.so.tar.gz
bd1b27a026f3f8f5b0d60a7a2415bb19823fc5a726b173da24ca8fc1534e49f4  libex_tokenizers-v0.3.0-nif-2.15-aarch64-unknown-linux-musl.so.tar.gz
52e527a66d2806321c2297362fbac27429fb8e60f29386637c66ce78e8dadcf3  libex_tokenizers-v0.3.0-nif-2.15-arm-unknown-linux-gnueabihf.so.tar.gz
f43190147eafbc812607b61c9459a6c47aa526c492c13e2157cd613e382ba35e  libex_tokenizers-v0.3.0-nif-2.15-riscv64gc-unknown-linux-gnu.so.tar.gz
36a6d6691a3b3fa6d56ddf37fa2f696ae79bf2b991c17ee7cccbb2b9dda1719b  libex_tokenizers-v0.3.0-nif-2.15-x86_64-apple-darwin.so.tar.gz
f48f95b1a5373f75a555a78899916acb48d1c003795a02b0ee0001b7c92c8a51  libex_tokenizers-v0.3.0-nif-2.15-x86_64-unknown-linux-gnu.so.tar.gz
e79bac2f303dafdf7b8c7946c77a5f7be2e90db40f3102d1bdd3929d309fc949  libex_tokenizers-v0.3.0-nif-2.15-x86_64-unknown-linux-musl.so.tar.gz
5fdc9b12dcdc0eaf6ac7ba8a3608af33abe902299f81ec953fd7eac6de8477de  libex_tokenizers-v0.3.0-nif-2.16-aarch64-apple-darwin.so.tar.gz
0e40ed777f14a41df64526b50224392723eabd83188fa79910c6baceca4c72f1  libex_tokenizers-v0.3.0-nif-2.16-aarch64-unknown-linux-gnu.so.tar.gz
287d5896f09562c25527105998f1468f97649fb7a8fec60a04ad53d113a5d9df  libex_tokenizers-v0.3.0-nif-2.16-aarch64-unknown-linux-musl.so.tar.gz
dc8fdaa04935d32ab6a0cc00d06c1f3b747eb08eb09ab8b9d2143157c2de436e  libex_tokenizers-v0.3.0-nif-2.16-arm-unknown-linux-gnueabihf.so.tar.gz
db79c2f522a4b39940d98550b29b47e9f8ba45c1d3f3a00a19278a3b977c65b7  libex_tokenizers-v0.3.0-nif-2.16-riscv64gc-unknown-linux-gnu.so.tar.gz
b343d1dd3e2467e4e54624bbd8ff9dcefe48dd23ddd59f8ab54e808cc19a5865  libex_tokenizers-v0.3.0-nif-2.16-x86_64-apple-darwin.so.tar.gz
3850d19c6b2e635e46475a0ee0cc6c3dc62b0325a8c232fdf1137cc98e48af9d  libex_tokenizers-v0.3.0-nif-2.16-x86_64-unknown-linux-gnu.so.tar.gz
bdfeb7816dec218b04a024d2163f1fe7080eaca6e9b18823c3ac8f9cef3ecfbc  libex_tokenizers-v0.3.0-nif-2.16-x86_64-unknown-linux-musl.so.tar.gz
tokenizers - v0.2.0

Published by cigrainger almost 2 years ago

Adds a minimal HTTP server to avoid problems with OpenSSL

tokenizers - v0.1.2

Published by seanmor5 about 2 years ago

Updates Rustler Precompiled to be configured with musl as a build target

tokenizers - v0.1.1

Published by seanmor5 about 2 years ago

Adds linux-musl target to build matrix

tokenizers - v0.1.0

Published by github-actions[bot] about 2 years ago