Unofficial python bindings for the rust llm library. 🐍❤️🦀
MIT License
Published by LLukas22 about 1 year ago
Adds the ability to extend the context length of models via the RoPE_scaling parameter.
Published by LLukas22 over 1 year ago
Simplified the interaction with other GGML-based repos, like TheBloke/Llama-2-7B-GGML created by TheBloke.
Published by LLukas22 over 1 year ago
Fixed many GPU acceleration bugs in rustformers/llm and improved performance to match native ggml.
Published by LLukas22 over 1 year ago
Adds support for Metal, CUDA and OpenCL acceleration for Llama-based models.
Adds CI for the different acceleration backends to create prebuilt binaries.
Published by LLukas22 over 1 year ago
Adds support for the gpt2 architecture.
Published by LLukas22 over 1 year ago
Published by LLukas22 over 1 year ago
AutoModel-compatible models will now use the official tokenizers library, which improves decoding accuracy, especially for non-llama-based models. If you want to specify a tokenizer manually, it can be set via the tokenizer_path_or_repo_id parameter. If you want to use the default GGML tokenizer, the Huggingface support can be disabled via use_hf_tokenizer.
Published by LLukas22 over 1 year ago
Published by LLukas22 over 1 year ago
Added support for the q5_0, q5_1 and q8_0 quantization formats.
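For a rough sense of what these formats cost in memory, the bits-per-weight can be derived from their GGML block layouts. The block sizes below are assumptions based on the ggml source at the time (verify against the current ggml code before relying on them):

```python
# Sketch: effective bits-per-weight for the classic GGML quantization formats.
# Assumed block layouts (each block holds 32 weights):
#   q5_0: fp16 scale (2 B) + 32 high bits (4 B) + 32 low nibbles (16 B) = 22 B
#   q5_1: as q5_0 plus an fp16 minimum (2 B)                            = 24 B
#   q8_0: fp16 scale (2 B) + 32 int8 values (32 B)                      = 34 B
BLOCK_BYTES = {"q5_0": 22, "q5_1": 24, "q8_0": 34}
WEIGHTS_PER_BLOCK = 32

def bits_per_weight(fmt: str) -> float:
    """Storage cost per weight, including the per-block scale overhead."""
    return BLOCK_BYTES[fmt] * 8 / WEIGHTS_PER_BLOCK

for fmt in BLOCK_BYTES:
    print(f"{fmt}: {bits_per_weight(fmt):.2f} bits/weight")
```

Under these assumptions, q5_0 comes out at 5.5 bits/weight, q5_1 at 6.0 and q8_0 at 8.5, which is why q8_0 trades a noticeably larger file for less quantization error.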
Published by LLukas22 over 1 year ago
Added the stream method to each model, which returns a generator that can be consumed to generate a response.
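The consumption pattern looks like the sketch below. A stand-in generator is used here since the real stream method needs a loaded model; everything except the generator-consumption pattern itself is illustrative:

```python
from typing import Iterator

def fake_stream(prompt: str) -> Iterator[str]:
    """Stand-in for model.stream(prompt): yields tokens one at a time."""
    for token in ["Hello", ",", " world", "!"]:
        yield token

# Consume the generator token by token, e.g. to display a response
# incrementally instead of waiting for the full completion.
response = ""
for token in fake_stream("Say hello"):
    response += token
print(response)
```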
Published by LLukas22 over 1 year ago
⚠️ The GGML quantization format was updated again, old models will be incompatible ⚠️
Published by LLukas22 over 1 year ago
AutoModel can now automatically download GGML-converted models and normal Transformer models from the Huggingface Hub.
Published by LLukas22 over 1 year ago
Added the ability to automatically convert any supported model from the Huggingface Hub via the AutoConverter. Models converted this way can easily be quantized or loaded via the AutoQuantizer or AutoModel without the need to specify the architecture.
Published by LLukas22 over 1 year ago
The ability to quantize models is now available for every architecture via quantize.
Published by LLukas22 over 1 year ago
Published by LLukas22 over 1 year ago
Added the tokenize and decode functions to each model, to enable access to the internal tokenizer.
The generation of tokens is now GIL-free, meaning other background threads can run at the same time.
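Releasing the GIL means generation can run on a worker thread while the main thread stays responsive. A minimal sketch of that pattern, with a stand-in function in place of a real model call:

```python
import queue
import threading

def generate(prompt: str, out: "queue.Queue[str | None]") -> None:
    """Stand-in for a GIL-free model.generate() running on a worker thread."""
    for token in ["The", " answer", " is", " 42"]:
        out.put(token)
    out.put(None)  # sentinel: generation finished

tokens: "queue.Queue[str | None]" = queue.Queue()
worker = threading.Thread(target=generate, args=("question", tokens))
worker.start()

# The main thread is free to consume tokens (or do other work) as they arrive.
pieces = []
while (token := tokens.get()) is not None:
    pieces.append(token)
worker.join()
print("".join(pieces))
```

With the old GIL-holding generation, a thread like this would have been starved until the whole response was finished.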
Published by LLukas22 over 1 year ago
Since llama-rs was renamed to llm and now supports multiple model architectures, this wrapper was also expanded to support the new trait system and library structure.
Supported architectures for now:
The loader was also reworked and now supports the mmap-able ggjt format. To support this, the SessionConfig was expanded with the prefer_mmap field.
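The point of an mmap-able format is that the OS can map tensor data straight into memory and page it in lazily, instead of reading and copying the whole file. A small stand-alone illustration with Python's own mmap module (this only demonstrates the concept and is unrelated to the llm loader's actual implementation):

```python
import mmap
import os
import tempfile

# Write a few bytes standing in for a model file on disk.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"ggjt-tensor-data")

# Map the file read-only: pages are loaded on demand by the OS, and can be
# shared between processes rather than each holding its own copy.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
    header = bytes(m[:4])  # slicing reads directly from the mapping

print(header)
os.remove(path)
```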
Published by LLukas22 over 1 year ago
Published by LLukas22 over 1 year ago