Unofficial python bindings for the rust llm library. 🐍❤️🦀
MIT License
Published by LLukas22 about 1 year ago
Adds the ability to extend the context length of models via the RoPE_scaling parameter.
Published by LLukas22 over 1 year ago
Simplified the interaction with other GGML-based repos, like TheBloke/Llama-2-7B-GGML created by TheBloke.
Published by LLukas22 over 1 year ago
Fixed many GPU acceleration bugs in rustformers/llm and improved performance to match native ggml.
Published by LLukas22 over 1 year ago
Adds support for Metal, CUDA and OpenCL acceleration for Llama-based models.
Adds CI for the different acceleration backends to create prebuilt binaries.
Published by LLukas22 over 1 year ago
Adds support for the gpt2 architecture.
Published by LLukas22 over 1 year ago
Published by LLukas22 over 1 year ago
AutoModel-compatible models will now use the official tokenizers library, which improves decoding accuracy, especially for non-llama-based models. If you want to specify a tokenizer manually, it can be set via the tokenizer_path_or_repo_id parameter. If you want to use the default GGML tokenizer, the Huggingface support can be disabled via use_hf_tokenizer.
Published by LLukas22 over 1 year ago
Published by LLukas22 over 1 year ago
Added support for the q5_0, q5_1 and q8_0 quantization formats.
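For a rough sense of what these formats cost in memory, the bits-per-weight can be derived from their GGML block layouts. The block sizes below are assumptions based on the ggml source at the time (verify against the current ggml code before relying on them):

```python
# Sketch: effective bits-per-weight for the classic GGML quantization formats.
# Assumed block layouts (each block holds 32 weights):
#   q5_0: fp16 scale (2 B) + 32 high bits (4 B) + 32 low nibbles (16 B) = 22 B
#   q5_1: as q5_0 plus an fp16 minimum (2 B)                            = 24 B
#   q8_0: fp16 scale (2 B) + 32 int8 values (32 B)                      = 34 B
BLOCK_BYTES = {"q5_0": 22, "q5_1": 24, "q8_0": 34}
WEIGHTS_PER_BLOCK = 32

def bits_per_weight(fmt: str) -> float:
    """Storage cost per weight, including the per-block scale overhead."""
    return BLOCK_BYTES[fmt] * 8 / WEIGHTS_PER_BLOCK

for fmt in BLOCK_BYTES:
    print(f"{fmt}: {bits_per_weight(fmt):.2f} bits/weight")
```

Under these assumptions, q5_0 comes out at 5.5 bits/weight, q5_1 at 6.0 and q8_0 at 8.5, which is why q8_0 trades a noticeably larger file for less quantization error.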
Published by LLukas22 over 1 year ago
Added the stream method to each model, which returns a generator that can be consumed to generate a response.
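The consumption pattern looks like the sketch below. A stand-in generator is used here since the real stream method needs a loaded model; everything except the generator-consumption pattern itself is illustrative:

```python
from typing import Iterator

def fake_stream(prompt: str) -> Iterator[str]:
    """Stand-in for model.stream(prompt): yields tokens one at a time."""
    for token in ["Hello", ",", " world", "!"]:
        yield token

# Consume the generator token by token, e.g. to display a response
# incrementally instead of waiting for the full completion.
response = ""
for token in fake_stream("Say hello"):
    response += token
print(response)
```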
Published by LLukas22 over 1 year ago
⚠️ The GGML quantization format was updated again, old models will be incompatible ⚠️
Published by LLukas22 over 1 year ago
AutoModel can now automatically download GGML-converted models and normal Transformer models from the Huggingface Hub.
Published by LLukas22 over 1 year ago
Added the ability to automatically convert any supported model from the Huggingface Hub via the AutoConverter. Models converted this way can easily be quantized or loaded via the AutoQuantizer or AutoModel without the need to specify the architecture.
Published by LLukas22 over 1 year ago
The ability to quantize models is now available for every architecture via quantize.
Published by LLukas22 over 1 year ago
Published by LLukas22 over 1 year ago
Added the tokenize and decode functions to each model, to enable access to the internal tokenizer.
The generation of tokens is now GIL-free, meaning other background threads can run at the same time.
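Releasing the GIL means generation can run on a worker thread while the main thread stays responsive. A minimal sketch of that pattern, with a stand-in function in place of a real model call:

```python
import queue
import threading

def generate(prompt: str, out: "queue.Queue[str | None]") -> None:
    """Stand-in for a GIL-free model.generate() running on a worker thread."""
    for token in ["The", " answer", " is", " 42"]:
        out.put(token)
    out.put(None)  # sentinel: generation finished

tokens: "queue.Queue[str | None]" = queue.Queue()
worker = threading.Thread(target=generate, args=("question", tokens))
worker.start()

# The main thread is free to consume tokens (or do other work) as they arrive.
pieces = []
while (token := tokens.get()) is not None:
    pieces.append(token)
worker.join()
print("".join(pieces))
```

With the old GIL-holding generation, a thread like this would have been starved until the whole response was finished.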
Published by LLukas22 over 1 year ago
Since llama-rs was renamed to llm and now supports multiple model architectures, this wrapper was also expanded to support the new trait system and library structure.
Supported architectures for now:
The loader was also reworked and now supports the mmap-able ggjt format. To support this, the SessionConfig was expanded with the prefer_mmap field.
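The point of an mmap-able format is that the OS can map tensor data straight into memory and page it in lazily, instead of reading and copying the whole file. A small stand-alone illustration with Python's own mmap module (this only demonstrates the concept and is unrelated to the llm loader's actual implementation):

```python
import mmap
import os
import tempfile

# Write a few bytes standing in for a model file on disk.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"ggjt-tensor-data")

# Map the file read-only: pages are loaded on demand by the OS, and can be
# shared between processes rather than each holding its own copy.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
    header = bytes(m[:4])  # slicing reads directly from the mapping

print(header)
os.remove(path)
```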
Published by LLukas22 over 1 year ago
Published by LLukas22 over 1 year ago