Pre-trained Neural Network models in Axon (+ 🤗 Models integration)
Apache-2.0 License
- Fixed :no_repeat_ngram_length when using lower precision

Published by jonatanklosko 8 months ago
This release changes the directory structure of the models cache, so that cached files from the same HuggingFace Hub repository are grouped in a separate subdirectory. This change is meant to simplify manually removing specific models from the cache to free up space. As a result, the cache contents from prior versions are invalidated, so you most likely want to remove the current cache contents. To find the cache location, run

    elixir -e 'Mix.install([{:bumblebee, "0.4.2"}]); IO.puts(Bumblebee.cache_dir())'

(it defaults to the standard cache location for the given operating system).
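Since cached files are now grouped per Hub repository, specific models can be removed without wiping the whole cache. A minimal sketch (assuming Bumblebee is installed; the subdirectory name in the comment is hypothetical):

```elixir
# List the per-repository subdirectories in the Bumblebee cache.
cache_dir = Bumblebee.cache_dir()

for entry <- File.ls!(cache_dir) do
  IO.puts(Path.join(cache_dir, entry))
end

# Removing a single subdirectory frees the space used by that repository alone:
# File.rm_rf!(Path.join(cache_dir, "some-repository-subdirectory"))
```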
We also reduced memory usage during parameter loading (both when loading onto the CPU and directly onto the GPU). Previously, larger models sometimes required loading parameters onto the CPU first and only then transferring them to the GPU, in order to avoid running out of GPU memory during parameter transformations. With this release this should no longer be the case: loading parameters now has barely any memory footprint other than the parameters themselves.
Published by jonatanklosko 8 months ago

- Removed the need for the :params_filename option (specifying it for Stable Diffusion models is no longer necessary) (#301)
- Added :seed option to generation serving inputs (#303)
- Added :params_variant option to Bumblebee.load_model/2 for loading parameters of different precision (#309)
- Added :type option to Bumblebee.load_model/2 for loading the model under a specific precision policy (#311)
- Added :spec_overrides option to Bumblebee.load_model/2 (#340)
- Removed options from Bumblebee.apply_tokenizer/3; these should now be set on the tokenizer using Bumblebee.configure/2 (#310)
- … when the :preallocate_params serving option is enabled (#317)
- Changed the :for_masked_image_modeling output from :logits to :pixel_values
- Renamed the :text_embeddings and :image_embeddings outputs to singular
- Changed the :pooled_state output to flatten the extra 1-sized axes
- Changed Bumblebee.Text.Generation.build_generate/4 to return a map (#336)
- Removed the :seed serving option in favour of a runtime, per-input seed (#303)
- Removed the deprecated Bumblebee.Audio.speech_to_text/5 (in favour of the more specific speech_to_text_whisper/5)

Published by jonatanklosko about 1 year ago
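As a sketch of how the new Bumblebee.load_model/2 precision options from this release might be used (option values are illustrative, and loading requires network access to the HuggingFace Hub):

```elixir
# Load the checkpoint directly under a bf16 precision policy instead of
# full precision; :params_variant (not shown) would additionally select a
# parameter file variant published in the repository.
{:ok, model_info} = Bumblebee.load_model({:hf, "gpt2"}, type: :bf16)
```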
- Deprecated Bumblebee.Audio.speech_to_text/5 in favour of the more specific Bumblebee.Audio.speech_to_text_whisper/5
- … Nx.BinaryBackend
Published by jonatanklosko about 1 year ago
- Added Bumblebee.cache_dir/0 for discovering the cache location (#220)
- Added :preallocate_params option to all servings, useful with multiple GPUs (#233)

Published by jonatanklosko over 1 year ago
In this release we moved all generation options to a new %Bumblebee.Text.GenerationConfig{} struct, which needs to be explicitly loaded and configured. A number of generation options are model-specific, and they used to be part of the model specification; encapsulating everything in a single struct makes it clearer where each option comes from and how to reconfigure it. The text generation servings (generation, speech-to-text and conversation) need to be adjusted as follows:
{:ok, model_info} = Bumblebee.load_model({:hf, "gpt2"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "gpt2"})
+{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "gpt2"})
+generation_config = Bumblebee.configure(generation_config, max_new_tokens: 100)
+serving = Bumblebee.Text.generation(model_info, tokenizer, generation_config)
-serving = Bumblebee.Text.generation(model_info, tokenizer, max_new_tokens: 100)
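Putting the migration together, a complete usage sketch with the new API (assuming a compiled backend such as EXLA is configured and the model can be fetched from the HuggingFace Hub):

```elixir
{:ok, model_info} = Bumblebee.load_model({:hf, "gpt2"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "gpt2"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "gpt2"})

# Generation options now live on the explicit config struct.
generation_config = Bumblebee.configure(generation_config, max_new_tokens: 100)

serving = Bumblebee.Text.generation(model_info, tokenizer, generation_config)
Nx.Serving.run(serving, "Elixir is a dynamic, functional language")
```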
Published by jonatanklosko over 1 year ago
- … load_model (#140)

Published by jonatanklosko almost 2 years ago
Initial release.