LLM training code for Databricks foundation models
APACHE-2.0 License
In-context learning (ICL) datasets have now been added as a registry.
You can now switch dataloaders while training, which enables curriculum learning:
train_loader:
  <dataloader parameters>
callbacks:
  curriculum_learning:
  - duration: <number>tok
    train_loader: # matches top level train_loader
      <dataloader parameters>
  - duration: <number>tok
    train_loader:
      <dataloader parameters>
  - duration: <number>tok
    train_loader:
      <dataloader parameters>
You can now override default block configs for certain layers, allowing for different sliding window sizes, reuse of a previous layer's kv cache, and more:
model:
  ...
  (usual model configs)
  ...
  block_overrides:
    order:
    - name: default
    - order:
      - name: sliding_window_layer
      - name: sliding_window_layer_reuse
      - name: sliding_window_layer
      - repeat: 2
        name: sliding_window_layer_reuse
      - name: reuse_kv_layer
      repeat: 2
    overrides:
      sliding_window_layer:
        attn_config:
          sliding_window_size: 1024
      sliding_window_layer_reuse:
        attn_config:
          sliding_window_size: 1024
          reuse_kv_layer_idx: -1 # Relative index of the layer whose kv cache to reuse
      reuse_kv_layer:
        attn_config:
          reuse_kv_layer_idx: -6 # Relative index of the layer whose kv cache to reuse
Added all transforms to train script by @dakinggg in https://github.com/mosaicml/llm-foundry/pull/1300
Full Changelog: https://github.com/mosaicml/llm-foundry/compare/v0.9.1...v0.10.0
Published by dakinggg 4 months ago
This is a minor patch release that bumps the minimum version of mlflow to ensure writes are buffered (https://github.com/mosaicml/composer/pull/3401).
Full Changelog: https://github.com/mosaicml/llm-foundry/compare/v0.9.0...v0.9.1
Published by KuuCi 5 months ago
We've expanded the different ways to encode token IDs by allowing uint32 and uint16 formats, which saves significant space for datasets with smaller vocab sizes. We also extended ndarray type support for MDS dataset columns to the generic text dataset and updated conversion scripts accordingly.
We've implemented stricter enforcement on our Train and Eval configs to further protect users from attempting to train with invalid configs. In conjunction with numerous other PRs, we have stronger error handling to help users use LLM Foundry smoothly.
Previously, this was allowed:
parameters:
train_dataloader:
...
seed: ${global_seed}
random_other_key_that's_not_in_the_dataloader_constructor # this is not allowed
...
global_seed: 17 # this is also not allowed
But we've added a variables section. Please do this instead:
parameters:
  variables:
    global_seed: 42
  ...
  train_dataloader:
    seed: ${variables.global_seed}
We've updated our text-to-MDS conversion script to convert files to MDS in chunks. This protects against loading entire large files at once (potentially causing OOMs), and drastically speeds up converting long sequences.
Made fc_type a dict to pass fc kwargs through by @snarayan21 in https://github.com/mosaicml/llm-foundry/pull/1201
Full Changelog: https://github.com/mosaicml/llm-foundry/compare/v0.8.0...v0.9.0
Published by milocress 6 months ago
Support for training optimized MoE models at large scale.
Check out the megablocks documentation for more information on building state of the art MoE models.
We've expanded support for registries to include dataloaders, FFN layers, attention layers, norms, and parameter initialization functions.
Check out the README for detailed instructions and code examples!
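As one hedged sketch of how a registered component shows up in a config (assuming a custom norm registered under the hypothetical name my_custom_norm via llmfoundry.registry.norms):
model:
  ...
  norm_type: my_custom_norm # hypothetical registered name, not a built-in option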
We now support the ShareGPT format for finetuning.
We have updated the minimum supported PyTorch version to torch 2.3 (#1152).
We've removed the code_evaluation task from the allowed in-context learning task types, and we've deleted the InContextLearningCodeEvaluationDataset and InContextLearningCodeEvalAccuracy classes.
We've removed the question_answering task type. Please use the generation_task_with_answers task instead.
Added .json to SUPPORTED_EXTENSIONS by @eitanturok in https://github.com/mosaicml/llm-foundry/pull/1114
Added llmfoundry.data.utils.get_text_collator by @ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/1170
Full Changelog: https://github.com/mosaicml/llm-foundry/compare/v0.7.0...v0.8.0
Published by irenedea 7 months ago
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series.
In addition to the usual bug fixes and performance improvements, we've made foundry more customizable and extensible!
We've made key components of LLM Foundry registrable, such as models, loggers, and callbacks. You can use the registry to easily customize and extend your training workflows.
This means that you can register new options for these components, and then use them in your yaml config.
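For instance, a minimal sketch, assuming you've registered a custom callback under the hypothetical name my_callback via llmfoundry.registry.callbacks:
callbacks:
  my_callback: {} # hypothetical registered name; constructor kwargs go here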
Check out the README for detailed instructions and code examples!
We've removed support for deprecated features: triton attention, Prefix LMs, Llama attention patch, z-loss, and text denoising. These features were little used, and we removed them to focus on the core features that are heavily used.
If you were using these features please let us know how you were using them in a GitHub issue. We're happy to add things back that are in heavy usage.
Full Changelog: https://github.com/mosaicml/llm-foundry/compare/v0.6.0...v0.7.0
Published by dakinggg 7 months ago
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series.
In addition to the usual bug fixes and performance improvements, we've added lots of new features!
For chat-formatted data, you can now specify which tokens should be loss-generating in a configurable way.
This can be specified in the train_loader.dataset section of your yaml as follows:
...
train_loader:
  dataset:
    ...
    target_prompts: <FILL IN>
    target_responses: <FILL IN>
See the docstring for a description of the options.
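For instance, a minimal sketch using option values we believe are supported (treat the exact strings as assumptions and confirm against the docstring):
train_loader:
  dataset:
    ...
    target_prompts: none   # assumed option: prompt tokens generate no loss
    target_responses: last # assumed option: only the final response is loss-generating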
We've added support for the OLMo model from AI2.
To use OLMo, there are a few configuration parameters you need to set. First of all, you will need to install LLM Foundry with the extra package for OLMo (pip install .[gpu,olmo]).
Then you will need to adjust the tokenizer section of your config as follows:
tokenizer:
  name: allenai/OLMo-7B
  kwargs:
    revision: main
    model_max_length: 2048
    model_input_names:
    - input_ids
    - attention_mask
    trust_remote_code: true
We've added a new, on-by-default metric to compute token accuracy in addition to cross entropy and perplexity.
More configurable activation checkpointing for MPT allows finer-grained control over memory usage when training MPT. See the docstring for more details.
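As a hedged sketch (the target value below is an assumption; the docstring is authoritative):
model:
  ...
  activation_checkpointing_target: grouped_query_attention # assumed value: checkpoint only the attention modules
fsdp_config:
  activation_checkpointing: true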
We've brought the finetuning dataloader up to speed with the pretraining dataloader to support mixing multiple streams, and pretokenizing finetuning data. See the yaml for a full example.
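A minimal sketch of mixing two streams, with hypothetical stream names and paths:
train_loader:
  name: finetuning
  dataset:
    ...
    streams:
      stream_a:                      # hypothetical stream name
        remote: s3://bucket/stream-a # hypothetical path
        local: /tmp/stream-a
        proportion: 0.5
      stream_b:
        remote: s3://bucket/stream-b
        local: /tmp/stream-b
        proportion: 0.5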
We've released v0.3 of our Evaluation Gauntlet. See the README for a full description.
Support for flash attention v1 has now been removed.
When tokenizing prompt/response and chat data, for some tokenizers, we were mistakenly adding an extra BOS token between the prompt and the response. This has now been removed.
We've deprecated use of the triton version of flash attention, prefixLM, and text denoising, as these features were not heavily used or actively maintained.
MemoryMonitor now takes in kwargs, by @snarayan21 in https://github.com/mosaicml/llm-foundry/pull/1020
Full Changelog: https://github.com/mosaicml/llm-foundry/compare/v0.5.0...v0.6.0
Published by irenedea 9 months ago
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series.
In addition to the usual bug fixes and performance improvements, we've added lots of new features!
LLM Foundry now supports LoRA via an integration with the PEFT library. Within LLM Foundry, run train.py, adding peft_config arguments to the model section of the config .yaml, like so:
model:
  ...
  peft_config:
    r: 16
    peft_type: LORA
    task_type: CAUSAL_LM
    lora_alpha: 32
    lora_dropout: 0.05
    target_modules:
    - q_proj
    - k_proj
Read more about it in the tutorial.
We've added support for using ALiBi with Flash Attention (v2.4.2 or higher).
model:
  ...
  attn_config:
    attn_impl: flash
    alibi: True
We now support finetuning on chat data, with automatic formatting applied using Hugging Face tokenizer chat templates.
Each sample requires a single key "messages" that maps to an array of message objects. Each message object in the array represents a single message in the conversation and must contain the following keys:
role: A string indicating the author of the message. Possible values are "system", "user", and "assistant".
content: A string containing the text of the message.
We require that there be at least one message with the role "assistant", and that the last message in the "messages" array have the role "assistant".
Here's an example .jsonl with chat data:
{ "messages": [ { "role": "user", "content": "Hi, MPT!" }, { "role": "assistant", "content": "Hi, user!" } ]}
{ "messages": [
{ "role": "system": "A conversation between a user and a helpful and honest assistant"}
{ "role": "user", "content": "Hi, MPT!" },
{ "role": "assistant", "content": "Hi, user!" },
{ "role": "user", "content": "Is multi-turn chat supported?"},
{ "role": "assistant", "content": "Yes, we can chat for as long as my context length allows." }
]}
...
We now provide a safe_load option when loading HuggingFace datasets for finetuning. This restricts loaded files to .jsonl, .csv, or .parquet extensions to prevent arbitrary code execution. To use, set safe_load to true in your dataset configuration:
train_loader:
  name: finetuning
  dataset:
    safe_load: true
    ...
As always, we've updated to new versions of the core dependencies of LLM Foundry, bringing better performance, new features, and support for new models (mixtral in particular).
Deprecated features will be removed in v0.6.0.
We no longer support PyTorch versions before 2.1.
We've removed features that have been deprecated for at least one release.
Set sync_module_states: True when using HSDP, by @abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/830
Bumped the datasets version, by @dakinggg in https://github.com/mosaicml/llm-foundry/pull/892
Added a tokenizer-only flag to only download tokenizers from HF or oras, by @irenedea in https://github.com/mosaicml/llm-foundry/pull/895
Changed the add_metrics_to_eval_loaders function to accept a list of metric names instead of a dictionary of metrics, by @ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/938
Full Changelog: https://github.com/mosaicml/llm-foundry/compare/v0.4.0...v0.5.0
Published by dakinggg 11 months ago
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT-7B and MPT-30B models.
In addition to the usual bug fixes and performance improvements, we've added lots of new features!
You can now specify packing_ratio: auto under your finetuning dataset, to automatically profile and select a good packing ratio to efficiently pack your sequences together on the fly during finetuning. This can dramatically reduce the amount of compute wasted on padding tokens.
We now support using Flash Attention 2 both in MPT and in any model that supports Flash Attention 2 via the Transformers library. See the training instructions to learn how to use the different versions of Flash Attention.
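A hedged sketch of both paths; the MPT attn_config keys match the examples elsewhere in these notes, while the use_flash_attention_2 flag for Hugging Face models is our assumption (see the training instructions):
model:
  name: mpt_causal_lm
  ...
  attn_config:
    attn_impl: flash

model:
  name: hf_causal_lm
  pretrained_model_name_or_path: meta-llama/Llama-2-7b-hf # illustrative model
  use_flash_attention_2: true # assumed flag name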
As always, we've updated to new versions of the core dependencies of LLM Foundry, bringing better performance, new features, and support for new models (codellama and mistral in particular).
We've made it much easier to go from a training run to a served model using Databricks model serving. To make use of this feature, you need to specify both an MLFlowLogger and a HuggingFaceCheckpointer for your run.
The MLFlowLogger should have a Unity Catalog model registry prefix in the form of catalog.schema. This specifies where to register your models to. For example,
loggers:
  mlflow:
    experiment_name: /Users/[email protected]/my_experiment_name
    tracking_uri: databricks
    model_registry_prefix: catalog.schema
    model_registry_uri: databricks-uc
The HuggingFaceCheckpointer should specify the name you want to register the model under. For example,
callbacks:
  hf_checkpointer:
    save_interval: 1ep # Save Hugging Face formatted checkpoints each epoch
    save_folder: s3://bucket/path/to/my/checkpoints
    mlflow_registered_model_name: my_model_name # Final model will be registered to catalog.schema.my_model_name
We've added a few new options when training with the MPT architecture in LLM Foundry.
We've released v0.1 of our Eval Gauntlet (#674, #748)! This adds many new benchmarks, chain-of-thought, and a new safety category. Check out the README for full details!
In addition, we've made a few improvements to our evaluation options, with more to come!
Added H100 profiling results to our benchmarking table.
Added a Generate callback to log generations from your model over the course of training. (#631)
Added support for using snapshot_download to download models from the Hugging Face Hub. (#708)
We've added experimental support for the inverse square root learning rate scheduler.
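A minimal sketch of enabling the scheduler, assuming it is registered under the name inv_sqrt_with_warmup (both the name and the value below are assumptions):
scheduler:
  name: inv_sqrt_with_warmup # assumed registered name
  t_warmup: 1000ba           # illustrative warmup duration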
We've upgraded to the latest Streaming version, including vastly improved default settings for partitioning and shuffling. This means that if you were using the defaults, you will get different results after upgrading. The new defaults should be more performant for the large majority of use cases. See the Streaming release notes for more details.
We occasionally remove unused experimental parts of the code base to focus on new features and better support for existing features, and we've removed support for PrefixLM applied to Bloom and OPT models in this release.
Added load_strict_model_weights as an optional config parameter, by @AllenHW in https://github.com/mosaicml/llm-foundry/pull/655
Added a tie_word_embeddings config setting to enable / disable weight tied embeddings, by @vchiley in https://github.com/mosaicml/llm-foundry/pull/728
Full Changelog: https://github.com/mosaicml/llm-foundry/compare/v0.3.0...v0.4.0
Published by dakinggg about 1 year ago
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series. This release includes lots of bug fixes, stability improvements, and improved error messages, in addition to all the new features listed below!
Adds support for training Llama-2 models with optimized flash attention. To enable flash attention, set the attention_patch_type in your yaml like so:
model:
  ...
  attention_patch_type: triton
  ...
See the example yaml for a full example of how to finetune Llama-2 on the MosaicML platform.
We have implemented an 8-bit version of the Lion optimizer. This reduces the memory needed per parameter from 12 bytes to 9 bytes. To switch from Lion to 8-bit Lion, simply change the optimizer name from decoupled_lionw to decoupled_lionw_8b!
We've greatly improved our utilities for checkpoint conversion, including generalizing the Composer to Hugging Face conversion script to support all causal LMs, adding a callback to perform the conversion to Hugging Face format during the training job, and support for Faster Transformer conversion from a Composer MPT checkpoint.
To enable the new callback, add the hf_checkpointer callback to your yaml like so:
callbacks:
  ...
  hf_checkpointer:
    # Save a Hugging Face formatted checkpoint at the end of each epoch
    save_interval: 1ep
    # The Hugging Face formatted checkpoints will be saved inside a subfolder called huggingface,
    # so this folder will likely be the same as your overall save_folder
    save_folder: ./{run_name}/checkpoints
    # Set the precision you want the checkpoint saved in
    precision: bfloat16
We have added support for running HumanEval (code evaluation) using LLM Foundry! See the evaluation readme for a more detailed description and the tasks yaml for an ICL yaml that can be used to run the HumanEval evaluation task.
Adds support for using NVIDIA's Transformer Engine to enable FP8 training. To enable, set fc_type='te' and/or ffn_config['ffn_type']='te_ln_mlp' and precision='amp_fp8'.
Adds support for using MLFlow as an experiment tracker. To enable, simply add mlflow to the loggers section of your yaml. See the Composer docs for more configuration options for MLFlow. Stay tuned for automatic model logging to MLFlow for easy deployment.
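A minimal sketch; any extra kwargs are passed through to Composer's MLFlowLogger:
loggers:
  mlflow: {} # optional kwargs (e.g. experiment_name, tracking_uri) go here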
Updates to the latest release of MosaicML Streaming and sets better defaults for improved shuffling quality and training throughput. Check out the Streaming release notes for the full details of all the new options!
Implements Grouped Query Attention, which can strike a good balance between the quality of Multi Head Attention and the speed of Multi Query Attention. To enable, set attn_config['attn_type']='grouped_query_attention' and attn_config['kv_n_heads'] to the desired number of kv heads.
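In yaml form (the kv_n_heads value is illustrative):
model:
  ...
  attn_config:
    attn_type: grouped_query_attention
    kv_n_heads: 4 # illustrative; 1 recovers Multi Query Attention, n_heads recovers Multi Head Attention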
Thanks to @tdoublep and @lorabit110 for making MPT a bit easier to use with other parts of the NLP ecosystem!
Improvements to our evaluation setup, including the ability to run the eval gauntlet during training, and a wrapper to allow using inference APIs with our eval gauntlet. The ICL tasks and gauntlet can be specified as shown in https://github.com/mosaicml/llm-foundry/blob/fd36398dad5ac9fde085af679514189ce9439be4/scripts/eval/yamls/hf_eval.yaml#L46-L47.
We have enabled training with tiktoken tokenizers with a thin wrapper around the tiktoken library for compatibility with all the tooling built around Hugging Face tokenizers. You can enable this with a simple change to the tokenizer section of your yaml:
tokenizer:
  name: tiktoken
  kwargs:
    model_name: gpt-4
Allows the use of our evaluation script with a model trained using LoRA. See this yaml for an example of evaluating a model trained using LoRA. Stay tuned for full LoRA support with FSDP!
Lastly, we are building a finetuning API on top of LLM Foundry, Composer, and Streaming. Please reach out if you might be interested in using this API as a customer!
Updated throughput in README, by @abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/476
Handle drop_last and add an error message when it would produce no batches, by @dakinggg in https://github.com/mosaicml/llm-foundry/pull/549
all-cpu by @dakinggg in https://github.com/mosaicml/llm-foundry/pull/616
Changed repeat to expand in GQA, by @sashaDoubov in https://github.com/mosaicml/llm-foundry/pull/628
Full Changelog: https://github.com/mosaicml/llm-foundry/compare/v0.2.0...v0.3.0
Published by vchiley over 1 year ago
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs), and serves as the training codebase for the MPT-7B and MPT-30B models. Our emphasis is on efficiency, scalability, and ease-of-use, to enable fast iteration and prototyping.
We are excited to share the release of v0.2.0, packed with support for new hardware, features, and tutorials.
We have released new tutorial content and helper scripts for dataset preparation, pre-training, fine-tuning, and inference!
To start off, a basic walkthrough and answers to FAQs can be found in our Basic Tutorial.
Next, detailed guides for different workflows are linked below:
In addition, for a more advanced and self-contained example of finetuning the MPT-7B model, see Finetune Example.
The inference tutorials cover several new features we've added that improve integration with HuggingFace and FasterTransformer libraries:
LLM Foundry now uses Composer v0.15.0 and Streaming v0.5.1 as minimum requirements. For more details, see the release notes for Composer and Streaming for all the improvements.
⚠️ The new Streaming release includes a few API changes; see the Streaming v0.5 release notes for more details. Our APIs have also been changed to reflect these API modifications.
Torch 2.0 support
LLM Foundry is now Torch 2.0 compatible!
Note: we have not tested torch.compile, but do not expect significant performance improvements.
⚡ H100 Support
We now support NVIDIA H100 systems! See our blog post on Benchmarking LLMs on H100 GPUs for initial performance and convergence details.
To run LLM Foundry with NVIDIA H100 systems, be sure to use a Docker image with CUDA 11.8+ and PyTorch 2.0+.
For example, mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04 from our dockerhub has been tested with NVIDIA H100 systems.
No code changes should be required.
AMD MI250 GPU Support
With the release of PyTorch 2.0 and ROCm 5.4+, we are excited to share that LLM training now works out of the box on AMD Datacenter GPUs! Read our blog post on Training LLMs with AMD MI250 GPUs for more details.
Running with our stack was straightforward: use the ROCm 5.4 docker image rocm/dev-ubuntu-20.04:5.4.3-complete, then install PyTorch for ROCm 5.4 and install Flash Attention.
Modify your configuration settings:
attn_impl=flash instead of the default triton.
loss_fn=torch_crossentropy instead of the default fused_crossentropy.
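In yaml form, a minimal sketch of those two overrides (assuming loss_fn sits in the model section of the train config):
model:
  ...
  attn_config:
    attn_impl: flash          # instead of the default triton
  loss_fn: torch_crossentropy # instead of the default fused_crossentropy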
LoRA finetuning (Preview)
We have included a preview release of Low Rank Adaptation (LoRA) support for memory-efficient fine-tuning of LLMs (Hu et al., 2021).
To use LoRA, follow the instructions found here.
Note: This is a preview feature, please let us know any feedback! The API and support is subject to change.
Evaluation Refactor (#308)
Our evaluation suite has been significantly refactored into our Model Gauntlet approach. This includes a number of breaking API changes to support multiple models:
Instead of model, use the models keyword and provide a list of models.
tokenizer is now model-specific.
For example, to run the gauntlet of various eval tasks with mosaicml/mpt-7b:
cd llm-foundry/scripts
composer eval/eval.py eval/yamls/hf_eval.yaml model_name_or_path=mosaicml/mpt-7b
This release also makes evaluation deterministic even on different numbers of GPUs.
For more details on all these changes, see #308
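A hedged sketch of the new multi-model config shape, modeled on hf_eval.yaml (treat the exact keys as assumptions; the linked yaml is authoritative):
models:
- model_name: mosaicml/mpt-7b
  model:
    name: hf_causal_lm
    pretrained_model_name_or_path: mosaicml/mpt-7b
  tokenizer:
    name: mosaicml/mpt-7b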
⏱️ Benchmarking Inference
To better support the deployment of LLMs, we have included an inference benchmarking suite and results across different hardware and other LLM models.
Updated mosaicml-streaming version, by @hanlint in https://github.com/mosaicml/llm-foundry/pull/110
composer command by @hanlint in https://github.com/mosaicml/llm-foundry/pull/164
pynvml by @hanlint in https://github.com/mosaicml/llm-foundry/pull/165
tokenizer_name config field by @dakinggg in https://github.com/mosaicml/llm-foundry/pull/206
composer[libcloud] dependency by @abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/218
mixed_precision: FULL by @abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/255
mosaicml/llm-foundry Docker workflow by @abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/254
device_map by @abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/225
device_map support for hf_generate.py and hf_chat.py by @abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/276
Added save_weights_only as an option, by @dakinggg in https://github.com/mosaicml/llm-foundry/pull/301
mosaicml-streaming==0.5.x by @abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/292
Full Changelog: https://github.com/mosaicml/llm-foundry/compare/v0.1.1...v0.2.0
Published by mvpatel2000 over 1 year ago
LLM Foundry is now on PyPI!
Full Changelog: https://github.com/mosaicml/llm-foundry/compare/v0.1.0...v0.1.1
Published by dakinggg over 1 year ago
This is the first release of MosaicML's LLM Foundry!
Our efficient code for training, evaluating, and deploying LLMs outgrew our examples repository, so we've migrated to a brand new repository dedicated to everything LLMs. Keep watching this space and see the top-level README and our blog post for more details on this announcement!
In addition to all the open-source code released here, we're releasing four open-source models that we hope will be useful to the community. All models were trained on the MosaicML platform, using Composer and Streaming. If you're interested in training your own models, or using these models with our optimized inference stack, please reach out!
mpt-7b: This is our base 7-billion parameter model, trained for 1 trillion tokens. This model is released with an Apache-2.0 (commercial use permitted) license.
mpt-7b-storywriter: All of the models use ALiBi to allow them to extrapolate to longer sequence lengths than they saw during training, but storywriter is our long context model, further pretrained on 65k-token excerpts of a fiction subset of the books3 corpus. This model is released with an Apache-2.0 (commercial use permitted) license.
mpt-7b-instruct: This model is instruction finetuned on a dataset we also release, derived from Databricks' Dolly-15k and Anthropic's Helpful and Harmless datasets. This model is released with a CC-By-SA-3.0 (commercial use permitted) license.
mpt-7b-chat: This model is trained to be able to chat by further training on the ShareGPT-Vicuna, HC3, Alpaca, Helpful and Harmless, and Evol-Instruct datasets. This model is released with a CC-By-NC-SA-4.0 (non-commercial use only) license.
We release fully featured code for efficiently training any HuggingFace LLM (including our optimized MPT) using FSDP, Composer, and Streaming. Seamlessly scale to multi-gpu and multi-node training, stream your data from one cloud, train on a different cloud, write checkpoints to a third cloud, send your training logs to Weights&Biases, and much more. See the README for more detailed instructions on getting started pretraining and finetuning!
Our MPT model is equipped with the latest advancements in training large transformers (e.g. ALiBi, the LION optimizer, FlashAttention), and is designed to be easily hackable, configurable, and extendable!
Our evaluation framework makes it easy to fully re-evaluate any HuggingFace model. We also include copies of the processed data for many popular benchmarks, to make it easy to replicate our evals and perform your own! We welcome the addition of new benchmarks to our suite. In our previous benchmarks, our setup is 8x faster than other eval frameworks on a single GPU and seamlessly achieves linear scaling with multiple GPUs. Built-in support for FSDP makes it possible to evaluate large models and use larger batch sizes for further acceleration.
MPT is designed to be fast, easy, and cheap to deploy for inference. To begin with, all MPT models are subclassed from the HuggingFace PretrainedModel base class, which means that they are fully compatible with the HuggingFace ecosystem. You can upload MPT models to the HuggingFace Hub, generate outputs with standard pipelines like model.generate(...), build HuggingFace Spaces (see some of ours here!), and more.
What about performance? With MPT's optimized layers (including FlashAttention and low precision layernorm), the out-of-the-box performance of MPT-7B on GPUs when using model.generate(...) is 1.5x-2x faster than other 7B models like LLaMa-7B. This makes it easy to build fast and flexible inference pipelines with just HuggingFace and PyTorch.
Finally, for the best hosting experience, deploy your MPT models directly on MosaicML's Inference service. Start with our managed endpoints for models like MPT-7B-Instruct, and/or deploy your own custom model endpoints for optimal cost and data privacy. Check out the Inference blog post for more details!