LLM training code for Databricks foundation models
APACHE-2.0 License
In-context learning (ICL) datasets have now been added as a registry.
You can now switch dataloaders while training, which enables curriculum learning:
train_loader:
  <dataloader parameters>
callbacks:
  curriculum_learning:
  - duration: <number>tok
    train_loader: # matches top level train_loader
      <dataloader parameters>
  - duration: <number>tok
    train_loader:
      <dataloader parameters>
  - duration: <number>tok
    train_loader:
      <dataloader parameters>
You can now override default block configs for certain layers, allowing for different sliding window sizes, reuse of a previous layer's kv cache, and more:
model:
  ...
  (usual model configs)
  ...
  block_overrides:
    order:
    - name: default
    - order:
      - name: sliding_window_layer
      - name: sliding_window_layer_reuse
      - name: sliding_window_layer
      - repeat: 2
        name: sliding_window_layer_reuse
      - name: reuse_kv_layer
      repeat: 2
    overrides:
      sliding_window_layer:
        attn_config:
          sliding_window_size: 1024
      sliding_window_layer_reuse:
        attn_config:
          sliding_window_size: 1024
          reuse_kv_layer_idx: -1 # Relative index of the layer whose kv cache to reuse
      reuse_kv_layer:
        attn_config:
          reuse_kv_layer_idx: -6 # Relative index of the layer whose kv cache to reuse
Added all transforms to train script by @dakinggg in https://github.com/mosaicml/llm-foundry/pull/1300
Full Changelog: https://github.com/mosaicml/llm-foundry/compare/v0.9.1...v0.10.0
Published by dakinggg 4 months ago
This is a minor patch release that bumps the minimum version of mlflow to ensure writes are buffered (https://github.com/mosaicml/composer/pull/3401).
Full Changelog: https://github.com/mosaicml/llm-foundry/compare/v0.9.0...v0.9.1
Published by KuuCi 5 months ago
We've expanded the different ways to encode token IDs by allowing uint32 and uint16 formats, which saves significant space for datasets with smaller vocab sizes. We also extended ndarray type support for MDS dataset columns to the generic text dataset and updated conversion scripts accordingly.
We've implemented stricter enforcement on our Train and Eval configs to further protect users from attempting to train with invalid configs. In conjunction with numerous other PRs, we have stronger error handling to help users use LLM Foundry smoothly.
Previously, this was allowed:
parameters:
train_dataloader:
...
seed: ${global_seed}
random_other_key_that's_not_in_the_dataloader_constructor # this is not allowed
...
global_seed: 17 # this is also not allowed
But we've added a variables section. Please do this instead:
parameters:
  variables:
    global_seed: 42
  ...
  train_dataloader:
    seed: ${variables.global_seed}
We've updated our text-to-MDS conversion script to convert files to MDS in chunks. This protects against loading entire large files at once (potentially causing OOMs), and drastically speeds up converting long sequences.
Made fc_type a dict to pass fc kwargs through by @snarayan21 in https://github.com/mosaicml/llm-foundry/pull/1201
Full Changelog: https://github.com/mosaicml/llm-foundry/compare/v0.8.0...v0.9.0
Published by milocress 6 months ago
Support for training optimized MoE models at large scale.
Check out the megablocks documentation for more information on building state of the art MoE models.
We've expanded support for registries to include dataloaders, FFN layers, attention layers, norms, and parameter initialization functions.
Check out the README for detailed instructions and code examples!
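As one hedged sketch of how a registered component shows up in a config (assuming a custom norm registered under the hypothetical name my_custom_norm via llmfoundry.registry.norms):
model:
  ...
  norm_type: my_custom_norm # hypothetical registered name, not a built-in option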
We now support the ShareGPT format for finetuning.
We have updated the minimum supported PyTorch version to torch 2.3 (#1152).
We've removed the code_evaluation task from the allowed in-context learning task types, and we've deleted the InContextLearningCodeEvaluationDataset and InContextLearningCodeEvalAccuracy classes.
We've removed the question_answering task type. Please use the generation_task_with_answers task instead.
Added .json to SUPPORTED_EXTENSIONS by @eitanturok in https://github.com/mosaicml/llm-foundry/pull/1114
Added llmfoundry.data.utils.get_text_collator by @ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/1170
Full Changelog: https://github.com/mosaicml/llm-foundry/compare/v0.7.0...v0.8.0
Published by irenedea 7 months ago
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series.
In addition to the usual bug fixes and performance improvements, we've made foundry more customizable and extensible!
We've made key components of LLM Foundry registrable, such as models, loggers, and callbacks. You can use the registry to easily customize and extend your training workflows.
This means that you can register new options for these components, and then use them in your yaml config.
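For instance, a minimal sketch, assuming you've registered a custom callback under the hypothetical name my_callback via llmfoundry.registry.callbacks:
callbacks:
  my_callback: {} # hypothetical registered name; constructor kwargs go here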
Check out the README for detailed instructions and code examples!
We've removed support for deprecated features: triton attention, Prefix LMs, Llama attention patch, z-loss, and text denoising. These features were little used, and we removed them to focus on the core features that are heavily used.
If you were using these features please let us know how you were using them in a GitHub issue. We're happy to add things back that are in heavy usage.
Full Changelog: https://github.com/mosaicml/llm-foundry/compare/v0.6.0...v0.7.0
Published by dakinggg 7 months ago
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series.
In addition to the usual bug fixes and performance improvements, we've added lots of new features!
For chat-formatted data, you can now specify which tokens should be loss-generating in a configurable way.
This can be specified in the train_loader.dataset section of your yaml as follows:
...
train_loader:
  dataset:
    ...
    target_prompts: <FILL IN>
    target_responses: <FILL IN>
See the docstring for a description of the options.
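For instance, a minimal sketch using option values we believe are supported (treat the exact strings as assumptions and confirm against the docstring):
train_loader:
  dataset:
    ...
    target_prompts: none   # assumed option: prompt tokens generate no loss
    target_responses: last # assumed option: only the final response is loss-generating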
We've added support for the OLMo model from AI2.
To use OLMo, there are a few configuration parameters you need to set. First of all, you will need to install LLM Foundry with the extra package for OLMo (pip install .[gpu,olmo]).
Then you will need to adjust the tokenizer section of your config as follows:
tokenizer:
  name: allenai/OLMo-7B
  kwargs:
    revision: main
    model_max_length: 2048
    model_input_names:
    - input_ids
    - attention_mask
    trust_remote_code: true
We've added a new, on-by-default metric to compute token accuracy in addition to cross entropy and perplexity.
More configurable activation checkpointing for MPT allows finer-grained control over memory usage when training MPT. See the docstring for more details.
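As a hedged sketch (the target value below is an assumption; the docstring is authoritative):
model:
  ...
  activation_checkpointing_target: grouped_query_attention # assumed value: checkpoint only the attention modules
fsdp_config:
  activation_checkpointing: true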
We've brought the finetuning dataloader up to speed with the pretraining dataloader to support mixing multiple streams, and pretokenizing finetuning data. See the yaml for a full example.
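A minimal sketch of mixing two streams, with hypothetical stream names and paths:
train_loader:
  name: finetuning
  dataset:
    ...
    streams:
      stream_a:                      # hypothetical stream name
        remote: s3://bucket/stream-a # hypothetical path
        local: /tmp/stream-a
        proportion: 0.5
      stream_b:
        remote: s3://bucket/stream-b
        local: /tmp/stream-b
        proportion: 0.5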
We've released v0.3 of our Evaluation Gauntlet. See the README for a full description.
Support for flash attention v1 has now been removed.
When tokenizing prompt/response and chat data, for some tokenizers, we were mistakenly adding an extra BOS token between the prompt and the response. This has now been removed.
We've deprecated use of the triton version of flash attention, prefixLM, and text denoising, as these features were not heavily used or actively maintained.
MemoryMonitor now takes in kwargs, by @snarayan21 in https://github.com/mosaicml/llm-foundry/pull/1020
Full Changelog: https://github.com/mosaicml/llm-foundry/compare/v0.5.0...v0.6.0
Published by irenedea 9 months ago
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series.
In addition to the usual bug fixes and performance improvements, we've added lots of new features!
LLM Foundry now supports LoRA via an integration with the PEFT library. Within LLM Foundry, run train.py, adding peft_config arguments to the model section of the config .yaml, like so:
model:
  ...
  peft_config:
    r: 16
    peft_type: LORA
    task_type: CAUSAL_LM
    lora_alpha: 32
    lora_dropout: 0.05
    target_modules:
    - q_proj
    - k_proj
Read more about it in the tutorial.
We've added support for using ALiBi with Flash Attention (v2.4.2 or higher).
model:
  ...
  attn_config:
    attn_impl: flash
    alibi: True
We now support finetuning on chat data, with automatic formatting applied using Hugging Face tokenizer chat templates.
Each sample requires a single key "messages" that maps to an array of message objects. Each message object in the array represents a single message in the conversation and must contain the following keys:
role: A string indicating the author of the message. Possible values are "system", "user", and "assistant".
content: A string containing the text of the message.
We require that there be at least one message with the role "assistant", and that the last message in the "messages" array have the role "assistant".
Here's an example .jsonl with chat data:
{ "messages": [ { "role": "user", "content": "Hi, MPT!" }, { "role": "assistant", "content": "Hi, user!" } ]}
{ "messages": [
{ "role": "system": "A conversation between a user and a helpful and honest assistant"}
{ "role": "user", "content": "Hi, MPT!" },
{ "role": "assistant", "content": "Hi, user!" },
{ "role": "user", "content": "Is multi-turn chat supported?"},
{ "role": "assistant", "content": "Yes, we can chat for as long as my context length allows." }
]}
...
We now provide a safe_load option when loading HuggingFace datasets for finetuning. This restricts loaded files to .jsonl, .csv, or .parquet extensions to prevent arbitrary code execution. To use, set safe_load to true in your dataset configuration:
train_loader:
  name: finetuning
  dataset:
    safe_load: true
    ...
As always, we've updated to new versions of the core dependencies of LLM Foundry, bringing better performance, new features, and support for new models (mixtral in particular).
Deprecated features will be removed in v0.6.0.
We no longer support PyTorch versions before 2.1.
We've removed features that have been deprecated for at least one release.
Set sync_module_states: True when using HSDP, by @abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/830
Bumped the datasets version, by @dakinggg in https://github.com/mosaicml/llm-foundry/pull/892
Added a tokenizer-only flag to only download tokenizers from HF or oras, by @irenedea in https://github.com/mosaicml/llm-foundry/pull/895
Changed the add_metrics_to_eval_loaders function to accept a list of metric names instead of a dictionary of metrics, by @ShashankMosaicML in https://github.com/mosaicml/llm-foundry/pull/938
Full Changelog: https://github.com/mosaicml/llm-foundry/compare/v0.4.0...v0.5.0
Published by dakinggg 11 months ago
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT-7B and MPT-30B models.
In addition to the usual bug fixes and performance improvements, we've added lots of new features!
You can now specify packing_ratio: auto under your finetuning dataset, to automatically profile and select a good packing ratio to efficiently pack your sequences together on the fly during finetuning. This can dramatically reduce the amount of compute wasted on padding tokens.
We now support using Flash Attention 2 both in MPT and in any model that supports Flash Attention 2 via the Transformers library. See the training instructions to learn how to use the different versions of Flash Attention.
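A hedged sketch of both paths; the MPT attn_config keys match the examples elsewhere in these notes, while the use_flash_attention_2 flag for Hugging Face models is our assumption (see the training instructions):
model:
  name: mpt_causal_lm
  ...
  attn_config:
    attn_impl: flash

model:
  name: hf_causal_lm
  pretrained_model_name_or_path: meta-llama/Llama-2-7b-hf # illustrative model
  use_flash_attention_2: true # assumed flag name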
As always, we've updated to new versions of the core dependencies of LLM Foundry, bringing better performance, new features, and support for new models (codellama and mistral in particular).
We've made it much easier to go from a training run to a served model using Databricks model serving. To make use of this feature, you need to specify both an MLFlowLogger and a HuggingFaceCheckpointer for your run.
The MLFlowLogger should have a Unity Catalog model registry prefix in the form of catalog.schema. This specifies where to register your models to. For example,
loggers:
  mlflow:
    experiment_name: /Users/[email protected]/my_experiment_name
    tracking_uri: databricks
    model_registry_prefix: catalog.schema
    model_registry_uri: databricks-uc
The HuggingFaceCheckpointer should specify the name you want to register the model under. For example,
callbacks:
  hf_checkpointer:
    save_interval: 1ep # Save Hugging Face formatted checkpoints each epoch
    save_folder: s3://bucket/path/to/my/checkpoints
    mlflow_registered_model_name: my_model_name # Final model will be registered to catalog.schema.my_model_name
We've added a few new options when training with the MPT architecture in LLM Foundry.
We've released v0.1 of our Eval Gauntlet (#674, #748)! This adds many new benchmarks, chain-of-thought, and a new safety category. Check out the README for full details!
In addition, we've made a few improvements to our evaluation options, with more to come!
Added H100 profiling results to our benchmarking table.
Added a Generate callback to log generations from your model over the course of training. (#631)
Added support for using snapshot_download to download models from the Hugging Face Hub. (#708)
We've added experimental support for the inverse square root learning rate scheduler.
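A minimal sketch of enabling the scheduler, assuming it is registered under the name inv_sqrt_with_warmup (both the name and the value below are assumptions):
scheduler:
  name: inv_sqrt_with_warmup # assumed registered name
  t_warmup: 1000ba           # illustrative warmup duration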
We've upgraded to the latest Streaming version, including vastly improved default settings for partitioning and shuffling. This means that if you were using the defaults, you will get different results after upgrading. The new defaults should be more performant for the large majority of use cases. See the Streaming release notes for more details.
We occasionally remove unused experimental parts of the code base to focus on new features and better support for existing features, and we've removed support for PrefixLM applied to Bloom and OPT models in this release.
Added load_strict_model_weights as an optional config parameter, by @AllenHW in https://github.com/mosaicml/llm-foundry/pull/655
Added a tie_word_embeddings config setting to enable / disable weight tied embeddings, by @vchiley in https://github.com/mosaicml/llm-foundry/pull/728
Full Changelog: https://github.com/mosaicml/llm-foundry/compare/v0.3.0...v0.4.0
Published by dakinggg about 1 year ago
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT model series. This release includes lots of bug fixes, stability improvements, and improved error messages, in addition to all the new features listed below!
Adds support for training Llama-2 models with optimized flash attention. To enable flash attention, set the attention_patch_type in your yaml like so:
model:
  ...
  attention_patch_type: triton
  ...
See the example yaml for a full example of how to finetune Llama-2 on the MosaicML platform.
We have implemented an 8-bit version of the Lion optimizer. This reduces the memory needed per parameter from 12 bytes to 9 bytes. To switch from Lion to 8-bit Lion, simply change the optimizer name from decoupled_lionw to decoupled_lionw_8b!
We've greatly improved our utilities for checkpoint conversion, including generalizing the Composer to Hugging Face conversion script to support all causal LMs, adding a callback to perform the conversion to Hugging Face format during the training job, and support for Faster Transformer conversion from a Composer MPT checkpoint.
To enable the new callback, add the hf_checkpointer callback to your yaml like so:
callbacks:
  ...
  hf_checkpointer:
    # Save a Hugging Face formatted checkpoint at the end of each epoch
    save_interval: 1ep
    # The Hugging Face formatted checkpoints will be saved inside a subfolder called huggingface,
    # so this folder will likely be the same as your overall save_folder
    save_folder: ./{run_name}/checkpoints
    # Set the precision you want the checkpoint saved in
    precision: bfloat16
We have added support for running HumanEval (code evaluation) using LLM Foundry! See the evaluation readme for a more detailed description and the tasks yaml for an ICL yaml that can be used to run the HumanEval evaluation task.
Adds support for using NVIDIA's Transformer Engine to enable FP8 training. To enable, set fc_type='te' and/or ffn_config['ffn_type']='te_ln_mlp' and precision='amp_fp8'.
Adds support for using MLFlow as an experiment tracker. To enable, simply add mlflow to the loggers section of your yaml. See the Composer docs for more configuration options for MLFlow. Stay tuned for automatic model logging to MLFlow for easy deployment.
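A minimal sketch; any extra kwargs are passed through to Composer's MLFlowLogger:
loggers:
  mlflow: {} # optional kwargs (e.g. experiment_name, tracking_uri) go here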
Updates to the latest release of MosaicML Streaming and sets better defaults for improved shuffling quality and training throughput. Check out the Streaming release notes for the full details of all the new options!
Implements Grouped Query Attention, which can strike a good balance between the quality of Multi Head Attention and the speed of Multi Query Attention. To enable, set attn_config['attn_type']='grouped_query_attention' and attn_config['kv_n_heads'] to the desired number of kv heads.
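In yaml form (the kv_n_heads value is illustrative):
model:
  ...
  attn_config:
    attn_type: grouped_query_attention
    kv_n_heads: 4 # illustrative; 1 recovers Multi Query Attention, n_heads recovers Multi Head Attention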
Thanks to @tdoublep and @lorabit110 for making MPT a bit easier to use with other parts of the NLP ecosystem!
Improvements to our evaluation setup, including the ability to run the eval gauntlet during training, and a wrapper to allow using inference APIs with our eval gauntlet. The ICL tasks and gauntlet can be specified as shown in https://github.com/mosaicml/llm-foundry/blob/fd36398dad5ac9fde085af679514189ce9439be4/scripts/eval/yamls/hf_eval.yaml#L46-L47.
We have enabled training with tiktoken tokenizers with a thin wrapper around the tiktoken library for compatibility with all the tooling built around Hugging Face tokenizers. You can enable this with a simple change to the tokenizer section of your yaml:
tokenizer:
  name: tiktoken
  kwargs:
    model_name: gpt-4
Allows the use of our evaluation script with a model trained using LoRA. See this yaml for an example of evaluating a model trained using LoRA. Stay tuned for full LoRA support with FSDP!
Lastly, we are building a finetuning API on top of LLM Foundry, Composer, and Streaming. Please reach out if you might be interested in using this API as a customer!
Updated throughput in README, by @abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/476
Handle drop_last and add an error message when it would produce no batches, by @dakinggg in https://github.com/mosaicml/llm-foundry/pull/549
all-cpu by @dakinggg in https://github.com/mosaicml/llm-foundry/pull/616
Changed repeat to expand in GQA, by @sashaDoubov in https://github.com/mosaicml/llm-foundry/pull/628
Full Changelog: https://github.com/mosaicml/llm-foundry/compare/v0.2.0...v0.3.0
Published by vchiley over 1 year ago
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs), and serves as the training codebase for the MPT-7B and MPT-30B models. Our emphasis is on efficiency, scalability, and ease-of-use, to enable fast iteration and prototyping.
We are excited to share the release of v0.2.0, packed with support for new hardware, features, and tutorials.
We have released new tutorial content and helper scripts for dataset preparation, pre-training, fine-tuning, and inference!
To start off, a basic walkthrough and answers to FAQs can be found in our Basic Tutorial.
Next, detailed guides for different workflows are linked below:
In addition, for a more advanced and self-contained example of finetuning the MPT-7B model, see Finetune Example.
The inference tutorials cover several new features we've added that improve integration with HuggingFace and FasterTransformer libraries:
LLM Foundry now uses Composer v0.15.0 and Streaming v0.5.1 as minimum requirements. For more details, see the release notes for Composer and Streaming for all the improvements.
⚠️ The new Streaming release includes a few API changes; see the Streaming v0.5 release notes for more details. Our APIs have also been changed to reflect these API modifications.
Torch 2.0 support
LLM Foundry is now Torch 2.0 compatible!
Note: we have not tested torch.compile, but do not expect significant performance improvements.
⚡ H100 Support
We now support NVIDIA H100 systems! See our blog post on Benchmarking LLMs on H100 GPUs for initial performance and convergence details.
To run LLM Foundry with NVIDIA H100 systems, be sure to use a Docker image with CUDA 11.8+ and PyTorch 2.0+.
For example, mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04 from our dockerhub has been tested with NVIDIA H100 systems.
No code changes should be required.
AMD MI250 GPU Support
With the release of PyTorch 2.0 and ROCm 5.4+, we are excited to share that LLM training now works out of the box on AMD Datacenter GPUs! Read our blog post on Training LLMs with AMD MI250 GPUs for more details.
Running with our stack was straightforward: use the ROCm 5.4 docker image rocm/dev-ubuntu-20.04:5.4.3-complete, then install PyTorch for ROCm 5.4 and install Flash Attention.
Modify your configuration settings:
attn_impl=flash instead of the default triton.
loss_fn=torch_crossentropy instead of the default fused_crossentropy.
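In yaml form, a minimal sketch of those two overrides (assuming loss_fn sits in the model section of the train config):
model:
  ...
  attn_config:
    attn_impl: flash          # instead of the default triton
  loss_fn: torch_crossentropy # instead of the default fused_crossentropy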
LoRA finetuning (Preview)
We have included a preview release of Low Rank Adaptation (LoRA) support for memory-efficient fine-tuning of LLMs (Hu et al., 2021).
To use LoRA, follow the instructions found here.
Note: This is a preview feature, please let us know any feedback! The API and support is subject to change.
Evaluation Refactor (#308)
Our evaluation suite has been significantly refactored into our Model Gauntlet approach. This includes a number of breaking API changes to support multiple models:
Instead of model, use the models keyword and provide a list of models.
tokenizer is now model-specific.
For example, to run the gauntlet of various eval tasks with mosaicml/mpt-7b:
cd llm-foundry/scripts
composer eval/eval.py eval/yamls/hf_eval.yaml model_name_or_path=mosaicml/mpt-7b
This release also makes evaluation deterministic even on different numbers of GPUs.
For more details on all these changes, see #308
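A hedged sketch of the new multi-model config shape, modeled on hf_eval.yaml (treat the exact keys as assumptions; the linked yaml is authoritative):
models:
- model_name: mosaicml/mpt-7b
  model:
    name: hf_causal_lm
    pretrained_model_name_or_path: mosaicml/mpt-7b
  tokenizer:
    name: mosaicml/mpt-7b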
⏱️ Benchmarking Inference
To better support the deployment of LLMs, we have included an inference benchmarking suite and results across different hardware and other LLM models.
Updated mosaicml-streaming version, by @hanlint in https://github.com/mosaicml/llm-foundry/pull/110
composer command by @hanlint in https://github.com/mosaicml/llm-foundry/pull/164
pynvml by @hanlint in https://github.com/mosaicml/llm-foundry/pull/165
tokenizer_name config field by @dakinggg in https://github.com/mosaicml/llm-foundry/pull/206
composer[libcloud] dependency by @abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/218
mixed_precision: FULL by @abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/255
mosaicml/llm-foundry Docker workflow by @abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/254
device_map by @abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/225
device_map support for hf_generate.py and hf_chat.py by @abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/276
Added save_weights_only as an option, by @dakinggg in https://github.com/mosaicml/llm-foundry/pull/301
mosaicml-streaming==0.5.x by @abhi-mosaic in https://github.com/mosaicml/llm-foundry/pull/292
Full Changelog: https://github.com/mosaicml/llm-foundry/compare/v0.1.1...v0.2.0
Published by mvpatel2000 over 1 year ago
LLM Foundry is now on PyPI!
Full Changelog: https://github.com/mosaicml/llm-foundry/compare/v0.1.0...v0.1.1
Published by dakinggg over 1 year ago
This is the first release of MosaicML's LLM Foundry!
Our efficient code for training, evaluating, and deploying LLMs outgrew our examples repository, so we've migrated to a brand new repository dedicated to everything LLMs. Keep watching this space and see the top-level README and our blog post for more details on this announcement!
In addition to all the open-source code released here, we're releasing four open-source models that we hope will be useful to the community. All models were trained on the MosaicML platform, using Composer and Streaming. If you're interested in training your own models, or using these models with our optimized inference stack, please reach out!
mpt-7b: This is our base 7-billion parameter model, trained for 1 trillion tokens. This model is released with an Apache-2.0 (commercial use permitted) license.
mpt-7b-storywriter: All of the models use ALiBi to allow them to extrapolate to longer sequence lengths than they saw during training, but storywriter is our long context model, further pretrained on 65k-token excerpts of a fiction subset of the books3 corpus. This model is released with an Apache-2.0 (commercial use permitted) license.
mpt-7b-instruct: This model is instruction finetuned on a dataset we also release, derived from Databricks' Dolly-15k and Anthropic's Helpful and Harmless datasets. This model is released with a CC-By-SA-3.0 (commercial use permitted) license.
mpt-7b-chat: This model is trained to be able to chat by further training on the ShareGPT-Vicuna, HC3, Alpaca, Helpful and Harmless, and Evol-Instruct datasets. This model is released with a CC-By-NC-SA-4.0 (non-commercial use only) license.
We release fully featured code for efficiently training any HuggingFace LLM (including our optimized MPT) using FSDP, Composer, and Streaming. Seamlessly scale to multi-gpu and multi-node training, stream your data from one cloud, train on a different cloud, write checkpoints to a third cloud, send your training logs to Weights&Biases, and much more. See the README for more detailed instructions on getting started pretraining and finetuning!
Our MPT model is equipped with the latest advancements in training large transformers (e.g. ALiBi, the LION optimizer, FlashAttention), and is designed to be easily hackable, configurable, and extendable!
Our evaluation framework makes it easy to fully re-evaluate any HuggingFace model. We also include copies of the processed data for many popular benchmarks, to make it easy to replicate our evals and perform your own! We welcome the addition of new benchmarks to our suite. In our previous benchmarks, our setup is 8x faster than other eval frameworks on a single GPU and seamlessly achieves linear scaling with multiple GPUs. Built-in support for FSDP makes it possible to evaluate large models and use larger batch sizes for further acceleration.
MPT is designed to be fast, easy, and cheap to deploy for inference. To begin with, all MPT models are subclassed from the HuggingFace PretrainedModel base class, which means that they are fully compatible with the HuggingFace ecosystem. You can upload MPT models to the HuggingFace Hub, generate outputs with standard pipelines like model.generate(...), build HuggingFace Spaces (see some of ours here!), and more.
What about performance? With MPT's optimized layers (including FlashAttention and low precision layernorm), the out-of-the-box performance of MPT-7B on GPUs when using model.generate(...) is 1.5x-2x faster than other 7B models like LLaMa-7B. This makes it easy to build fast and flexible inference pipelines with just HuggingFace and PyTorch.
Finally, for the best hosting experience, deploy your MPT models directly on MosaicML's Inference service. Start with our managed endpoints for models like MPT-7B-Instruct, and/or deploy your own custom model endpoints for optimal cost and data privacy. Check out the Inference blog post for more details!