Supercharge Your Model Training
APACHE-2.0 License
Published by mvpatel2000 11 months ago
1. MosaicML Logger Robustness (https://github.com/mosaicml/composer/pull/2728)
We've improved the MosaicML logger to be more robust to faulty serialization.
raise ... from e to preserve stack trace by @irenedea in https://github.com/mosaicml/composer/pull/2725
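For reference, the pattern looks roughly like this (a generic Python sketch, not the exact code from the PR):
import json

def safe_serialize(metadata: dict) -> str:
    try:
        return json.dumps(metadata)
    except TypeError as e:
        # Chaining with `from e` keeps the original traceback attached to the new error.
        raise ValueError(f'Failed to serialize run metadata: {metadata!r}') from e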
Full Changelog: https://github.com/mosaicml/composer/compare/v0.17.0...v0.17.1
Published by mvpatel2000 11 months ago
1. Hybrid Sharded Data Parallel (HSDP) Integration (#2648)
Composer now supports Hybrid Sharded Data Parallel (HSDP), where a model is both sharded and replicated across blocks of controllable size. By default, this will shard a model within a node and replicate across nodes, but Composer will accept a tuple of process groups to specify custom shard/replicate sizes. This can be specified in the FSDP config.
composer_model = MyComposerModel(n_layers=3)
fsdp_config = {
'sharding_strategy': 'HYBRID_SHARD',
}
trainer = Trainer(
model=composer_model,
max_duration='4ba',
fsdp_config=fsdp_config,
...
)
HYBRID_SHARD will FULL_SHARD a model within the shard block, whereas _HYBRID_SHARD_ZERO2 will SHARD_GRAD_OP within the shard block.
2. Train Loss NaN Monitor (#2704)
Composer has a new callback which will raise a ValueError if your loss becomes NaN. This is very useful to avoid wasting compute if your training run diverges or fails for numerical reasons.
from composer.callbacks import NaNMonitor
composer_model = MyComposerModel(n_layers=3)
trainer = Trainer(
model=composer_model,
max_duration='4ba',
callbacks=NaNMonitor(),
...
)
Full Changelog: https://github.com/mosaicml/composer/compare/v0.16.4...v0.17.0
Published by mvpatel2000 about 1 year ago
1. Torch 2.1 Support
Composer officially supports PyTorch 2.1! We support several new features from 2.1, including CustomPolicy which supports granular wrapping with FSDP.
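As a rough illustration of what CustomPolicy enables (this uses the plain PyTorch 2.1 FSDP API rather than Composer's wiring; MyBlock is a hypothetical module class, and torch.distributed is assumed to already be initialized):
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import CustomPolicy

class MyBlock(nn.Module):  # hypothetical submodule to wrap as its own FSDP unit
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(16, 16)

    def forward(self, x):
        return self.linear(x)

def lambda_fn(module: nn.Module):
    # True -> wrap this module with the root FSDP kwargs; False -> leave it unwrapped.
    # Returning a dict instead would override the FSDP kwargs for just this module.
    return isinstance(module, MyBlock)

model = nn.Sequential(MyBlock(), MyBlock(), nn.Linear(16, 4))
fsdp_model = FSDP(model, auto_wrap_policy=CustomPolicy(lambda_fn))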
Full Changelog: https://github.com/mosaicml/composer/compare/v0.16.3...v0.16.4
Published by mvpatel2000 about 1 year ago
1. Add pass@k for HumanEval
HumanEval now supports pass@k. We also support first-class integration with the MosaicML platform for secure code evaluation.
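For context, pass@k is typically computed with the unbiased estimator from the original HumanEval paper, 1 - C(n-c, k) / C(n, k) averaged over problems. The snippet below is that standard formula, not necessarily Composer's exact implementation:
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    # n: samples generated per problem, c: samples that passed the unit tests.
    # Computed as a stable product instead of explicit binomial coefficients.
    if n - c < k:
        return 1.0
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# e.g. 10 samples per problem with 3 passing gives an estimated pass@5 of ~0.92
print(pass_at_k(n=10, c=3, k=5))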
2. log_model with MLFlow
The MLFlow integration now supports log_model at the end of the run.
Full Changelog: https://github.com/mosaicml/composer/compare/v0.16.2...v0.16.3
Published by mvpatel2000 about 1 year ago
1. PyTorch Nightly Support
Composer now supports PyTorch Nightly and CUDA 12! Along with new Docker images based on nightly PyTorch versions and release candidates, we've updated our PyTorch monkeypatches to support the latest version of PyTorch. These monkeypatches add additional functionality for finer-grained FSDP wrapping and patch bugs related to sharded checkpoints. We are in the process of upstreaming these changes into PyTorch.
1. MosaicML Logger Robustness
The MosaicML logger is now robust to platform timeouts and other errors. Additionally, it can now be disabled by setting the environment variable MOSAICML_PLATFORM to 'False' when training on the MosaicML platform.
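For example, a minimal sketch of opting out before the Trainer is constructed:
import os

# Disable the MosaicML platform integration for this run.
os.environ['MOSAICML_PLATFORM'] = 'False'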
2. GCS Integration
GCS authentication is now supported with HMAC keys, patching a bug in the previous implementation.
3. Optimizer Monitor Norm Calculation (https://github.com/mosaicml/composer/pull/2531)
Previously, the optimizer monitor incorrectly reduced norms across GPUs. It now correctly computes norms in a distributed setting.
Full Changelog: https://github.com/mosaicml/composer/compare/v0.16.1...v0.16.2
Published by mvpatel2000 about 1 year ago
1. HPU (Habana Gaudi) Support (https://github.com/mosaicml/composer/pull/2444)
Composer now supports Habana Gaudi chips! To enable HPUs, device needs to be specified as 'hpu':
composer_model = MyComposerModel(n_layers=3)
trainer = Trainer(
model=composer_model,
device='hpu',
...
)
2. Generate Callback (https://github.com/mosaicml/composer/pull/2449)
We've added a new callback which runs generate on a language model at a given frequency to visualize outputs:
from composer.callbacks import Generate
composer_model = MyComposerModel(n_layers=3)
generate_callback = Generate(prompts=['How good is my model?'], interval='5ba')
trainer = Trainer(
model=composer_model,
callbacks = generate_callback,
...
)
1. Checkpoint Fixes
Elastic sharded checkpointing now disables torchmetrics saving to avoid issues with torchmetrics tensors being sharded. Additionally, checkpointing now falls back on the old path, which does not convert torchmetrics tensors to numpy. Checkpointing also no longer materializes optimizer state when saving weights only.
2. MLFlow Performance Improvements
MLFlow integration has significant performance improvements in logging frequency and system metrics collected.
input_ids to a kwarg in HuggingFaceModel.generate by @dakinggg in https://github.com/mosaicml/composer/pull/2459
save_weights_only by @eracah in https://github.com/mosaicml/composer/pull/2450
torch_prof_remote_file_name as Optional by @srstevenson in https://github.com/mosaicml/composer/pull/2512
Full Changelog: https://github.com/mosaicml/composer/compare/v0.16.0...v0.16.1
Published by mvpatel2000 about 1 year ago
1. New Events (#2264)
Composer now has the events EVAL_BEFORE_ALL and EVAL_AFTER_ALL, which let users control logging of certain bespoke evaluation information across all evaluators.
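A minimal sketch of a callback hooking the new events (assuming, as with Composer's other events, that the callback method names mirror the lowercased event names):
from composer.core import Callback, State
from composer.loggers import Logger

class EvalBookends(Callback):
    def eval_before_all(self, state: State, logger: Logger):
        # Runs once before any evaluator starts.
        logger.log_metrics({'eval/all_started': 1})

    def eval_after_all(self, state: State, logger: Logger):
        # Runs once after every evaluator has finished.
        logger.log_metrics({'eval/all_finished': 1})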
2. Elastic Sharded Checkpointing
Traditionally, checkpoints are stored as giant monoliths. For large model training, moving the entire model to 1 node may be infeasible and writing one large file from 1 node may be slow. Composer now supports elastic sharded checkpoints with FSDP, where every rank writes a single shard of the checkpoint. This checkpointing strategy is elastic, which means even if you resume on a different number of GPUs, Composer will handle resumption. To enable sharded checkpointing, it must be specified in the FSDP config as 'state_dict_type': 'sharded':
composer_model = MyComposerModel(n_layers=3)
fsdp_config = {
'sharding_strategy': 'FULL_SHARD',
'state_dict_type': 'sharded',
'sharded_ckpt_prefix_dir': 'ba{batch}-shards' # save each set of shards to a unique folder based on batch
}
trainer = Trainer(
model=composer_model,
max_duration='4ba',
fsdp_config=fsdp_config,
save_folder='checkpoints',
save_interval='2ba',
...
)
See the docs for more information on how to integrate this with your project.
Add EVAL_STANDALONE_START and EVAL_STANDALONE_END events and change RUD to not wait_for_workers every eval by @dakinggg in https://github.com/mosaicml/composer/pull/2418
Full Changelog: https://github.com/mosaicml/composer/compare/v0.15.0...v0.16.0
Published by dakinggg over 1 year ago
This is a patch release that mainly fixes a bug related to autoresume, and changes the default to offload_to_cpu for sharded checkpoints with PyTorch versions >2.
Full Changelog: https://github.com/mosaicml/composer/compare/v0.15.0...v0.15.1
Published by mvpatel2000 over 1 year ago
Exact Eval (https://github.com/mosaicml/composer/pull/2218)
Composer now supports exact evaluation! Now, evaluation will give the exact same results regardless of the number of GPUs by removing any duplicated samples from the dataloader.
Monolithic Checkpoint Loading (https://github.com/mosaicml/composer/pull/2288)
When training large models, loading the model and optimizer on every rank can use up all the system memory. With FSDP, Composer can now load the model and optimizer on only rank 0 and broadcast it to all other ranks. To enable:
from composer import Trainer
# Construct Trainer
trainer = Trainer(
...,
fsdp_config={
'load_monolith_rank0_only': True
},
)
# Train!
trainer.fit()
Then, ensure the model on rank 0 is on CPU/GPU (as opposed to meta).
Spin Dataloaders
By default, Composer spins dataloaders back to the current timestamp to ensure deterministic resumption. However, dataloader spinning can be very slow, so Trainer now has a new flag to disable spinning if determinism is not required. To enable:
from composer import Trainer
# Construct Trainer
trainer = Trainer(
...,
spin_dataloaders=False,
)
# Train!
trainer.fit()
HealthChecker is now deprecated and will be removed in v0.17.0
backwards_create_graph description by @mvpatel2000 in https://github.com/mosaicml/composer/pull/2248
Full Changelog: https://github.com/mosaicml/composer/compare/v0.14.1...v0.15.0
Published by mvpatel2000 over 1 year ago
Fixes a bug related to SentencePiece tokenizers and ICL eval.
Full Changelog: https://github.com/mosaicml/composer/compare/v0.14.0...v0.14.1
Published by bandish-shah over 1 year ago
Composer v0.14.0 is released! Install via pip:
pip install composer==0.14.0
The legacy package name still works via pip:
pip install mosaicml==0.14.0
🆕 PyTorch 2.0 Support (#2172)
We're thrilled to announce official support for PyTorch 2.0! We've got all initial unit tests passing and run through our examples. We've also made some updates to start taking advantage of all the great new features.
Initial support also includes:
Support for torch.compile
| Model | Dataset | Without compile throughput/samples_per_sec | With compile throughput/samples_per_sec | Performance % |
|---|---|---|---|---|
| ResNet50 | ImageNet | 5557 | 7424 | 33.60% |
| DeepLab V3 | ADE20K | 81.60 | 98.82 | 21.10% |
| HF BERT | C4 | 3360 | 4259 | 26.75% |
| HF Causal LM | C4 | 50.61 | 103.29 | 100.05% |
To start using, simply add the compile_config argument to the Trainer:
# To use default `torch.compile` config
trainer = Trainer(
...,
compile_config={},
)
# To use custom `torch.compile` config, provide an argument as a dictionary, for example:
trainer = Trainer(
...,
compile_config={'mode': 'reduce-overhead'},
)
The Trainer also supports pre-compiled models passed via the model argument. If the model has been pre-compiled, the compile_config argument is ignored if provided.
Note: We recommend baselining your model with and without torch.compile, as there are scenarios where enabling compile does not yield any throughput improvements and in some cases can even lead to a regression.
PyTorch 2.0 Docker Images
We've added the following new official MosaicML Docker Images with PyTorch 2.0 support:
| Linux Distro | Flavor | PyTorch Version | CUDA Version | Python Version | Docker Tags |
|---|---|---|---|---|---|
| Ubuntu 20.04 | Base | 2.0.0 | 11.7.1 (Infiniband) | 3.10 | mosaicml/pytorch:2.0.0_cu117-python3.10-ubuntu20.04 |
| Ubuntu 20.04 | Base | 2.0.0 | 11.7.1 (EFA) | 3.10 | mosaicml/pytorch:2.0.0_cu117-python3.10-ubuntu20.04-aws |
| Ubuntu 20.04 | Base | 2.0.0 | cpu | 3.10 | mosaicml/pytorch:2.0.0_cpu-python3.10-ubuntu20.04 |
| Ubuntu 20.04 | Vision | 2.0.0 | 11.7.1 (Infiniband) | 3.10 | mosaicml/pytorch_vision:2.0.0_cu117-python3.10-ubuntu20.04 |
| Ubuntu 20.04 | Vision | 2.0.0 | cpu | 3.10 | mosaicml/pytorch_vision:2.0.0_cpu-python3.10-ubuntu20.04 |
🦾 New Callbacks
Activation monitor (#2066)
Monitors activations in the network. Every interval batches, it will attach a forward hook and log the max, average, L2 norm, and kurtosis of the input and output activations. To enable:
from composer import Trainer
from composer.callbacks import ActivationMonitor
# Construct Trainer
trainer = Trainer(
...,
callbacks=[ActivationMonitor()],
)
# Train!
trainer.fit()
Slack Logger (#2133)
You can now send custom training metrics using Slack! To enable:
from composer import Trainer
from composer.loggers import SlackLogger
trainer = Trainer(
...,
loggers=[
SlackLogger(
log_interval="10ba", # or 1ep, 2ep
include_keys=["algorithm_traces*", "loss*"],
formatter_func=(lambda data, **kwargs:
[
{
"type": "section", "text": {"type": "mrkdwn", "text": f"*{k}:* {v}"}
}
for k, v in data.items()
])
)
],
)
trainer.fit()
Please see PR #2133 for additional details.
The grad_accum argument has been removed from Trainer; users are now required to use device_train_microbatch_size instead (#2040)
slack_sdk import by @hanlint in https://github.com/mosaicml/composer/pull/2031
NO_REENTRANT activation checkpointing by @bmosaicml in https://github.com/mosaicml/composer/pull/2042
LPLayerNorm and LPGroupNorm to support self.bias or self.weight = None by @abhi-mosaic in https://github.com/mosaicml/composer/pull/2044
device and dtype back to LPLayerNorm by @abhi-mosaic in https://github.com/mosaicml/composer/pull/2067
HuggingFaceModel by @dakinggg in https://github.com/mosaicml/composer/pull/2045
None attributes are weight tied by @bcui19 in https://github.com/mosaicml/composer/pull/2103
HuggingFaceModel by @dakinggg in https://github.com/mosaicml/composer/pull/2093
get_num_tokens_in_batch by @dakinggg in https://github.com/mosaicml/composer/pull/2139
eval_interval and save_interval in tokens by @dakinggg in https://github.com/mosaicml/composer/pull/2149
Full Changelog: https://github.com/mosaicml/composer/compare/v0.13.5...v0.14.0
Published by mvpatel2000 over 1 year ago
Full Changelog: https://github.com/mosaicml/composer/compare/v0.13.4...v0.13.5
Published by mvpatel2000 over 1 year ago
Full Changelog: https://github.com/mosaicml/composer/compare/v0.13.3...v0.13.4
Bumps streaming version pin to <1.0
Published by bandish-shah over 1 year ago
Composer v0.13.3 is released!
Composer can also now be installed using the new composer PyPi package via pip:
pip install composer==0.13.3
The legacy package name still works via pip:
pip install mosaicml==0.13.3
Full Changelog: https://github.com/mosaicml/composer/compare/v0.13.2...v0.13.3
Published by bandish-shah over 1 year ago
Composer v0.13.2 is released!
Composer can also now be installed using the new composer PyPi package via pip:
pip install composer==0.13.2
The legacy package name still works via pip:
pip install mosaicml==0.13.2
device and dtype back to LPLayerNorm (#2067) by @abhi-mosaic
LPLayerNorm and LPGroupNorm to support self.bias or self.weight = None (#2044) by @abhi-mosaic
NO_REENTRANT activation checkpointing (#2042) by @bmosaicml
Full Changelog: https://github.com/mosaicml/composer/compare/v0.13.1...v0.13.2
Published by bandish-shah over 1 year ago
Composer v0.13.1 is released!
Composer can also now be installed using the new composer PyPi package via pip:
pip install composer==0.13.1
The legacy package name still works via pip:
pip install mosaicml==0.13.1
Note: The mosaicml==0.13.0 PyPi package was yanked due to some minor packaging issues discovered after release. The package was re-released as Composer v0.13.1, thus these release notes contain details for both v0.13.0 and v0.13.1.
🤙 New and Updated Callbacks
New HealthChecker Callback (#2002)
The callback will log a warning if the GPUs on a given node appear to be in poor health (low utilization). The callback can also be configured to send a Slack message!
from composer import Trainer
from composer.callbacks import HealthChecker
# Warn if GPU utilization difference drops below 10%
health_checker = HealthChecker(
threshold = 10
)
# Construct Trainer
trainer = Trainer(
...,
callbacks=health_checker,
)
# Train!
trainer.fit()
Updated MemoryMonitor to use GigaBytes (GB) units (#1940)
New RuntimeEstimator Callback (#1991)
Estimate the remaining runtime of your job! Approximates the time remaining by observing the throughput and comparing to the number of batches remaining.
from composer import Trainer
from composer.callbacks import RuntimeEstimator
# Construct trainer with RuntimeEstimator callback
trainer = Trainer(
...,
callbacks=RuntimeEstimator(),
)
# Train!
trainer.fit()
Updated SpeedMonitor throughput metrics (#1987)
Expands throughput metrics to track relative to several different time units and per device:
throughput/batches_per_sec and throughput/device/batches_per_sec
throughput/tokens_per_sec and throughput/device/tokens_per_sec
throughput/flops_per_sec and throughput/device/flops_per_sec
throughput/device/samples_per_sec
Also adds a throughput/device/mfu metric to compute per-device MFU. Simply enable the SpeedMonitor callback as usual to log these new metrics! Please see the SpeedMonitor documentation for more information.
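To enable (a minimal sketch; window_size is the moving-average window in batches, and the MFU metric additionally assumes your model reports flops_per_batch):
from composer import Trainer
from composer.callbacks import SpeedMonitor

trainer = Trainer(
    ...,
    callbacks=[SpeedMonitor(window_size=100)],
)
trainer.fit()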
⣿ FSDP Sharded Checkpoints (#1902)
Users can now specify the state_dict_type in the fsdp_config dictionary to enable sharded checkpoints. For example:
from composer import Trainer
fsdp_config = {
'sharding_strategy': 'FULL_SHARD',
'state_dict_type': 'local',
}
trainer = Trainer(
...,
fsdp_config=fsdp_config,
save_folder='checkpoints',
save_filename='ba{batch}_rank{rank}.pt',
save_interval='10ba',
)
Please see the PyTorch FSDP docs and Composer's Distributed Training notes for more information.
🤗 HuggingFace Improvements
HuggingFaceModel class to support encoder-decoder batches without decoder_input_ids (#1950)
HuggingFaceModel directly (#1971)
HuggingFaceModel and write out the expected config.json and pytorch_model.bin in the HuggingFace pretrained folder (#1974)
🛟 Nvidia H100 Alpha Support - Added amp_fp8 data type
In preparation for H100's arrival, we've added the amp_fp8 precision type. Currently, setting amp_fp8 specifies a new precision context using transformer_engine.pytorch.fp8_autocast.
For more details, please see Nvidia's new Transformer Engine and the specific fp8 recipe we utilize.
from composer import Trainer
trainer = Trainer(
...,
precision='amp_fp8',
)
The torchmetrics package has been upgraded to 0.11.x.
The torchmetrics.Accuracy metric now requires a task argument, which can take on a value of binary, multiclass, or multilabel. Please see the Torchmetrics Accuracy docs for details.
Additionally, since specifying a value of 'multiclass' requires an additional field, num_classes, to be specified, we've had to update ComposerClassifier to accept the additional num_classes argument. Please see PRs #2017 and #2025 for additional details.
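For example, with torchmetrics 0.11 the task must be spelled out explicitly:
from torchmetrics import Accuracy

binary_acc = Accuracy(task='binary')
# 'multiclass' additionally requires num_classes to be specified.
multiclass_acc = Accuracy(task='multiclass', num_classes=10)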
Surgery algorithms used in functional form return a value of None (#1543)
ProgressBarLogger and ConsoleLogger to loggers (#1846)
HuggingFaceModel crashes if config.return_dict = False (#1948)
epoch metric name to trainer/epoch (#1986)
mosaicml/pytorch:1.12.1*, mosaicml/pytorch:1.11.0*, mosaicml/pytorch_vision:1.12.1*, and mosaicml/pytorch_vision:1.11.0* images are impacted and currently supported for legacy use cases. We recommend users upgrade to images with PyTorch >1.13. The affected images will be removed in the next Composer release.
Full Changelog: https://github.com/mosaicml/composer/compare/v0.12.1...v0.13.1
Published by bandish-shah over 1 year ago
This release has been yanked due to a minor packaging issue, please skip directly to Composer v0.13.1
Full Changelog: https://github.com/mosaicml/composer/compare/v0.12.1...v0.13.0
Published by bandish-shah over 1 year ago
Composer v0.12.1 is released! Install via pip:
pip install --upgrade mosaicml==0.12.1
📚 In-Context Learning (#1876)
With Composer and MosaicML Cloud you can now evaluate LLMs on in-context learning tasks (LAMBADA, HellaSwag, PIQA, and more) hundreds of times faster than other evaluation harnesses. Please see our "Blazingly Fast LLM Evaluation for In-Context Learning" blog post for more details!
💾 Added support for Coreweave Object Storage (#1915)
Coreweave object store is compatible with boto3. Uploading objects to Coreweave object store is almost exactly like writing to S3, except an endpoint_url must be set via the S3_ENDPOINT_URL environment variable. For example:
import os
os.environ['S3_ENDPOINT_URL'] = 'https://object.las1.coreweave.com'
from composer.trainer import Trainer
# Save checkpoints every epoch to s3://my_bucket/checkpoints
trainer = Trainer(
model=model,
train_dataloader=train_dataloader,
max_duration='10ep',
save_folder='s3://my_bucket/checkpoints',
save_interval='1ep',
save_overwrite=True,
save_filename='ep{epoch}.pt',
save_num_checkpoints_to_keep=0, # delete all checkpoints locally
)
trainer.fit()
Please see our checkpointing documentation for more details.
🪵 Automatic logging of Trainer hparams (#1855)
Hyperparameter arguments passed to the Trainer are now automatically logged. Simply set the Trainer argument auto_log_hparams=True.
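For example (a minimal sketch):
from composer import Trainer

trainer = Trainer(
    ...,
    auto_log_hparams=True,  # log all Trainer hyperparameters to the attached loggers
)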
torch==1.13.1 by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1863
meta tensor initialization if there are no initialization functions, fix associated flaky FSDP test by @bcui19 in https://github.com/mosaicml/composer/pull/1905
Full Changelog: https://github.com/mosaicml/composer/compare/v0.12.0...v0.12.1
Published by bandish-shah almost 2 years ago
Composer v0.12.0 is released! Install via pip:
pip install mosaicml==0.12.0
🪵 Logging and ObjectStore Enhancements
There are multiple improvements to our logging and object store support in this release.
Image visualization using our CometMLLogger (#1710)
We've added support for using our ImageVisualizer callback with CometML to log images and segmentation masks to CometML.
from composer.callbacks import ImageVisualizer
from composer.loggers import CometMLLogger
from composer.trainer import Trainer

trainer = Trainer(...,
callbacks=[ImageVisualizer()],
loggers=[CometMLLogger()]
)
Added direct support for Oracle Cloud Infrastructure (OCI) as an ObjectStore (#1774) and support for Google Cloud Storage (GCS) via URI (#1833)
To use, you can simply set your save_folder or load_path to a URI beginning with oci:// or gs:// to save and load with OCI and GCS respectively.
from composer.trainer import Trainer
# Checkpoint saving to Google Cloud Storage.
trainer = Trainer(
model=model,
save_folder="gs://my-bucket/{run_name}/checkpoints",
run_name='my-run',
save_interval="1ep",
save_filename="ep{epoch}.pt",
save_num_checkpoints_to_keep=0, # delete all checkpoints locally
...
)
trainer.fit()
Added basic support for logging with MLFlow (#1795)
We've added basic support for using MLFlow to log experiment metrics.
from composer.loggers import MLFlowLogger
from composer.trainer import Trainer
mlflow_logger = MLFlowLogger(experiment_name=mlflow_exp_name,
run_name=mlflow_run_name,
tracking_uri=mlflow_uri)
trainer = Trainer(..., loggers=[mlflow_logger])
Simplified console and progress bar logging (#1694)
To turn off the progress bar, set progress_bar=False. To turn on logging directly to the console, set log_to_console=True. To control the frequency of logging to console, set console_log_interval (e.g. to 1ep or 1ba).
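For example, a minimal sketch combining the three flags described above:
from composer import Trainer

trainer = Trainer(
    ...,
    progress_bar=False,          # turn off the progress bar
    log_to_console=True,         # log metrics directly to the console
    console_log_interval='1ba',  # control how often console logs are emitted
)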
Our get_file utility now supports URIs directly (s3://, oci://, and gs://) for downloading files.
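For example (a minimal sketch, assuming credentials for the bucket are already configured in the environment):
from composer.utils import get_file

# Download a checkpoint from object storage to the local filesystem.
get_file('s3://my-bucket/checkpoints/ep1.pt', destination='ep1.pt')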
🏃♀️ Support for Mid-Epoch Resumption with the latest release of Streaming
We've added support in Composer for the latest release of our Streaming library. This includes awesome new features like instant mid-epoch resumption and deterministic shuffling, regardless of the number of nodes. See the Streaming release notes for more!
🚨 New algorithm - GyroDropout!
Thanks to @jelite for adding a new algorithm, GyroDropout, to Composer! Please see the method card for more details.
🤗 HuggingFace + Composer improvements
We've added a new utility to load a 🤗 HuggingFace model and tokenizer out of a Composer checkpoint (#1754), making the pretraining -> finetuning workflow even easier in Composer. Check out the docs for more details, and our example notebook for a full tutorial (#1775)!
🎓 GradMonitor -> OptimizerMonitor
Renames our GradMonitor callback to OptimizerMonitor, and adds the ability to track optimizer-specific metrics. Check out the docs for more details, and add to your code just like any other callback!
from composer.callbacks import OptimizerMonitor
from composer.trainer import Trainer
trainer = Trainer(
...,
callbacks=[OptimizerMonitor(log_optimizer_metrics=True)]
)
🐳 New PyTorch and CUDA versions
We've expanded our library of Docker images with support for PyTorch 1.13 + CUDA 11.7:
mosaicml/pytorch:1.13.0_cu117-python3.10-ubuntu20.04
mosaicml/pytorch:1.13.0_cpu-python3.10-ubuntu20.04
The mosaicml/pytorch:latest, mosaicml/pytorch:cpu_latest, and mosaicml/composer:0.12.0 tags are now built from PyTorch 1.13 based images. Please see our DockerHub repository for additional details.
Replace grad_accum with device_train_microbatch_size (#1749, #1776)
We're deprecating the grad_accum Trainer argument in favor of the more intuitive device_train_microbatch_size. Instead of thinking about how to divide your specified minibatch into microbatches, simply specify the size of your microbatch. For example, let's say you want to split your minibatch of 2048 into two microbatches of 1024:
from composer import Trainer
trainer = Trainer(
...,
device_train_microbatch_size=1024,
)
If you want Composer to tune the microbatch for you automatically, enable automatic microbatching as follows:
from composer import Trainer
trainer = Trainer(
...,
device_train_microbatch_size='auto',
)
The grad_accum argument is still supported but will be deprecated in the next Composer release.
Renamed precisions (#1761)
We've renamed precision attributes for clarity. The following values have been removed: ['amp', 'fp16', 'bf16'].
We have added the following values, prefixed with 'amp' to clarify when an Automatic Mixed Precision type is being used: ['amp_fp16', 'amp_bf16'].
The fp32 precision value remains unchanged.
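For example, a minimal sketch using one of the new AMP precision values:
from composer import Trainer

trainer = Trainer(
    ...,
    precision='amp_bf16',
)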
FusedLayerNorm algorithm (#1789)
grad_clip_norm training argument, please use the GradientClipping algorithm instead (#1768)
data_fit, data_epoch, and data_batch from Logger (#1826)
sync_module_states, forward_prefetch, limit_all_gathers (#1794)
FULL precision with FSDP (#1796)
eval_microbatch modification on EVAL_BEFORE_FORWARD event (#1739)
None check preventing setting device_id to 0 (#1767)
metric_names is not a list (#1798)
build_streaming_cifar10_dataloader() to use v2 by default by @growlix in https://github.com/mosaicml/composer/pull/1730
get_file by @dakinggg in https://github.com/mosaicml/composer/pull/1750
train_device_microbatch_size by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1749
fsdp_config to state and add fsdp_config to trainer docstring by @growlix in https://github.com/mosaicml/composer/pull/1821
Full Changelog: https://github.com/mosaicml/composer/compare/v0.11.1...v0.12.0
Published by bandish-shah almost 2 years ago
Composer v0.11.1 is released! Install via pip:
pip install --upgrade mosaicml==0.11.1
NCCL_ASYNC_ERROR_HANDLING ENV variable in Composer launcher to enable distributed timeout (#1695)
eval is called before fit (#1697)
ValueError if an evaluation dataloader of infinite length is specified
Full Changelog: https://github.com/mosaicml/composer/compare/v0.11.0...v0.11.1