Supercharge Your Model Training
APACHE-2.0 License
Published by bandish-shah almost 2 years ago
Composer v0.11.0 is released! Install via pip:
pip install --upgrade mosaicml==0.11.0
🧰 FSDP Beta Support
Composer now supports PyTorch FSDP! PyTorch FSDP is a strategy for distributed training, similar to PyTorch DDP, that distributes work using data-parallelism only. On top of this, FSDP uses model, gradient, and optimizer sharding to dramatically reduce device memory requirements, and enables users to easily scale and train large models.
Here's how easy it is to use FSDP with Composer:
import torch.nn as nn
from composer import Trainer
from composer.models import ComposerModel

class Block(nn.Module):
    ...

# Your custom model
class Model(nn.Module):
    def __init__(self, n_layers):
        super().__init__()
        self.blocks = nn.ModuleList([
            Block(...) for _ in range(n_layers)
        ])
        self.head = nn.Linear(...)

    def forward(self, inputs):
        ...

    # FSDP Wrap Function
    def fsdp_wrap_fn(self, module):
        return isinstance(module, Block)

    # Activation Checkpointing Function
    def activation_checkpointing_fn(self, module):
        return isinstance(module, Block)

# ComposerModel wrapper, used by the Trainer
# to compute loss, metrics, etc.
class MyComposerModel(ComposerModel):

    def __init__(self, n_layers):
        super().__init__()
        self.model = Model(n_layers)
        ...

    def forward(self, batch):
        ...

    def eval_forward(self, batch, outputs=None):
        ...

    def loss(self, outputs, batch):
        ...

# Pass your ComposerModel and fsdp_config into the Trainer
composer_model = MyComposerModel(n_layers=3)

fsdp_config = {
    'sharding_strategy': 'FULL_SHARD',
    'min_params': 1e8,
    'cpu_offload': False,  # Not supported yet
    'mixed_precision': 'DEFAULT',
    'backward_prefetch': 'BACKWARD_POST',
    'activation_checkpointing': False,
    'activation_cpu_offload': False,
    'verbose': True
}

trainer = Trainer(
    model=composer_model,
    fsdp_config=fsdp_config,
    ...
)

trainer.fit()
For more information, please see our FSDP docs.
🚰 Streaming v0.1
We've spun off Streaming datasets into its own repository! Streaming datasets is a high-performance drop-in replacement for Torch `IterableDataset`, enabling users to stream training data from cloud-based object stores. Streaming ships with built-in support for popular open source datasets (ADE20K, C4, COCO, Enwiki, ImageNet, etc.)
To get started, install the Streaming PyPi package:
pip install mosaicml-streaming
You can use the streaming Dataset class with the PyTorch native DataLoader class as follows:
import torch
from streaming import Dataset
dataloader = torch.utils.data.DataLoader(dataset=Dataset(remote='s3://...'))
For more information, please check out the Streaming docs.
✔👉 Simplified Checkpointing Interface
With this release we’ve greatly simplified configuration of loading and saving checkpoints in Composer.
To save checkpoints to S3, all you need to do is:
- Set `save_folder` to your full URI destination (e.g. `'s3://my-bucket/{run_name}/checkpoints'`)
- Set `save_filename` to the pattern you want for your checkpoint file names

from composer.trainer import Trainer
# Checkpoint saving to S3.
trainer = Trainer(
model=model,
save_folder="s3://my-bucket/{run_name}/checkpoints",
run_name='my-run',
save_interval="1ep",
save_filename="ep{epoch}.pt",
save_num_checkpoints_to_keep=0, # delete all checkpoints locally
...
)
trainer.fit()
Likewise, to load checkpoints from S3, all you have to do is:
- Set `load_path` to the full URI of your desired checkpoint file (e.g. `'s3://my-bucket/my-run/checkpoints/ep13.pt'`)

from composer.trainer import Trainer
# Checkpoint loading from S3.
new_trainer = Trainer(
model=model,
train_dataloader=train_dataloader,
max_duration="10ep",
load_path="s3://my-bucket/my-run/checkpoints/ep13.pt",
)
new_trainer.fit()
For more information, please see our Checkpointing guide.
𐄳 Improved Distributed Experience
We’ve made it easier to write your own custom distributed entry points by exposing our distributed API. You can now leverage all of our helpful distributed functions and contexts.
For example, let's say we need to download a dataset in a distributed training application. To avoid race conditions where different ranks try to write the dataset to the same place, we need to ensure that only rank 0 downloads the dataset first:
import datetime
from composer.trainer.devices import DeviceGPU
from composer.utils import dist
dist.initialize(DeviceGPU(), datetime.timedelta(seconds=30)) # Initialize distributed module
if dist.get_local_rank() == 0:  # Download dataset on rank zero
    dataset = download_my_dataset()

dist.barrier()  # All ranks wait until dataset is downloaded
# Create and train your model!
For more information, please check out our Distributed API docs.
- `meta` device: initialization should occur on compute device, not CPU (#1623)
- `master_port` is auto selected by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1629
- `intitialize_dist()` by @growlix in https://github.com/mosaicml/composer/pull/1619
Full Changelog: https://github.com/mosaicml/composer/compare/v0.10.1...v0.11.0
Published by bandish-shah about 2 years ago
Composer v0.10.1 is released! Install via pip:
pip install --upgrade mosaicml==0.10.1
𐄷 Weight Standardization
Weight Standardization reparametrizes convolutional weights such that the fan-in dimensions have zero mean and unit standard deviation. This can slightly improve performance at the expense of about 5% lower throughput. It has been used in several papers to train with smaller batch sizes, with normalization layers besides batch norm, and for transfer learning.
Using Weight Standardization with the Composer Trainer:
import composer
# Apply Weight Standardization (when training is initialized)
weight_std = composer.algorithms.WeightStandardization()
# Train with Weight Standardization
trainer = composer.trainer.Trainer(
...
algorithms=[weight_std]
)
trainer.fit()
Using Weight Standardization with the Composer functional interface:
import composer
from torchvision.models import resnet50
my_model = resnet50()
# Apply weight standardization to model
my_model = composer.functional.weight_standardization(my_model)
Please see the Weight Standardization Method Card for more details.
- `self.total_union==0` by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1558
- `initialize_object` to factory methods by @hanlint in https://github.com/mosaicml/composer/pull/1510
Full Changelog: https://github.com/mosaicml/composer/compare/v0.10.0...v0.10.1
Published by bandish-shah about 2 years ago
Composer v0.10.0 is out! This latest release adds support for CometML Experiment tracking, automatic selection of evaluation batch size, API enhancements for Evaluation/Logging/Metrics and a preview of our new streaming datasets repository!
pip install --upgrade mosaicml==0.10.0
☄️ Comet Experiment Tracking (#1490)
We've added support for the popular Comet experiment tracker! To enable, simply create the logger and pass it to the `Trainer` object at initialization:
from composer import Trainer
from composer.loggers import CometMLLogger
cometml_logger = CometMLLogger()
trainer = Trainer(
...
loggers=[cometml_logger],
)
Please see our Logging and CometMLLogger docs pages for details on usage.
🪄 Automatic Evaluation Batch Size Selection (#1417)
Composer now supports `eval_batch_size='auto'`, which will choose the right evaluation batch size to avoid CUDA OOMs! Now, in conjunction with `grad_accum='auto'`, you can run the same code on any hardware with no changes necessary. This makes it easy to add evaluation to a training script without having to pick and choose the right batch sizes to avoid CUDA OOMs.
🎯 Evaluation API Changes (#1479)
The Evaluation API has been updated to be consistent with the Trainer API. If the `eval_dataloader` was provided to the Trainer during initialization, `eval` can be invoked without needing to provide anything additional:
trainer = Trainer(
eval_dataloader=...
)
trainer.eval()
Alternatively, the `eval_dataloader` can be passed directly to the `eval()` method:
trainer = Trainer(
...
)
trainer.eval(
eval_dataloader=...
)
The `eval_dataloader` can be a PyTorch dataloader, or, for multiple metrics, a list of `Evaluator` objects.
🪵 Simplified Logging (#1416)
We've significantly simplified our internal logging interface:
- Removed the use of `LogLevel` throughout the logging, which was a mostly unused feature; filtering logs is the responsibility of the logger.
- Loggers now use `log_metrics`, `log_hyperparameters`, and `log_artifacts`. Previous calls to `data_fit`, `data_epoch`, .. have been removed.
🎯 validate --> eval_forward (#1411, #1419)
Previously, `ComposerModel` implemented the `validate(batch: Any) -> Tuple[Any, Any]` method, which returned an `(input, target)` tuple, and the Trainer handled updating the metrics. In v0.10, we return the metric-updating control to the user.
Now, models instead implement `def eval_forward(batch: Any)`, which returns the outputs of evaluation, and also `def update_metric(batch, outputs, metric)`, which updates the metric.
An example implementation for classification can be found in our `ComposerClassifier` base class:
def update_metric(self, batch: Any, outputs: Any, metric: Metric) -> None:
    _, targets = batch
    metric.update(outputs, targets)

def eval_forward(self, batch: Any, outputs: Optional[Any] = None) -> Any:
    return outputs if outputs is not None else self.forward(batch)
🕵️♀️ Evaluator changes
The `Evaluator` class now stores evaluation metric names instead of metric instances. For example:
glue_mrpc_task = Evaluator(
label='glue_mrpc',
dataloader=mrpc_dataloader,
metric_names=['BinaryF1Score', 'Accuracy']
)
These metric names are matched against the metrics returned by the `ComposerModel`. The metric instances are now stored as deep copies in the `State` class as `state.train_metrics` or `state.eval_metrics`.
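For instance, you can inspect these stored copies from your own callback. A minimal sketch, assuming the eval metrics are keyed first by dataloader/evaluator label and then by metric name (check the State docs for the exact layout in your version):
from composer import Callback

class MetricPrinter(Callback):
    # Sketch: print the metric objects Composer stores on the State.
    def eval_end(self, state, logger):
        for label, metrics in state.eval_metrics.items():
            for name, metric in metrics.items():
                print(f"{label}/{name}: {metric.compute()}")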
🚧 Streaming Datasets Repository Preview
We're in the process of splitting out streaming datasets into its own repository! Streaming datasets is a high-performance drop-in replacement for Torch `IterableDataset` objects and enables you to stream your training data from cloud-based object stores. For an early preview, please check out the Streaming repo.
❌ YAHP deprecation
We are deprecating support for yahp, our hyperparameter configuration tool. Support will be removed in the next minor release of Composer. We recommend users migrate to tools such as OmegaConf or Hydra.
- `streaming` requirement by @hanlint in https://github.com/mosaicml/composer/pull/1449
- `test_precision` and `test_state` by @hanlint in https://github.com/mosaicml/composer/pull/1486
- `save_checkpoint` by @hanlint in https://github.com/mosaicml/composer/pull/1484
- `initialize_object` to object store class by @hanlint in https://github.com/mosaicml/composer/pull/1508
- `test_filehelpers.py` by @hanlint in https://github.com/mosaicml/composer/pull/1514
Full Changelog: https://github.com/mosaicml/composer/compare/v0.9.0...v0.10.0
Published by bandish-shah about 2 years ago
Excited to share the release of Composer v0.9.0, which comes with an Inference Export API, beta support for Apple Silicon and TPU training, as well as expanded usability of NLP-related speed-up methods. This release includes 175 commits from 34 contributors, including 10 new contributors 🙌 !
pip install --upgrade mosaicml==0.9.0
Alternatively, install Composer with Conda:
conda install -c mosaicml mosaicml=0.9.0
📦 Export for inference APIs
Train with Composer and deploy anywhere! We have added a dedicated export API as well as an export training callback to allow you to export Composer-trained models for inference, supporting popular formats such as torchscript and ONNX.
For example, here’s how to export a model in torchscript format:
from composer.utils import export_for_inference
# Invoking export with a trained model
export_for_inference(model=model,
save_format='torchscript',
save_path=model_save_path)
Here’s an example of using the training callback, which automatically exports the model at the end of training to ONNX format:
from composer.callbacks import ExportForInferenceCallback
# Initializing Trainer with the export callback
callback = ExportForInferenceCallback(save_format='onnx',
save_path=model_save_path)
trainer = Trainer(model=model,
callbacks=callback,
train_dataloader=dataloader,
max_duration='10ep')
# Model will be exported at the end of training
trainer.fit()
Please see our Exporting for Inference notebook for more information.
📈 ALiBi support for BERT training
You can now use ALiBi (Attention with Linear Biases; Press et al., 2021) when training BERT models with Composer, delivering faster training and higher accuracy by leveraging shorter sequence lengths.
ALiBi improves the quality of BERT pre-training, especially when pre-training uses shorter sequence lengths than the downstream (fine-tuning) task. This allows models with ALiBi to reach higher downstream accuracy with less pre-training time.
Example of using ALiBi as an algorithm with the Composer Trainer:
import composer

# Create an instance of a BERT masked language model
model = composer.models.create_bert_mlm()
# Apply ALiBi (when training is initialized)
alibi = composer.algorithms.Alibi(max_sequence_length=1024)
# Train with ALiBi
trainer = composer.trainer.Trainer(
model=model,
train_dataloader=train_dataloader,
algorithms=[alibi]
)
trainer.fit()
Example using the Composer Functional API:
import composer
import composer.functional as cf

# Create an instance of a BERT masked language model
model = composer.models.create_bert_mlm()
# Apply ALiBi and expand the model's maximum sequence length to 1024
cf.apply_alibi(model=model, max_sequence_length=1024)
ALiBi can also now be extended to work with custom models by registering your attention and embedding layers. Please see our ALiBi method card for more information.
🧐 Entry point for GLUE tasks pre-training and fine-tuning
You can now easily pre-train and fine-tune NLP models across all GLUE (General Language Understanding Evaluation) tasks through one simple entry point! The entry point handles model saving and loading, spawns GLUE tasks in parallel across all available GPUs, and delivers a highly efficient evaluation of model performance.
Example of launching the entrypoint:
# This runs pre-training followed by fine-tuning.
# --training_scheme can take either pretrain, finetune, or all depending on the task!
python run_glue_trainer.py -f glue_example.yaml --training_scheme all
Please see our GLUE entrypoint notebook for more information.
🤖 TPU support (in beta)
You can now use Composer to train your models on TPUs! Support is now available in Beta, and currently only supports single-core TPU training. Try it out, explore optimizations, and share your feedback and feature requests with us so we can make it better for you and for the community.
To use TPUs with Composer, simply specify a `tpu` device:
# Set device to `tpu`
trainer = composer.trainer.Trainer(
model=model,
train_dataloader=train_dataloader,
max_duration=train_epochs,
device='tpu')
# Run fit
trainer.fit()
Please see our Training with TPUs notebook for more information.
🍎 Apple Silicon support (beta)
Leverage Apple Silicon chips to train your models with Composer by providing the `device='mps'` argument:
trainer = Trainer(
...,
device='mps'
)
We use the latest PyTorch MPS backend to execute the training. This requires torch version ≥ 1.12 and macOS 12.3+.
For more information on training with Apple M chips, see the PyTorch 1.12 blog and our API Reference for Composer specific details.
🚧 Contrib repository
Got a new method idea, or published a paper and want those methods to be easily accessible? We've created the `mcontrib` repository, with a lightweight process to contribute new algorithms. We're happy to work directly with you to benchmark these methods and eventually "promote" them to Composer for use by end customers.
Please check out the README for details on how to contribute a new algorithm. For more details on how to write speed-up methods, see our notebook on custom speed-up methods.
🔢 Passes Module
The order in which algorithms are run matters significantly during composition. With this release we refactored algorithm passes into their own `passes` module. Users can now register custom passes (for custom algorithms) with the Engine. Please see #1377 for more information.
🗄️ Default Checkpoint Extension
The CheckpointSaver now defaults to using the `*.pt` extension for checkpoint filenames. Please see #1370 for more information.
👁️ Models Refactor
Most vision models (ResNet, MNIST, ViT, EfficientNet) have been refactored from classes to factory functions. For example, `ComposerResNet` -> `composer_resnet`.
# before
from composer.models import ComposerResNet
model = ComposerResNet(...)

# after
from composer.models import composer_resnet
model = composer_resnet(...)
The same refactor has been done for NLP as well, e.g. `BERTModel` -> `create_bert_mlm` and `create_bert_classification`.
See #1227 (vision) and #1130 (NLP) for more details.
➕ Misc API Changes
- `BreakEpochException` has been removed.
- `state.is_model_deepspeed` has been moved to `composer.utils.is_model_deepspeed`.
- `monitored_barrier` has been added to composer distributed.

- `nproc = torch.cuda.device_count()` if not specified via env by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1195
- `mosaicml/composer_staging` by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1197
- `torchvision` datasets by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1201
- `composer_train` entrypoint; put it back in `examples` by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1211
- `get_file` by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1216
- `StreamingDataset`s to subclass `VisionDataset` by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1223
- `COMPOSER_KNOWN_HOSTS_FILENAME` for setting the sftp known hosts file environ by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1224
- `FileNotFoundError` by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1233
- `StreamingC4` to 120s by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1234
- `Event.INIT` by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1084
- `StreamingDataset` compression file easier to write/read by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1246
- `boto3.Session()` per `S3ObjectStore` instance by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1260
- `v0.8`, add testing by @hanlint in https://github.com/mosaicml/composer/pull/1257
- `meta.yaml`; add `py-cpuinfo` max version by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1271
- `scale_warmup` argument to schedulers by @hanlint in https://github.com/mosaicml/composer/pull/1268
- `allow_tf32=True` for GPU Devices by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1275
- `return_outputs` flag to `predict()` by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1307
- `get_file_artifact` in the WandBLogger to work on all ranks by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1304
- `run_name` to Composer by @eracah in https://github.com/mosaicml/composer/pull/1298
- `dev` branch merges by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1308
- `pytest-timeout` by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1317
- `predict()` by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1352
- `mps` and `tpu` device to Trainer docstrings by @hanlint in https://github.com/mosaicml/composer/pull/1410
Full Changelog: https://github.com/mosaicml/composer/compare/v0.8.2...v0.9.0
Published by bandish-shah about 2 years ago
Composer v0.8.2 is released! Install via pip:
pip install --upgrade mosaicml==0.8.2
Alternatively, install Composer with Conda:
conda install -c mosaicml mosaicml=0.8.2
Fixed Notebook Progress Bars in Colab
Fixes a bug introduced by #1264 which causes Composer running in Colab notebooks to error out with:
UnsupportedOperation: fileno.
Closes #1312. Fixed in PR #1314.
https://github.com/mosaicml/composer/compare/v0.8.1...v0.8.2
Published by bandish-shah over 2 years ago
Composer v0.8.1 is released! Install via pip:
pip install --upgrade mosaicml==0.8.1
Alternatively, install Composer with Conda:
conda install -c mosaicml mosaicml=0.8.1
🖼️ Image Visualizer
The `ImageVisualizer` callback periodically logs the training and validation images when using the WandB logger. This is great for validating your dataloader pipeline, especially if extensive data augmentations are used. Also, when training on a semantic segmentation task, the callback can log the target segmentation mask and the predicted segmentation mask by setting the argument `mode='segmentation'`. See PR #1266 for more details. Here is an example of using the `ImageVisualizer` callback:
from composer import Trainer
from composer.callbacks import ImageVisualizer
# Callback to log 8 training images after every 100 batches
image_visualizer = ImageVisualizer()
# Construct trainer
trainer = Trainer(
...,
callbacks=image_visualizer
)
# Train!
trainer.fit()
Here is an example visualization from the training set of ADE20k:
📶 TensorBoard Logging
You can now log metrics and losses from your Composer training runs with TensorBoard! See #1250 and #1283 for more details. All you have to do is create a `TensorboardLogger` object and add it to the list of loggers in your `Trainer` object like so:
from composer import Trainer
from composer.loggers import TensorboardLogger
tb_logger = TensorboardLogger(log_dir="./my_tensorboard_logs")
trainer = Trainer(
...
# Add your Tensorboard Logger to the trainer here.
loggers=[tb_logger],
)
trainer.fit()
For more information, see this tutorial.
🔙 Multiple Losses
Adds support for multiple losses. If a model returns a tuple of losses, they are summed before the `loss.backward()` call. See #1240 for more details.
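For example, a ComposerModel whose `loss()` returns two terms; a minimal sketch (the model wrapper and the auxiliary term below are illustrative, not part of Composer):
import torch.nn.functional as F
from composer.models import ComposerModel

class MultiLossModel(ComposerModel):
    # Sketch: loss() returns a tuple; the trainer sums the terms
    # before calling loss.backward().
    def __init__(self, module):
        super().__init__()
        self.module = module

    def forward(self, batch):
        inputs, _ = batch
        return self.module(inputs)

    def loss(self, outputs, batch):
        _, targets = batch
        primary = F.cross_entropy(outputs, targets)
        auxiliary = 0.1 * outputs.pow(2).mean()  # illustrative second term
        return primary, auxiliary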
🌎️ Stream Datasets from HTTP URIs
You can now specify an HTTP URI for a Streaming Dataset remote. See #1258 for more details. For example:
from composer.datasets.streaming import StreamingDataset
from torch.utils.data import DataLoader
# Construct the Dataset
dataset = StreamingDataset(
...,
remote="https://example.com/dataset/",
)
# Construct the DataLoader
train_dl = DataLoader(dataset)
# Construct the Trainer
trainer = Trainer(
...,
train_dataloader=train_dl,
)
# Train!
trainer.fit()
For more information on streaming datasets, see this tutorial.
🏄️ GPU Devices default to TF32 Matmuls
Beginning with PyTorch 1.12, the default behavior for computing FP32 matrix multiplies on NVIDIA Ampere devices was switched from TF32 to FP32. See PyTorch documentation here.
Since Composer is designed specifically for ML training with a focus on efficiency, we choose to preserve the old default of using TF32 on Ampere devices. This leads to significantly higher throughput when training in single precision, without impacting training convergence. See PR #1275 for implementation details.
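If your workload needs the stricter FP32 behavior, you can flip the standard PyTorch flags yourself; a minimal sketch using the regular PyTorch API (not a Composer-specific setting):
import torch

# Inspect the current TF32 settings (PyTorch >= 1.12)
print(torch.backends.cuda.matmul.allow_tf32)  # matmul kernels
print(torch.backends.cudnn.allow_tf32)        # cuDNN convolutions

# Opt back into strict FP32 matmuls if needed
torch.backends.cuda.matmul.allow_tf32 = False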
👋 Set the Device ID for GPU Devices
You can now specify which GPU to train on by passing a device ID to `DeviceGPU` when instantiating a Trainer, instead of defaulting to the local rank. For example:
from composer.trainer.devices.device_gpu import DeviceGPU
# Specify to use GPU 3 to train
device = DeviceGPU(device_id=3)
# Construct the Trainer
trainer = Trainer(
...,
device = device
)
# Train!
trainer.fit()
BERT and C4 Updates
We make some minor adjustments to our `bert-base-uncased.yaml` training config. In particular, we make the global train and eval batch sizes a power of 2. This maintains divisibility when using many GPUs in multi-node training. We also adjust the `max_duration` so that it converts cleanly to 70,000 batches.
We also upgrade our StreamingDataset C4 conversion script (`scripts/mds/c4.py`) to use a multi-threaded reader. On a 64-core machine we are able to convert the 770GB train split to `.mds` format in ~1.5hr.
📂 Set a prefix when using a S3ObjectStore
When using `S3ObjectStore` for applications like checkpointing, it can be useful to provide path prefixes, mimicking `folder/subfolder` directories like on a local filesystem. When `prefix` is provided, any objects uploaded with `S3ObjectStore` will be stored at `f's3://{self.bucket}/{self.prefix}{object_name}'`.
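A minimal sketch of constructing an `S3ObjectStore` with a prefix (bucket, prefix, and file names are placeholders; the upload method follows the Object Store API):
from composer.utils.object_store import S3ObjectStore

# Objects uploaded through this store land under
# s3://my-bucket/checkpoints/run-1/{object_name}
store = S3ObjectStore(
    bucket='my-bucket',
    prefix='checkpoints/run-1/',
)
store.upload_object(object_name='ep1.pt', filename='/tmp/ep1.pt')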
⚖️ Scale the Warmup Period of Composer Schedulers
Added a new flag `scale_warmup` to schedulers that will scale the warmup period when a scale schedule ratio is applied. The default is `False` to mirror existing behavior. See #1268 for more details.
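For example, combining `scale_schedule_ratio` with the new flag; a minimal sketch, assuming `scale_warmup` is passed to a warmup scheduler's constructor (milestone values are placeholders):
from composer import Trainer
from composer.optim.scheduler import MultiStepWithWarmupScheduler

# With scale_schedule_ratio=0.5 the 20ep schedule is compressed to 10ep;
# scale_warmup=True scales the 1ep warmup period down as well.
scheduler = MultiStepWithWarmupScheduler(
    t_warmup='1ep',
    milestones=['10ep', '16ep'],
    scale_warmup=True,
)

trainer = Trainer(
    ...,
    max_duration='20ep',
    schedulers=scheduler,
    scale_schedule_ratio=0.5,
)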
🧊 Stochastic Depth on Residual Blocks
Residual blocks are detected automatically and replaced with stochastic versions. See #1253 for more details.
Fixed Progress Bars
Fixed a bug where the progress bars jumped around and did not stream properly when tailing the terminal over the network. Fixed in #1264, #1287, and #1289.
Fixed S3ObjectStore in Multithreaded Environments
Fixed a bug where `boto3` crashed when creating the default session in multiple threads simultaneously (see https://github.com/boto/boto3/issues/1592). Fixed in #1260.
Retry on ChannelException errors in the SFTPObjectStore
Catch `ChannelException` SFTP transient errors and retry. Fixed in #1245.
Treating S3 Permission Denied Errors as Not Found Errors
We update our handling of `botocore` 403 ClientErrors to interpret them as `FileNotFoundError`s. We do this because of a situation that occurs when a user has no S3 credentials configured and tries to read from a bucket with public files. For privacy, Amazon S3 raises 403 (Permission Denied) instead of 404 (Not Found) errors. As such, PR #1249 treats 403 ClientErrors as FileNotFoundErrors.
Fixed Parsing of grad_accum in the TrainerHparams
Fixes an error where the command line override `--grad_accum` led to incorrect parsing. Fixed in #1256.
Fixed Example YAML Files
Our recipe configurations (YAML) are updated to the latest version, and a test was added to enforce correctness moving forward. Fixed in #1235 and #1257.
https://github.com/mosaicml/composer/compare/v0.8.0...v0.8.1
Published by ravi-mosaicml over 2 years ago
Composer v0.8.0 is released! Install via pip:
pip install --upgrade mosaicml==0.8.0
Alternatively, install Composer with Conda:
conda install -c mosaicml mosaicml=0.8.0
🤗 HuggingFace ComposerModel
Train your HuggingFace models with Composer! We introduced a `HuggingFaceModel` that converts your existing 🤗 Transformers models into a ComposerModel.
For example:
import transformers
from composer import Trainer
from composer.models import HuggingFaceModel
# Define the model
hf_model = transformers.AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
# Convert it into a ComposerModel
model = HuggingFaceModel(hf_model)
# Construct the trainer
trainer = Trainer(
    ...,
    model=model,
)
# Train!
trainer.fit()
For more information, see the example on fine-tuning a pretrained BERT with Composer.
🫕 Fused Layer Norm
Fused LayerNorm replaces implementations of `torch.nn.LayerNorm` with `apex.normalization.fused_layer_norm`. The fused kernel provides increased GPU utilization.
For example:
from composer.trainer import Trainer
from composer.algorithms import FusedLayerNorm
# Initialize the algorithm
alg = FusedLayerNorm()
# Construct the trainer
trainer = Trainer(
algorithms=alg,
)
# Train!
trainer.fit()
See the method card for more information.
💾 Ignore Checkpoint Parameters
If you have a checkpoint and don't want to restore some elements of the checkpoint to the state, we added a `load_ignore_keys` parameter. Any specified (nested) keys will be ignored. Glob syntax is supported!
For example, to restore a checkpoint without the seed:
from composer import Trainer
trainer = Trainer(
...,
load_path="path/to/my/checkpoint.pt",
load_ignore_keys=["state/rank_zero_seed", "rng"],
)
See the Trainer API Reference for more information.
🪣 Object Stores
Composer v0.8.0 introduces an abstract Object Store API to support multiple object store drivers, such as boto3 (for Amazon S3) and Paramiko (for SFTP), in addition to the existing libcloud implementation.
For example, if you are training on AWS where credentials are available in the environment, here's how to save checkpoints to an S3 object store via Boto3.
from composer import Trainer
from composer.loggers import ObjectStoreLogger
from composer.utils.object_store import S3ObjectStore
logger = ObjectStoreLogger(
object_store_cls=S3ObjectStore,
object_store_kwargs={
# These arguments will be passed into the S3ObjectStore -- e.g.:
# object_store = S3ObjectStore(**object_store_kwargs)
# Refer to the S3ObjectStore class for documentation
'bucket': 'my-bucket',
},
)
trainer = Trainer(
...,
loggers=logger,
)
# Train!
trainer.fit()
See the Object Store API Reference for more information.
🪨 Artifact Metadata
Composer automatically logs the epoch, batch, sample, and token counts as metadata when storing artifacts in Weights & Biases. See the API Reference for more information.
✂️ Gradient Clipping is now an Algorithm
To clean up the Trainer, we moved gradient clipping into an Algorithm. The `grad_clip_norm` argument in the Trainer is deprecated and will be removed in a future version of Composer. Instead, use the Gradient Clipping algorithm:
For example:
from composer.algorithms import GradientClipping
from composer.trainer import Trainer
# Configure gradient clipping
gradient_clipping = GradientClipping()
# Configure the trainer
trainer = Trainer(
...,
algorithms=gradient_clipping,
)
# Train!
trainer.fit()
See the method card for more information.
🕒️ Removed batch_num_samples and batch_num_tokens from the state
The State properties `batch_num_samples` and `batch_num_tokens` have been removed. Instead, use `State.timestamp` for token and sample tracking.
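For example, a callback can read the running counters from the timestamp; a minimal sketch (attribute names follow the Timestamp counters):
from composer import Callback

class ProgressPrinter(Callback):
    # Sketch: read progress from state.timestamp rather than the removed
    # batch_num_samples / batch_num_tokens properties.
    def batch_end(self, state, logger):
        ts = state.timestamp
        print(f"epoch={ts.epoch}, batch={ts.batch}, "
              f"samples={ts.sample}, tokens={ts.token}")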
🧑🤝🧑 DDP Sync Strategy
We changed the default DDP Sync Strategy to `MULTI_AUTO_SYNC`, as `FORCED_SYNC` doesn't work with all algorithms.
🏃 Moved the run_name into the State
The `run_name` has been added to the State object, so it is persisted with checkpoints. It has been removed from the Logger.
https://github.com/mosaicml/composer/compare/v0.7.1...v0.8.0
Published by ravi-mosaicml over 2 years ago
Composer v0.7.1 is released! Install via pip:
pip install --upgrade mosaicml==0.7.1
Alternatively, install Composer with Conda:
conda install -c mosaicml mosaicml=0.7.1
- `wandb>=0.12.17`, to fix incompatibility with protobuf >= 4 (https://github.com/wandb/client/pull/3709)
Full Changelog: https://github.com/mosaicml/composer/compare/v0.7.0...v0.7.1
Published by ravi-mosaicml over 2 years ago
Composer v0.7.0 is released! Install via pip:
pip install --upgrade mosaicml==0.7.0
Alternatively, install Composer with Conda:
conda install -c mosaicml mosaicml=0.7.0
🏎️ FFCV Integration
Composer supports FFCV, a fast dataloader for image datasets. We've found FFCV can speed up ResNet-56 training by 16%, in addition to existing speed-ups already supported by Composer! It's easy to use FFCV with any existing image dataset:
import ffcv
from ffcv.fields.decoders import IntDecoder, SimpleRGBImageDecoder
from torchvision.datasets import ImageFolder
from composer import Trainer
from composer.datasets.ffcv_utils import write_ffcv_dataset, ffcv_monkey_patches
# Convert the dataset to FFCV format
# This step needs to be done only once per dataset
dataset = ImageFolder(...)
ffcv_dataset_path = "my_ffcv_dataset.ffcv"
write_ffcv_dataset(dataset=dataset, write_path=ffcv_dataset_path)
# In FFCV v0.0.3, len(dataloader) is expensive. Fix that via a monkeypatch
ffcv_monkey_patches()
# Construct the train dataloader
train_dl = ffcv.Loader(
ffcv_dataset_path,
...
)
# Construct the trainer
trainer = Trainer(
train_dataloader=train_dl,
)
# Train using FFCV!
trainer.fit()
See our notebook on training with FFCV for a full example.
✅ Autoresume from Checkpoints
When setting `autoresume=True`, Composer can automatically resume from an existing checkpoint before starting a new training run. Specifically, the trainer will look in the `save_folder` (and any loggers that save artifacts) for the latest checkpoint; if none is found, then it'll start from the beginning.
This feature does not require a different entrypoint to distinguish between starting a new training run or automatically resuming from an existing one, making it easy to use Composer on spot preemptable cloud instances. Simply set `autoresume=True`, point the instance to your training script, and Composer will handle the rest!
from composer import Trainer
# When using `autoresume`, it is required to specify the
# `run_name`, so Composer will know which training run to
# resume
run_name = "my_autoresume_training_run"
trainer = Trainer(
...,
run_name=run_name,
# specify where to save checkpoints
save_folder="./my_autoresume_training_run",
autoresume=True,
)
# Train! Composer will handle loading an existing
# checkpoint or starting a new training run
trainer.fit()
See the Trainer API Reference for more information.
♻️ Reuse the Trainer
Want to train on multiple dataloaders sequentially? Each trainer object now supports multiple calls to `Trainer.fit()`, so you can continue training an existing model on a new dataloader, with new schedulers, all while using the same model and trainer object.
For example:
from torch.utils.data import DataLoader
from composer import Trainer
train_dl_1 = DataLoader(...)
trainer = Trainer(
model=model,
max_duration='5ep',
train_dataloader=train_dl_1,
)
# Train once!
trainer.fit()
# Train again with a new dataloader for another 5 epochs
train_dl_2 = DataLoader(...)
trainer.fit(
train_dataloader=train_dl_2,
duration='5ep',
)
See the Trainer API Reference for more information.
⚖️ Eval or Predict Only? No Problem
You can evaluate or predict on an existing model, without having to supply a train dataloader or training duration argument -- they're now optional.
import torchmetrics
from torch.utils.data import DataLoader
from composer import Trainer
# Construct the trainer
trainer = Trainer(model=model)
# Evaluate!
eval_dl = DataLoader(...)
trainer.eval(
dataloader=eval_dl,
metrics=torchmetrics.Accuracy(),
)
# Examine evaluation metrics
print("Eval metrics", trainer.state.metrics['eval'])
# Or, predict!
predict_dl = DataLoader(...)
trainer.predict(dataloader=predict_dl)
See the Trainer API Reference for more information.
🛑 Early Stopper and Threshold Stopper Callbacks
The Early Stopper and Threshold Stopper callbacks end training early when the target metrics are met:
from composer.callbacks.early_stopper import EarlyStopper
from torchmetrics.classification.accuracy import Accuracy
# Construct the callback
early_stopper = EarlyStopper(
monitor="Accuracy",
dataloader_label="eval",
patience=2,
)
# Construct the trainer
trainer = Trainer(
...,
callbacks=early_stopper,
max_duration="100ep",
)
# Train!
# Training will end early if the accuracy does not improve
# over two epochs
trainer.fit()
🪵 Load Checkpoints from Loggers
It's now possible to restore checkpoints from loggers that support file artifacts (such as the Weights & Biases Logger). No need to download your checkpoints manually anymore.
from composer import Trainer
from composer.loggers import WandBLogger
# Configure the W&B Logger
wandb_logger = WandBLogger(
# set to True to capture artifacts, like checkpoints
log_artifacts=True,
init_params={
'project': 'my-wandb-project-name',
},
)
# Then, to train and save checkpoints to W&B:
trainer = Trainer(
...,
loggers=wandb_logger,
save_folder="/tmp/checkpoints",
save_interval="1ep",
save_artifact_name="epoch{epoch}.pt",
)
# Finally, to load checkpoints from W&B
trainer = Trainer(
...,
load_object_store=wandb_logger,
load_path="epoch1.pt:latest",
)
⌛ Wall Clock, Evaluation, and Prediction Time Tracking
The timestamp object measures wall clock time via three new fields: `total_wct`, `epoch_wct`, and `batch_wct`. These fields track the total elapsed training time, the elapsed training time of the current epoch, and the time to train the last batch. Read the wall clock time via a callback:
from composer import Callback, Trainer

class MyCallback(Callback):

    def batch_end(self, state, logger):
        print(f"Total wct: {state.timestamp.total_wct}")
        print(f"Epoch wct: {state.timestamp.epoch_wct}")
        print(f"Batch wct: {state.timestamp.batch_wct}")
# Construct the trainer with this callback
trainer = Trainer(
...,
callbacks=MyCallback(),
)
# Train!
trainer.fit()
In addition, the training state object has two new fields for tracking time during evaluation and prediction: `eval_timestamp` and `predict_timestamp`. These fields, just like any others on the state object, are accessible to algorithms, callbacks, and loggers.
Training DeepLabv3+ on the ADE20k Dataset
DeepLabv3+ is a common baseline model for semantic segmentation tasks. We provide a `ComposerModel` implementation for DeepLabv3+ built using torchvision and mmsegmentation for the backbone and head, respectively.
We found the DeepLabv3+ baseline can be significantly improved using the new PyTorch pre-trained weights. Additional gains are made through a hyperparameter sweep.
We benchmark our DeepLabv3+ model on a single 8xA100 machine using ADE20k, a popular semantic segmentation dataset. The final results on ADE20k are:
| Model | mIoU | Time-to-Train |
| --- | --- | --- |
| Unoptimized DeepLabv3+ | 44.17 +/- 0.14 | 6.39 hr |
| Optimized DeepLabv3+ | 45.78 +/- 0.26 | 4.67 hr |
Check out our documentation for more info!
🍪 Additional Batch Type Support
Composer v0.7.0 removed the `BatchDict` and `BatchPair` types, and now supports any batch type. We're updating our algorithms to support batches of custom formats.
🏎️ Simplified Profiling Arguments
To simplify the Trainer constructor, the profiling arguments were replaced with a single `profiler` argument, which takes an instance of the Profiler.
from composer.trainer import Trainer
from composer.profiler import Profiler, JSONTraceHandler, cyclic_schedule
trainer = Trainer(
...,
profiler=Profiler(
trace_handlers=JSONTraceHandler(
folder=composer_trace_dir,
overwrite=True,
),
schedule=cyclic_schedule(
wait=0,
warmup=1,
active=4,
repeat=1,
),
torch_prof_folder=torch_trace_dir,
torch_prof_overwrite=True,
...,
)
)
See the profiling guide for additional information.
🚪 Event.FIT_END and Engine.close()
With support for reusing the trainer for multiple calls to `Trainer.fit`, callbacks and loggers are no longer closed at the end of a training run.
Instead, `Event.FIT_END` was added, which can be used by Callbacks for anything that should happen at the end of each invocation of `Trainer.fit`. See the Event Guide for additional information.
Finally, whenever the trainer is garbage collected or `Trainer.close` is called, `Callback.close` and `Callback.post_close` are invoked, ensuring that they will be called only once per trainer.
⌛ State.timestamp replaces State.timer
Removed `State.timer` and replaced it with `State.timestamp`, which is now a static Timestamp object. The training loop replaces `State.timestamp` with a new object on each batch. See the Time Guide for additional information.
💿 Data Configuration
Two new properties, `State.dataloader` and `State.dataloader_label`, were added to the state. These properties track the currently active dataloader (e.g. the training dataloader when training; the evaluation dataloader when evaluating).
In addition, `State.subset_num_batches` was renamed to `State.dataloader_len` to reflect the actual dataloader length that will be used for training and evaluation.
A helper method `State.set_dataloader` was added to ensure the dataloader properties are updated correctly.
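A minimal sketch of a callback that uses these properties to report which dataloader is active (the printout is illustrative):
from composer import Callback

class ActiveDataloaderPrinter(Callback):
    # Sketch: state.dataloader_label identifies the active dataloader and
    # state.dataloader_len is the number of batches that will be used.
    def eval_start(self, state, logger):
        print(f"Evaluating on '{state.dataloader_label}' "
              f"({state.dataloader_len} batches)")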
⚖️ Removed the Deprecated Scale Schedule Algorithm
The scale schedule algorithm class, deprecated in v0.4.0, has been removed. Instead, use the `scale_schedule_ratio` argument when constructing the trainer.
from composer import Trainer
from composer.optim.scheduler import MultiStepScheduler
trainer = Trainer(
...,
max_duration="20ep",
schedulers=MultiStepScheduler(milestones=["10ep", "16ep"]),
scale_schedule_ratio=0.5,
)
See the Scale Schedule Method Card for additional info.
- `Event.FIT_END` was not being called in the training loop (#1054)
- `eval_interval` (#1045)
Full Changelog: https://github.com/mosaicml/composer/compare/v0.6.1...v0.7.0
Published by ravi-mosaicml over 2 years ago
Composer v0.6.1 is released!
Go ahead and upgrade; it's fully backwards compatible with Composer v0.6.0.
Install via pip:
pip install --upgrade mosaicml==0.6.1
Alternatively, install Composer with Conda:
conda install -c mosaicml mosaicml=0.6.1
📎 Adaptive Gradient Clipping (AGC)
Adaptive Gradient Clipping (AGC) clips gradients based on the ratio of their norms with weights' norms. This technique helps stabilize training with large batch sizes, especially for models without batchnorm layers.
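Conceptually, the rule rescales a gradient whenever its norm is too large relative to the corresponding weight norm. A simplified, per-tensor sketch of that rule in plain PyTorch (Composer's algorithm applies it unit-wise and manages this for you; the threshold below is illustrative):
import torch

def adaptive_grad_clip_(parameters, clipping_threshold=0.01, eps=1e-6):
    # Clip grad in place when ||grad|| / ||weight|| exceeds the threshold.
    for p in parameters:
        if p.grad is None:
            continue
        weight_norm = p.detach().norm().clamp(min=eps)
        grad_norm = p.grad.detach().norm()
        max_norm = weight_norm * clipping_threshold
        if grad_norm > max_norm:
            p.grad.detach().mul_(max_norm / (grad_norm + eps))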
🚚 Exponential Moving Average (EMA)
Exponential Moving Average (EMA) is a model averaging technique that maintains an exponentially weighted moving average of the model parameters during training. The averaged parameters are used for model evaluation. EMA typically results in less noisy validation metrics over the course of training, and sometimes increased generalization.
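Conceptually, EMA keeps a shadow copy of the weights updated as ema = decay * ema + (1 - decay) * param after each step. A minimal plain-PyTorch sketch of that update (the decay value is illustrative; Composer's EMA algorithm manages the averaged weights for you):
import copy
import torch

def make_ema(model):
    # Frozen shadow copy that will hold the averaged weights
    ema_model = copy.deepcopy(model)
    for p in ema_model.parameters():
        p.requires_grad_(False)
    return ema_model

@torch.no_grad()
def update_ema(model, ema_model, decay=0.99):
    # ema = decay * ema + (1 - decay) * param, applied after each step
    for p, ema_p in zip(model.parameters(), ema_model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1.0 - decay)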
🪵 Logger is available in the ComposerModel
The Logger is bound to the ComposerModel via the `self.logger` attribute. It is available during training on all methods (other than `__init__`).
For example, to log hidden activation:
import torch.nn.functional as F
from composer.models import ComposerModel

class Net(ComposerModel):

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        if self.logger:
            self.logger.data_batch({
                "hidden_activation_norm": x.norm(2).item(),
            })
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x)
🐛 Environment Collection Script
Composer v0.6.1 includes an environment collection script which generates a printout of your system configuration and python environment. If you run into a bug, the results from this script will help us debug the issue and fix Composer.
To collect your environment information:
$ pip install mosaicml # if composer is not already installed
$ composer_collect_env
Then, include the output in your GitHub Issue.
📜 TorchScriptable Algorithms
BlurPool, Ghost BatchNorm, and Stochastic Depth are now TorchScript-compatible. Try exporting your models with these algorithms enabled!
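For example, you can apply one of these methods with the functional API and then script the model; a minimal sketch (the torchvision ResNet here is just a stand-in, and scripting relies on the TorchScript compatibility described above):
import torch
from torchvision.models import resnet50
import composer.functional as cf

model = resnet50()
cf.apply_blurpool(model)            # surgery-based algorithm, now scriptable
scripted = torch.jit.script(model)  # export the modified model
scripted.save("resnet50_blurpool.pt")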
🏛️ ColOut on Segmentation
ColOut now supports segmentation-style models.
🚑️ Loggers capture the Traceback
We fixed a bug so the Loggers, such as the Weights & Biases Logger and the File Logger, will capture the traceback of any exception that crashes the training process.
🏋️ Weights & Biases Logger Config
We fixed a bug where the Weights & Biases Logger was not properly recording the configuration.
https://github.com/mosaicml/composer/compare/v0.6.0...v0.6.1
Published by ravi-mosaicml over 2 years ago
Composer v0.6.0 is released! Install via pip:
pip install --upgrade mosaicml==0.6.0
Alternatively, install Composer with Conda:
conda install -c mosaicml mosaicml=0.6.0
🗃️ Automatic Gradient Accumulation
Composer v0.6.0 can automatically pick an appropriate value for gradient accumulation. The trainer will automatically catch
OutOfMemory exceptions and handle them gracefully. No need to manually tune this parameter for each model, batch size, and
hardware combination!
To use automatic gradient accumulation, set `grad_accum='auto'`. For example:
trainer = Trainer(
...,
grad_accum='auto',
)
💾 Artifact Logging
Training on spot instances? Composer v0.6.0 introduces artifact logging, making it possible to store checkpoints and other artifacts directly to cloud storage. See the Object Store Logger and the Checkpointing Guide for more information.
Artifact Logging has replaced the run directory and the run directory uploader, which have been removed.
📊 Metric Values on the State
Composer v0.6.0 binds the computed metric values on the State. Go ahead and read these values from your own callbacks! We'll be releasing an early stopping callback in an upcoming Composer release.
⚠️ NoEffectWarning and NotIntendedUseWarning for Algorithms
Some algorithms, such as BlurPool, now emit a `NoEffectWarning` or a `NotIntendedUseWarning` when they're not being used appropriately.
🏃♀️ Training Run Names
We introduced a `run_name` parameter in the Trainer to help organize training runs.
trainer = Trainer(
...,
run_name='awesome-training-run',
)
We'll automatically pick one if the run name is not specified.
💈 Automatic Progress Bars
The ProgressBarLogger, formerly called the TQDMLogger, is automatically enabled for all training runs.
To disable the progress bar, set progress_bar=False
. For example:
trainer = Trainer(
...,
progress_bar=False,
)
🪵 Logged Data in the Console
To print Logger calls to the console, set the `log_to_console` and `console_log_level` arguments.
trainer = Trainer(
...,
log_to_console=True,
console_log_level="epoch",
)
By default, the console logger will only be enabled when `progress_bar=False`. The default console log level is `epoch`.
📃 Capturing stdout and stderr in Log Files
The FileLogger now captures `stdout` and `stderr` by default. Tracebacks will now be captured amongst other logging statements.
⬆️ PyTorch 1.11 Support
We've tested Composer on PyTorch 1.11. Go ahead and upgrade your dependencies!
✅ Checkpointing
We changed the checkpoint format to store the underlying model, not the DistributedDataParallel wrapped model. If you're using Composer to read checkpoints, there's nothing to change. But if you're reading Composer checkpoints manually, note that the module checkpoints will be formatted differently.
In addition, we changed the checkpointing argument names for the trainer:
- `save_artifact_name` and `save_latest_artifact_name` allow checkpoints to be saved directly to artifact stores.
- `save_num_checkpoints_to_keep` helps preserve local disk storage by automatically removing old checkpoints.
- `load_path` replaces `load_path_format`.
- `save_name` replaces `save_path_format`.
- `save_latest_filename` replaces `save_latest_format`.
🏎️ Profiling
We added support for custom scheduling functions and re-designed how the profiler saves traces. Each profiling cycle will now have its own trace file. Trace merging happens automatically throughout the training process. Long-running profiling is now possible without the long wait at the end of training for the trace merge.
As part of this refactor, the profiler arguments have changed:
- `prof_trace_handlers` replaces `prof_event_handlers`.
- `prof_schedule` replaces `prof_skip_first`, `prof_wait`, `prof_warmup`, `prof_active`, and `prof_repeat`. See the cyclic schedule function.
- `torch_prof_folder` replaces `torch_profiler_trace_dir`.
- `torch_prof_filename`, `torch_prof_artifact_name`, `torch_prof_overwrite`, and `torch_prof_num_traces_to_keep` allow for customization of how PyTorch Profiler traces are saved.
🏗️ TorchVision Model Architectures
We switched our vision models to use the TorchVision model architecture implementations where possible.
- `run_name` as a property of the Logger by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/700
- `setup.py` by @Averylamp in https://github.com/mosaicml/composer/pull/761
- `test_trainer.py` by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/775
- `TQDMLogger` as the `ProgressBarLogger`; remove terminal logging from the file logger by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/708
- `stdout` and `stderr` capture to the FileLogger by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/710
- `NoEffectWarning` by @hanlint in https://github.com/mosaicml/composer/pull/720
- `import composer` by @dblalock in https://github.com/mosaicml/composer/pull/823
- `module.` prefix when using DDP by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/829
Full Changelog: https://github.com/mosaicml/composer/compare/v0.5.0...v0.6.0
Published by hanlint over 2 years ago
We are excited to share Composer v0.5, a library of speed-up methods for efficient neural network training. This release features:
- Checkpointing as a Callback
- Experimental `bfloat16` precision support
- Fast streaming datasets
- New model benchmarks (GPT-3-style configs, SSD, ViT-S/16)
Checkpointing is now a Callback, so that users can easily write and add their own callbacks. The callback is automatically appended if a `save_folder` is provided to the Trainer.
trainer = Trainer(
model=model,
algorithms=algorithms,
save_folder="checkpoints",
save_interval="1ep"
)
Alternatively, `CheckpointSaver` can be directly added as a callback:
trainer = Trainer(..., callbacks=[
    CheckpointSaver(
        save_folder='checkpoints',
        name_format="ep{epoch}-ba{batch}/rank_{rank}",
        save_latest_format="latest/rank_{rank}",
        save_interval="1ep",
        weights_only=False,
    )
])
Subclass `CheckpointSaver` to add your own logic for saving the best model, or saving at specific intervals. Thanks to @mansheej, @siriuslee, and other users for their feedback.
We've added experimental support for `bfloat16`, which can be provided via the `precision` argument to the Trainer:
trainer = Trainer(
...,
precision="bfloat16"
)
We've added support for fast streaming datasets. For NLP-based datasets such as C4, we use the HuggingFace datasets backend, and add dataset-specific shuffling, tokenization, and grouping on-the-fly. To support data parallel training, we added specific sharding logic for efficiency. See `C4Datasets` for more details.
Vision streaming datasets are supported via a patched version of the `webdatasets` package, with added support for data sharding by workers for fast augmentations. See `composer.datasets.webdataset` for more details.
Configurations for GPT-3-like models ranging from 125m to 760m parameters are now released, and use DeepSpeed Zero Stage 0 for memory-efficient training.
We've also added the Single Shot Detection (SSD) model (Liu et al., 2016) with a ResNet34 backbone, based on the MLPerf reference implementation.
Our first Vision Transformer benchmark is the ViT-S/16 model from Touvron et al., 2021, based on the `vit-pytorch` package.
See below for the full details:
- `composer.algorithms` by @ajaysaini725 in https://github.com/mosaicml/composer/pull/603
- `object` as a base class; fix skipping documentation of `forward`; fixed docutils dependency. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/643
- `num_workers` if set incorrectly by @abhi-mosaic in https://github.com/mosaicml/composer/pull/655
- `pycocotools` by @abhi-mosaic in https://github.com/mosaicml/composer/pull/656
- `__all__` by @hanlint in https://github.com/mosaicml/composer/pull/688
- `composer.optim` docstrings by @jbloxham in https://github.com/mosaicml/composer/pull/653
- `rank_zero_seed` on state by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/680
- `CheckpointLoader` into a `load_checkpoint` function by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/693
- `CheckpointSaver` to a callback. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/687
- `C4Dataset` to repeat, handle `max_samples` safely by @abhi-mosaic in https://github.com/mosaicml/composer/pull/722
Full Changelog: https://github.com/mosaicml/composer/compare/v0.4.0...v0.5.0
Published by hanlint over 2 years ago
- `run_event` for callbacks, removed deferred logging by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/85
- `composer.trainer.ddp`; replace with `composer.utils.ddp` by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/105
- `atexit` with cleanup methods by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/112
- `total_batch_size` to `train_batch_size` by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/137
- `run_mosaic_trainer.py`, cleaned up verbosity. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/170
- `dist` and DDP by @jbloxham in https://github.com/mosaicml/composer/pull/201
- `DataSpec` for the timing abstraction (#146) parts 3 and 4 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/178
- `pip install -e` be `pip install --user -e` when running as root by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/232
- `str` and `dict` in Trainer `init` signature by @hanlint in https://github.com/mosaicml/composer/pull/277
- `num_classes=10` for `CIFAR10_ResNet56` by @hanlint in https://github.com/mosaicml/composer/pull/293
- `tqdm.auto` for notebooks by @hanlint in https://github.com/mosaicml/composer/pull/298
- `Event.BATCH_END` and `Event.EPOCH_END` after the timer is increm… by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/310
- `dist.barrier` in the checkpointer with try/finally by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/334
- `extra_init_params` to get rid of recursive config dicts by @siriuslee in https://github.com/mosaicml/composer/pull/316
- `create_from_hparams` by @jbloxham in https://github.com/mosaicml/composer/pull/351
- `template_default` fields in hparams by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/369
- `import composer.functional as cf` by @dblalock in https://github.com/mosaicml/composer/pull/368
- `Event.TRAINING_START` to `Event.FIT`; remove `Event.TRAINING_END` by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/263
- `validation` and `metrics` by @hanlint in https://github.com/mosaicml/composer/pull/378
- `Makefile` instead of scripts; enable easier testing by @hanlint in https://github.com/mosaicml/composer/pull/387
- `conftest.py` by @hanlint in https://github.com/mosaicml/composer/pull/390
- `world_size` guard to trainer by @hanlint in https://github.com/mosaicml/composer/pull/392
- `steps_per_epoch` by @jbloxham in https://github.com/mosaicml/composer/pull/418
- `walkthrough` section of the docs; replace with module-level docstrings by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/417
- `composer.utils` by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/439
- `resize_targets` set to `False` by default by @siriuslee in https://github.com/mosaicml/composer/pull/475
- `dist` warnings by @hanlint in https://github.com/mosaicml/composer/pull/474
- `metadata` in json files for `algorithms` by @hanlint in https://github.com/mosaicml/composer/pull/471
- `from composer import ComposerModel` by @hanlint in https://github.com/mosaicml/composer/pull/496
- `selective_backprop` to `select_using_loss` by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/532
Full Changelog: https://github.com/mosaicml/composer/compare/v0.3.1...v0.4.0
Published by Averylamp almost 3 years ago
Hotfix to fix installation of the `composer` package
Published by Averylamp almost 3 years ago
- `composer` entrypoint for DDP forking prior to script start
!pip install mosaicml
!pip install git+https://github.com/mosaicml/composer@main