composer - v0.11.0

Published by bandish-shah almost 2 years ago

🚀 Composer v0.11.0

Composer v0.11.0 is released! Install via pip:

pip install --upgrade mosaicml==0.11.0

New Features

🧰 FSDP Beta Support

Composer now supports PyTorch FSDP! PyTorch FSDP is a strategy for distributed training, similar to PyTorch DDP, that distributes work using data-parallelism only. On top of this, FSDP uses model, gradient, and optimizer sharding to dramatically reduce device memory requirements, and enables users to easily scale and train large models.

Here's how easy it is to use FSDP with Composer:

import torch.nn as nn
from composer import Trainer

class Block (nn.Module):
    ...

# Your custom model
class Model(nn.Module):
    def __init__(self, n_layers):
        super().__init__()
        self.blocks = nn.ModuleList([
            Block(...) for _ in range(n_layers)
        ]),
        self.head = nn.Linear(...)
    def forward(self, inputs):
        ...

    # FSDP Wrap Function
    def fsdp_wrap_fn(self, module):
        return isinstance(module, Block)

    # Activation Checkpointing Function
    def activation_checkpointing_fn(self, module):
        return isinstance(module, Block)

# ComposerModel wrapper, used by the Trainer
# to compute loss, metrics, etc.
class MyComposerModel(ComposerModel):

    def __init__(self, n_layers):
        super().__init__()
        self.model = Model(n_layers)
        ...

    def forward(self, batch):
        ...

    def eval_forward(self, batch, outputs=None):
        ...

    def loss(self, outputs, batch):
        ...

# Pass your ComposerModel and fsdp_config into the Trainer
composer_model = MyComposerModel(n_layers=3)
fsdp_config = {
    'sharding_strategy': 'FULL_SHARD',
    'min_params': 1e8,
    'cpu_offload': False, # Not supported yet
    'mixed_precision': 'DEFAULT',
    'backward_prefetch': 'BACKWARD_POST',
    'activation_checkpointing': False,
    'activation_cpu_offload': False,
    'verbose': True
}

trainer = Trainer(
    model=composer_model,
    fsdp_config=fsdp_config,
    ...
)

trainer.fit()

For more information, please see our FSDP docs.

🚰 Streaming v0.1

We've spun off Streaming datasets into it's own repository! Streaming datasets is a high-performance drop-in for Torch IterableDataset, enabling users to stream training data from cloud based object stores. Streaming is shipping with built-in support for popular open source datasets (ADE20K, C4, COCO, Enwiki, ImageNet, etc.)

To get started, install the Streaming PyPi package:
```
pip install mosaicml-streaming
```
You can use the streaming Dataset class with the PyTorch native DataLoader class as follows:
```
import torch
from streaming import Dataset

dataloader = torch.utils.data.DataLoader(dataset=Dataset(remote='s3://...'))
```
For more information, please check out the Streaming docs.

✔👉 Simplified Checkpointing Interface

With this release we’ve greatly simplified configuration of loading and saving checkpoints in Composer.

To save checkpoints to S3, all you need to do is:

Specify with save_folder your full URI to your save directory destination (e.g. 's3://my-bucket/{run_name}/checkpoints')
Optionally, set save_filename to the pattern you want for your checkpoint file names

from composer.trainer import Trainer

# Checkpoint saving to S3.
trainer = Trainer(
    model=model,
    save_folder="s3://my-bucket/{run_name}/checkpoints",
        run_name='my-run',
    save_interval="1ep",
    save_filename="ep{epoch}.pt",
    save_num_checkpoints_to_keep=0,  # delete all checkpoints locally
        ...
)

trainer.fit()

Likewise, to load checkpoints from S3, all you have to do is:

Set load_path to the full URI to your desired checkpoint file (e.g.'s3://my-bucket/my-run/checkpoints/epoch13.pt')

from composer.trainer import Trainer

# Checkpoint loading from S3.
new_trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    max_duration="10ep",
    load_path="s3://my-bucket/my-run/checkpoints/ep13.pt",
   )

    new_trainer.fit()

For more information, please see our Checkpointing guide.

𐄳 Improved Distributed Experience

We’ve made it easier to write your own custom distributed entry points by exposing our distributed API. You can now leverage all of our helpful distributed functions and contexts.

For example, let's say we want to need to download a dataset in a distributed training application. To avoid race conditions where different ranks try to write the dataset to the same place, we need to ensure that only rank 0 downloads the dataset first:
```
import datetime
from composer.trainer.devices import DeviceGPU
from composer.utils import dist

dist.initialize(DeviceGPU(), datetime.timedelta(seconds=30)) # Initialize distributed module

if dist.get_local_rank() == 0: # Download dataset on rank zero
    dataset = download_my_dataset()
dist.barrier() # All ranks wait until dataset is downloaded

# Create and train your model!
```
For more information, please check out our Distributed API docs.

Bug Fixes

fix loss and eval_forward for HF models (#1597)
add more robust casting to int for fsdp min_params (#1608)
Deepspeed Docs Typo (#1605)
Fix mmdet typo (#1618)
Blurpool idempotent (#1625)
When model is not on meta device, initialization should occur on compute device not CPU (#1623)
Auto resumption (#1615)
Adjust speed monitor (#1645)
Hot fix console logging (#1643)
Lazy Logging + pretty print dict for hparams (#1653)
Fix many failing notebook tests (#1646)

What's Changed

Bump coverage[toml] from 6.4.4 to 6.5.0 by @dependabot in https://github.com/mosaicml/composer/pull/1583
Bump furo from 2022.9.15 to 2022.9.29 by @dependabot in https://github.com/mosaicml/composer/pull/1584
Add English Wikipedia 2020-01-01 dataset by @knighton in https://github.com/mosaicml/composer/pull/1572
Add pull request template by @dakinggg in https://github.com/mosaicml/composer/pull/1588
Bump ipykernel from 6.15.3 to 6.16.0 by @dependabot in https://github.com/mosaicml/composer/pull/1587
Update importlib-metadata requirement from <5,>=4.11.0 to >=5.0,<6 by @dependabot in https://github.com/mosaicml/composer/pull/1585
Bump sphinx-argparse from 0.3.1 to 0.3.2 by @dependabot in https://github.com/mosaicml/composer/pull/1586
Add step explicitly to ImageVisualizer logging calls by @dakinggg in https://github.com/mosaicml/composer/pull/1591
Image viz test by @dakinggg in https://github.com/mosaicml/composer/pull/1592
Remove unused fixture by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1594
Fixes RandAugment API by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1596
fix loss and eval_forward for HF models by @dskhudia in https://github.com/mosaicml/composer/pull/1597
Remove tensorflow-io from setup.py by @eracah in https://github.com/mosaicml/composer/pull/1577
Fixes enwiki for the newly processed wiki dataset by @dskhudia in https://github.com/mosaicml/composer/pull/1600
Change install to all by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1599
Remove log level and should_log_artifact by @dakinggg in https://github.com/mosaicml/composer/pull/1603
Add more robust casting to int for fsdp min_params by @dblalock in https://github.com/mosaicml/composer/pull/1608
Deepspeed Docs Typo by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1605
Object store logger refactor by @dakinggg in https://github.com/mosaicml/composer/pull/1601
Bump gitpython from 3.1.27 to 3.1.28 by @dependabot in https://github.com/mosaicml/composer/pull/1609
Bump tabulate from 0.8.10 to 0.9.0 by @dependabot in https://github.com/mosaicml/composer/pull/1610
Log the number of GPUs and nodes Composer running on. by @eracah in https://github.com/mosaicml/composer/pull/1604
Update MLPerfCallback for v2.1 by @hanlint in https://github.com/mosaicml/composer/pull/1607
Remove object store cls by @dakinggg in https://github.com/mosaicml/composer/pull/1606
Add LAMB Optimizer by @hanlint in https://github.com/mosaicml/composer/pull/1613
Mmdet adapter by @A-Jacobson in https://github.com/mosaicml/composer/pull/1545
Fix mmdet typo by @Landanjs in https://github.com/mosaicml/composer/pull/1618
update torchmetrics requirement by @hanlint in https://github.com/mosaicml/composer/pull/1620
Add distributed sampler error by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1598
Landan/deeplabv3 ade20k example by @Landanjs in https://github.com/mosaicml/composer/pull/1593
Upgrade CodeQL Action to version 2 by @karan6181 in https://github.com/mosaicml/composer/pull/1628
Blurpool idempotent by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1625
Defaulting streaming dataset version to 2 by @karan6181 in https://github.com/mosaicml/composer/pull/1616
Abhi/fsdp bugfix 0 11 by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1623
Remove warning when master_port is auto selected by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1629
Remove unused import by @dakinggg in https://github.com/mosaicml/composer/pull/1630
Usability improvements to intitialize_dist() by @growlix in https://github.com/mosaicml/composer/pull/1619
Remove Graph in Auto Grad Accum by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1631
Auto resumption by @dakinggg in https://github.com/mosaicml/composer/pull/1615
add stop method by @hanlint in https://github.com/mosaicml/composer/pull/1627
S3 Checkpoint Saving By URI by @eracah in https://github.com/mosaicml/composer/pull/1614
S3 Checkpoint loading from URI by @eracah in https://github.com/mosaicml/composer/pull/1624
Add mvpatel2000 as codeowner for algos by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1640
Adjust speed monitor by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1645
Adding in FSDP Docs by @bcui19 in https://github.com/mosaicml/composer/pull/1621
Attempt to fix flaky doctest by @dakinggg in https://github.com/mosaicml/composer/pull/1647
Fix Missing Underscores in FSDP Docs by @bcui19 in https://github.com/mosaicml/composer/pull/1648
Fixed html path for make host command for docs by @karan6181 in https://github.com/mosaicml/composer/pull/1642
Fix hyperparameters logged to console even when progress_bar and log_to_console are False by @eracah in https://github.com/mosaicml/composer/pull/1643
Fix ImageNet Example normalization values by @Landanjs in https://github.com/mosaicml/composer/pull/1641
Python log level by @dakinggg in https://github.com/mosaicml/composer/pull/1651
Changed default logging to WARN for doctests by @eracah in https://github.com/mosaicml/composer/pull/1644
Add Event.AFTER_LOAD by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1652
Lazy Logging + pretty print dict for hparams by @eracah in https://github.com/mosaicml/composer/pull/1653
Fix todo in memory monitor by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1654
Tests for Idempotent Surgery by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1639
Remove c4 dataset by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1635
Update torchmetrics by @hanlint in https://github.com/mosaicml/composer/pull/1656
Search index filtered by project by @nqn in https://github.com/mosaicml/composer/pull/1549
FSDP Tests by @bcui19 in https://github.com/mosaicml/composer/pull/1650
Add composer version to issue template by @dakinggg in https://github.com/mosaicml/composer/pull/1657
Fix many failing notebook tests by @dakinggg in https://github.com/mosaicml/composer/pull/1646
Re-build the Docker images to resolve pip version error by @bandish-shah in https://github.com/mosaicml/composer/pull/1655

Full Changelog: https://github.com/mosaicml/composer/compare/v0.10.1...v0.11.0

composer - v0.10.1

Published by bandish-shah about 2 years ago

🚀 Composer v0.10.1

Composer v0.10.1 is released! Install via pip:

pip install --upgrade mosaicml==0.10.1

New Features

𐄷 Weight Standardization

Weight Standardization reparametrizes convolutional weights such that the fan-in dimensions have zero mean and unit standard deviation. This could slightly improve performance at the expensive of 5% lower throughput. This has been used in several papers to train with smaller batch sizes, with normalization layers besides batch norm, and for transfer learning.

Using Weight Standardization with the Composer Trainer:
```
import composer
 
# Apply Weight Standardization (when training is initialized)
weight_std = composer.algorithms.WeightStandardization()

# Train with Weight Standardization
trainer = composer.trainer.Trainer(
    ...
    algorithms=[weight_std]
)
trainer.fit()
```
Using Weight Standardization with the Composer functional interface:
```
import composer
from torchvision.models import resnet50
 
my_model = resnet50()
 
# Apply weight standardization to model
my_model = composer.functional.weight_standardization(my_model)
```
Please see the Weight Standardization Method Card for more details.

Bug Fixes

Fix for checkpoints not being saved automatically at the end of a run (#1552)
Fix Onnx export for Composer HuggingFaceModels (#1557)
Fix for MIoU metric producing NaN's (#1558)
CometML logger documentation updates and fixes (#1567, #1570, #1571)
WandB image visualizer fix (#1591)

What's Changed

Update evaluate_periodically() when eval interval is of type Duration by @karan6181 in https://github.com/mosaicml/composer/pull/1523
Quality of life updates to EMA by @coryMosaicML in https://github.com/mosaicml/composer/pull/1524
Add ADE20K and COCO v2 dataset behind a version flag by @karan6181 in https://github.com/mosaicml/composer/pull/1528
Pinned setuptools version to fix distutils version error by @karan6181 in https://github.com/mosaicml/composer/pull/1536
Less strict name formatting by @hanlint in https://github.com/mosaicml/composer/pull/1535
Defaulting streaming dataset version to 1 and add a deprecation warning by @karan6181 in https://github.com/mosaicml/composer/pull/1532
Changing 'stable' to 'latest' in notebooks in examples by @bcui19 in https://github.com/mosaicml/composer/pull/1534
Bump furo from 2022.6.21 to 2022.9.15 by @dependabot in https://github.com/mosaicml/composer/pull/1540
Bump fasteners from 0.17.3 to 0.18 by @dependabot in https://github.com/mosaicml/composer/pull/1538
Add Pandoc to Docker images, bump version to 2.19.2 by @bandish-shah in https://github.com/mosaicml/composer/pull/1550
Removed streaming version 2 from yaml since version 1 is default by @karan6181 in https://github.com/mosaicml/composer/pull/1551
Bump ipykernel from 6.15.2 to 6.15.3 by @dependabot in https://github.com/mosaicml/composer/pull/1548
Bump yamllint from 1.27.1 to 1.28.0 by @dependabot in https://github.com/mosaicml/composer/pull/1546
Bump traitlets from 5.3.0 to 5.4.0 by @dependabot in https://github.com/mosaicml/composer/pull/1539
Object Store Logger Race Condition + EMA Fix by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1552
Adding in erroring for when using GradMonitor and DeepSpeed by @bcui19 in https://github.com/mosaicml/composer/pull/1555
Bump pypandoc from 1.8.1 to 1.9 by @dependabot in https://github.com/mosaicml/composer/pull/1559
Update context to raise errror by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1561
Fix MIoU metric when self.total_union==0 by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1558
Move dataloader initialize_object to factory methods by @hanlint in https://github.com/mosaicml/composer/pull/1510
Weight Standardization method by @Landanjs in https://github.com/mosaicml/composer/pull/1562
Update comet links to include query params and point to main site by @dakinggg in https://github.com/mosaicml/composer/pull/1567
remove dead line in alibi by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1568
GLU Fixes by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1564
Add FSDP strategy by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1553
Comet example by @dakinggg in https://github.com/mosaicml/composer/pull/1570
Add missing _enabled flag, post_close, and clean up comet ml tests by @dakinggg in https://github.com/mosaicml/composer/pull/1571
Consistent Method Card Style by @growlix in https://github.com/mosaicml/composer/pull/1407
add missing return in context by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1574
Remove eval batch split by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1576
Fix Onnx Export for Composer HuggingFaceModels by @nik-mosaic in https://github.com/mosaicml/composer/pull/1557
Revert checkpoint rename by @hanlint in https://github.com/mosaicml/composer/pull/1579

New Contributors

@bcui19 made their first contribution in https://github.com/mosaicml/composer/pull/1534

Full Changelog: https://github.com/mosaicml/composer/compare/v0.10.0...v0.10.1

composer - v0.10.0

Published by bandish-shah about 2 years ago

🚀 Composer v0.10.0

Composer v0.10.0 is out! This latest release adds support for CometML Experiment tracking, automatic selection of evaluation batch size, API enhancements for Evaluation/Logging/Metrics and a preview of our new streaming datasets repository!

pip install --upgrade mosaicml==0.10.0

New Features

☄️ Comet Experiment Tracking (#1490)

We've added support for the popular Comet experiment tracker! To enable, simply create the logger and pass it to the Trainer object at initialization:
```
from composer import Trainer
from composer.loggers import CometMLLogger

cometml_logger = CometMLLogger()

trainer = Trainer(
    ...
    loggers=[cometml_logger],
)
```
Please see our Logging and CometMLLogger docs pages for details on usage.
🪄 Automatic Evaluation Batch Size Selection (#1417)

Composer now supports eval_batch_size='auto', which will choose the right evaluation batch size to avoid CUDA OOMs! Now, in conjunction with grad_accum='auto', you can run the same code on any hardware with no changes necessary. This makes it easy to add evaluation to a training script without having to pick and choose the right batch sizes to avoid CUDA OOMs.
🎯 Evaluation API Changes (#1479)

The Evaluation API has been updated to be consistent with the Trainer API. If the eval_dataloader was provided to the Trainer during initialization, eval can be invoked without needing to provide anything additional:
```
trainer = Trainer(
    eval_dataloader=...
)
trainer.eval()
```
Alternatively, the eval_dataloader can be passed directly to the eval() method:
```
trainer = Trainer(
    ...
)
trainer.eval(
    eval_dataloader=...
)
```
The eval_dataloader can be a pytorch dataloader, or for multiple metrics, a list of Evaluator objects.
🪵 Simplified Logging (#1416)

We've significantly simplified our internal logging interface:
- Removed the use of LogLevel throughout the logging, which was a mostly unused feature. Filtering logs are the responsibility of the logger.
- For better compatibility with external logging interfaces such as CometML or Weights & Biases, loggers now support the following methods: log_metrics, log_hyperparameters, and log_artifacts. Previous calls to data_fit, data_epeoch, .. have been removed.
🎯 validate --> eval_forward (#1411 , #1419)

Previously, ComposerModel implemented the validate(batch: Any) -> Tuple[Any, Any] method which returns an (input, target) tuple, and the Trainer handles updating the metrics. In v0.10, we return the metrics updating control to the user.

Now, models instead implement def eval_forward(batch: Any) which returns the outputs of evaluation, and also def update_metric(batch, outputs, metric) which updates the metric.

An example implementation for classification can be found in our ComposerClassifer base class:
```
    def update_metric(self, batch: Any, outputs: Any, metric: Metric) -> None:
        _, targets = batch
        metric.update(outputs, targets)

    def eval_forward(self, batch: Any, outputs: Optional[Any] = None) -> Any:
        return outputs if outputs is not None else self.forward(batch)
```
🕵️‍♀️ Evaluator changes

The Evaluator class now stores evaluation metric names instead of metric instances. For example:
```
glue_mrpc_task = Evaluator(
    label='glue_mrpc',
    dataloader=mrpc_dataloader,
    metric_names=['BinaryF1Score', 'Accuracy']
)
```
These metric names are matched against the metrics returned by the ComposerModel. The metric instances are now stored as deep copies in the State class as state.train_metrics or state.eval_metrics.
🚧 Streaming Datasets Repository Preview

We're in the process of splitting out streaming datasets into it's own repository! Streaming datasets is a high-performance drop-in replacement for Torch IterableDataset objects and enables you to stream your training data from cloud based object stores. For an early preview, please checkout the Streaming repo.
❌ YAHP deprecation

We are deprecating support for yahp, our hyperparameter configuration tool. Support for this will be removed in the following minor version release of Composer. We recommend users migrate to OmegaConf, or Hydra as tools.

Bug Fixes

Documentation fixes (#1408, #1422, #1425, #1413, #1432, #1403, #1426, #1396, #1446, #1466, #1443)
Upgrade WandB version (#1440)
fix import (#1442)
fix wrong extra deps group (#1449)
wandb bug fix (#1488)
Reset train metrics every batch (#1496)
fix auto grad accum (#1515)
Fix compression file remote download exception handling (#1526)
Add Pandoc to Docker images, bump version to 2.19.2 (#1550)

What's Changed

current metrics docs by @A-Jacobson in https://github.com/mosaicml/composer/pull/1402
merge nlp+hf notebooks by @A-Jacobson in https://github.com/mosaicml/composer/pull/1406
Add break epoch exception by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1415
Upgrade to torch 1.12.1 by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1409
Metrics refactor pt1 by @ishanashastri in https://github.com/mosaicml/composer/pull/1411
Use state algos by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1412
Add default ignore index by @moinnadeem in https://github.com/mosaicml/composer/pull/1421
Update default hparams for ResNet model card by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1423
update colout link in custom speedup notebook by @A-Jacobson in https://github.com/mosaicml/composer/pull/1408
Clean up prose in key files by @dblalock in https://github.com/mosaicml/composer/pull/1422
Relax codeowners by @bandish-shah in https://github.com/mosaicml/composer/pull/1424
Fix typo by @Landanjs in https://github.com/mosaicml/composer/pull/1425
Fix pre-commit checks failing on fresh checkout of dev by @dblalock in https://github.com/mosaicml/composer/pull/1414
Have docs use preferred import paths, not longest import paths by @dblalock in https://github.com/mosaicml/composer/pull/1413
Fix missing indent by @Landanjs in https://github.com/mosaicml/composer/pull/1432
eval_batch_size=auto by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1417
Simplify helper for conflicting files by @hanlint in https://github.com/mosaicml/composer/pull/1427
add install from dev instructions by @A-Jacobson in https://github.com/mosaicml/composer/pull/1403
Style/tone consistency update for tutorial notebooks by @alextrott16 in https://github.com/mosaicml/composer/pull/1426
Dynamic quantization + minor improvements in inference APIs by @dskhudia in https://github.com/mosaicml/composer/pull/1433
Upgrade WandB version by @moinnadeem in https://github.com/mosaicml/composer/pull/1440
Log multiple losses by @Landanjs in https://github.com/mosaicml/composer/pull/1375
Fix attribute by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1442
Expand evaluation doc by @alextrott16 in https://github.com/mosaicml/composer/pull/1396
Metrics Refactor Part 2 by @ishanashastri in https://github.com/mosaicml/composer/pull/1419
Create dependabot.yml by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1448
Methods overview fix by @growlix in https://github.com/mosaicml/composer/pull/1446
Bump custom-inherit from 2.3.2 to 2.4.0 by @dependabot in https://github.com/mosaicml/composer/pull/1451
Bump junitparser from 2.4.3 to 2.8.0 by @dependabot in https://github.com/mosaicml/composer/pull/1453
Update moto[s3] requirement from <3.2,>=3.1.12 to >=4.0.1,<5 by @dependabot in https://github.com/mosaicml/composer/pull/1450
Update monai requirement from <0.9,>=0.8.0 to >=0.9.0,<0.10 by @dependabot in https://github.com/mosaicml/composer/pull/1452
Update torch-optimizer requirement from <0.2,>=0.1.0 to >=0.3.0,<0.4 by @dependabot in https://github.com/mosaicml/composer/pull/1454
Bump cryptography from 37.0.2 to 37.0.4 by @dependabot in https://github.com/mosaicml/composer/pull/1457
Bump sphinxext-opengraph from 0.6.1 to 0.6.3 by @dependabot in https://github.com/mosaicml/composer/pull/1458
Bump coverage[toml] from 6.3.2 to 6.4.4 by @dependabot in https://github.com/mosaicml/composer/pull/1460
Bump nbsphinx from 0.8.8 to 0.8.9 by @dependabot in https://github.com/mosaicml/composer/pull/1459
Fix incorrect deps group in streaming requirement by @hanlint in https://github.com/mosaicml/composer/pull/1449
Logger Destination Refactor by @eracah in https://github.com/mosaicml/composer/pull/1416
Bump sphinx-markdown-tables from 0.0.15 to 0.0.17 by @dependabot in https://github.com/mosaicml/composer/pull/1463
Bump traitlets from 5.1.1 to 5.3.0 by @dependabot in https://github.com/mosaicml/composer/pull/1462
Bump vit-pytorch from 0.27 to 0.35.8 by @dependabot in https://github.com/mosaicml/composer/pull/1465
Bump furo from 2022.3.4 to 2022.6.21 by @dependabot in https://github.com/mosaicml/composer/pull/1467
Bump ipykernel from 6.9.2 to 6.15.1 by @dependabot in https://github.com/mosaicml/composer/pull/1470
Bump pytest from 7.1.0 to 7.1.2 by @dependabot in https://github.com/mosaicml/composer/pull/1469
Bump sphinxcontrib-katex from 0.8.6 to 0.9.0 by @dependabot in https://github.com/mosaicml/composer/pull/1476
Bump tabulate from 0.8.9 to 0.8.10 by @dependabot in https://github.com/mosaicml/composer/pull/1478
Bump yamllint from 1.26.3 to 1.27.1 by @dependabot in https://github.com/mosaicml/composer/pull/1481
Bump ipykernel from 6.15.1 to 6.15.2 by @dependabot in https://github.com/mosaicml/composer/pull/1482
Refactor CheckpointSaver by @hanlint in https://github.com/mosaicml/composer/pull/1428
Clean up docs Makefile by @eracah in https://github.com/mosaicml/composer/pull/1466
Model surgery info -> debug by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1485
Docker image with Flash Attention by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1471
Fix WandBLogger bug with inaccurate step count by @eracah in https://github.com/mosaicml/composer/pull/1488
Update Eval API by @hanlint in https://github.com/mosaicml/composer/pull/1479
Random Names with Fixed Seed by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1487
ResNet50 on ImageNet training script example by @Landanjs in https://github.com/mosaicml/composer/pull/1434
Remove hparams from test_precision and test_state by @hanlint in https://github.com/mosaicml/composer/pull/1486
Clean up save_checkpoint by @hanlint in https://github.com/mosaicml/composer/pull/1484
Remove hparams from test_ddp by @hanlint in https://github.com/mosaicml/composer/pull/1489
update model token embeddings according to tokenizer len by @ananyahjha93 in https://github.com/mosaicml/composer/pull/1493
BERT classifier metrics depend on num_labels by @alextrott16 in https://github.com/mosaicml/composer/pull/1495
Reset train metrics every batch by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1496
Algolia doc search by @nqn in https://github.com/mosaicml/composer/pull/1443
Squelch Engine debug logs by @hanlint in https://github.com/mosaicml/composer/pull/1497
Remove TODO by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1499
Remove hparams from checkpoint tests by @hanlint in https://github.com/mosaicml/composer/pull/1491
[Docs] Training ResNet-50 on AWS tutorial by @bandish-shah in https://github.com/mosaicml/composer/pull/1444
Refactor hparams in tests by @hanlint in https://github.com/mosaicml/composer/pull/1498
Bump pytest from 7.1.2 to 7.1.3 by @dependabot in https://github.com/mosaicml/composer/pull/1500
Improved comments and improved test code by @karan6181 in https://github.com/mosaicml/composer/pull/1502
Refactor GLUE fine-tune queuing to improve efficiency and add task-specific seed sweeps by @alextrott16 in https://github.com/mosaicml/composer/pull/1363
Raise ValueError for Profiler + Auto Grad Accum by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1504
add yahp deprecation warnings by @hanlint in https://github.com/mosaicml/composer/pull/1505
Move logic from initialize_object to object store class by @hanlint in https://github.com/mosaicml/composer/pull/1508
Fix run name comment by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1509
Add CometML Support by @eracah in https://github.com/mosaicml/composer/pull/1490
Raise ValueError if missing a surgery algorithm by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1506
remove datasets from gitignore by @hanlint in https://github.com/mosaicml/composer/pull/1513
fix auto grad accum by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1515
Use eval context by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1516
Update tensorflow-io requirement from <0.27,>=0.26.0 to >=0.26.0,<0.28 by @dependabot in https://github.com/mosaicml/composer/pull/1522
Bump cryptography from 37.0.4 to 38.0.1 by @dependabot in https://github.com/mosaicml/composer/pull/1521
Fix SAM loss by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1518
Fixed remote path in streaming dataloader facesynthetics jupyter notebook by @karan6181 in https://github.com/mosaicml/composer/pull/1519
Rework auto grad accum checks by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1517
[xs] remove libcloudhparams from test_filehelpers.py by @hanlint in https://github.com/mosaicml/composer/pull/1514
Add v2 datasets behind a version flag by @knighton in https://github.com/mosaicml/composer/pull/1507
Fix compression file remote download exception handling. by @knighton in https://github.com/mosaicml/composer/pull/1526

New Contributors

@ananyahjha93 made their first contribution in https://github.com/mosaicml/composer/pull/1493

Full Changelog: https://github.com/mosaicml/composer/compare/v0.9.0...v0.10.0

composer - v0.9.0

Published by bandish-shah about 2 years ago

🚀 Composer v0.9.0

Excited to share the release of Composer v0.9.0, which comes with an Inference Export API, beta support for Apple Silicon and TPU training, as well as expanded usability of NLP-related speed-up methods. This release includes 175 commits from 34 contributors, including 10 new contributors 🙌 !

pip install --upgrade mosaicml==0.9.0

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.9.0

New Features

📦 Export for inference APIs

Train with Composer and deploy anywhere! We have added a dedicated export API as well as an export training callback to allow you to export Composer-trained models for inference, supporting popular formats such as torchscript and ONNX.

For example, here’s how to export a model in torchscript format:

from composer.utils import export_for_inference

# Invoking export with a trained model
export_for_inference(model=model, 
                     save_format='torchscript', 
                     save_path=model_save_path)

Here’s an example of using the training callback, which automatically exports the model at the end of training to ONNX format:

from composer.callbacks import ExportForInferenceCallback

# Initializing Trainer with the export callback
callback = ExportForInferenceCallback(save_format='onnx', 
                                                                            save_path=model_save_path)
trainer = Trainer(model=model,
                                callbacks=callback,
                                train_dataloader=dataloader,
                                max_duration='10ep')

# Model will be exported at the end of training
trainer.fit()

Please see our Exporting for Inference notebook for more information.

📈 ALiBi support for BERT training

You can now use ALiBi (Attention with Linear Biases; Press et al., 2021) when training BERT models with Composer, delivering faster training and higher accuracy by leveraging shorter sequence lengths.

ALiBi improves the quality of BERT pre-training, especially when pre-training uses shorter sequence lengths than the downstream (fine-tuning) task. This allows models with ALiBi to reach higher downstream accuracy with less pre-training time.

Example of using ALiBi as an algorithm with the Composer Trainer:
```
# Create an instance of a BERT masked language model
model = composer.models.create_bert_mlm()

# Apply ALiBi (when training is initialized)
alibi = composer.algorithms.alibi(max_sequence_length=1024)

# Train with ALiBi
trainer = composer.trainer.Trainer(
    model=model,
    train_dataloader=train_dataloader,
    algorithms=[alibi]
)
trainer.fit()
```
Example using the Composer Functional API:
```
import composer.functional as cf

# Create an instance of a BERT masked language model
model = composer.models.create_bert_mlm()

# Apply ALiBi and expand the model's maximum sequence length to 1024
cf.apply_alibi(model=model, max_sequence_length=1024)
```
AliBi can also now be extended to work with custom models by registering your attention and embedding layers. Please see our ALiBi method card for more information.
🧐 Entry point for GLUE tasks pre-training and fine-tuning

You can now easily pre-train and fine-tune NLP models across all GLUE (General Language Understanding Evaluation) tasks through one simple entry point! The entry point handles model saving and loading, spawns GLUE tasks in parallel across all available GPUs, and delivers a highly efficient evaluation of model performance.

Example of launching the entrypoint:
```
# This runs pre-training followed by fine-tuning.
# --training_scheme can take either pretrain, finetune, or all depending on the task!
python run_glue_trainer.py -f glue_example.yaml --training_scheme all
```
Please see our GLUE entrypoint notebook for more information.
🤖 TPU support (in beta)

You can now use Composer to train your models on TPUs! Support is now available in Beta, and currently only supports single-core TPU training. Try it out, explore optimizations, and share your feedback and feature requests with us so we can make it better for you and for the community.

To use TPUs with Composer, simply specify a tpu device:
```
# Set device to `tpu`
trainer = composer.trainer.Trainer(
    model=model,
    train_dataloader=train_dataloader,
    max_duration=train_epochs,
    device='tpu')

# Run fit
trainer.fit()
```
Please see our Training with TPUs notebook for more information.
🍎 Apple Silicon support (beta)

Leverage Apple Silicon chips to train your models with Composer by providing the device='mps' argument:
```
trainer = Trainer(
    ...,
    device='mps'
)
```
We use the latest PyTorch MPS backend to execute the training. This requires torch version ≥1.12, and Max OSX 12.3+.

For more information on training with Apple M chips, see the PyTorch 1.12 blog and our API Reference for Composer specific details.
🚧 Contrib repository

Got a new method idea, or published a paper and want those methods to be easily accessible? We’ve created the mcontrib repository, with a lightweight process to contribute new algorithms. We’re happy to work directly with you to benchmark these methods and eventually “promote” them to Composer for use by end customers.

Please checkout the README for details on how to contribute a new algorithm. For more details on how to write speed-up methods, see our notebook on custom speed-up methods.

Additional API Changes

🔢 Passes Module

The order in which algorithms are run matters significantly during composition. With this release we refactored algorithm passes into their own passes module. Users can now register custom passes (for custom algorithms) with the Engine. Please see #1377 for more information.
🗄️ Default Checkpoint Extension

The CheckpointSaver now defaults to using the *.pt extension for checkpoint fienames. Please see #1370 for more information.
👁️ Models Refactor

Most vision models (ResNet, MNIST, ViT, EfficientNet) have been refactored from classes to a factory function. For example ComposerResNet -> composer_resnet.
```
# before
from composer.models import ComposerResNet
model = ComposerResNet(..)

from composer.models import composer_resnet  # after
model = composer_resnet(..)
```
The same refactor has been done for NLP as well, e.g. BERTModel -> create_bert_mlm and create_bert_classification.

See #1227 (vision) and #1130 (NLP) for more details.
➕ Misc API Changes
- BreakEpochException has been removed.
- state.is_model_deepspeed has been moved to composer.utils.is_model_deepspeed.
- Helper function monitored_barrier has been added to composer distributed.

Bug Fixes

Add informative error for infer batch size issues (#1401)
Fix ImagenetDatasetHparams bug (#1392), resolves #1111
Fix hparams error condition checking (#1394)
Fix AMP resumption with grad scaler (#1376)
Auto Grad Accum Cache Clearing (#1380), fixes issue reported in #1331
Fix default precision (#1369)
Fix the profiler on multi-node training (#1358), resolves #1270
Retry SFTP on Size Mismatch (#1300)
Fix scheduler edge cases (#1350), resolves #1077
Fix a race condition in the object store logger (#1328)
Fix WandB load from checkpoint (#1326)
Fix Notebook Progress Bars (#1313)

Commits

What's Changed

Fix DeepSpeed typo in docstring by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1188
Move grad_accum logging to every step by @coryMosaicML in https://github.com/mosaicml/composer/pull/1187
Update STYLE_GUIDE with details on Documentation by @bandish-shah in https://github.com/mosaicml/composer/pull/1183
ProgressBar Units by @hanlint in https://github.com/mosaicml/composer/pull/1190
Added Xavier Normal initializer by @vladd-i in https://github.com/mosaicml/composer/pull/1196
Updated cost figure by @nqn in https://github.com/mosaicml/composer/pull/1180
Remove algorithm yamls by @hanlint in https://github.com/mosaicml/composer/pull/1193
Fix the Composer Launch Script for the Composer Dockerimage; Default nproc = torch.cuda.device_count() if not specified via env by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1195
Bert model card by @A-Jacobson in https://github.com/mosaicml/composer/pull/1198
Add Notes on Early Stopping by @anisehsani in https://github.com/mosaicml/composer/pull/1182
Stochastic depth that preserves weights by @Landanjs in https://github.com/mosaicml/composer/pull/1085
Adding Gated Linear Units as an algorithm by @moinnadeem in https://github.com/mosaicml/composer/pull/1192
A utility to fuse parallel linear layers in FX-traced models by @dskhudia in https://github.com/mosaicml/composer/pull/1189
Build+push Composer dockerimages to mosaicml/composer_staging by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1197
Fix the SFTP Object Store by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1202
Bert emoji by @A-Jacobson in https://github.com/mosaicml/composer/pull/1205
Adding a constant warmup scheduler by @linden-li in https://github.com/mosaicml/composer/pull/1203
Fix multi-GPU conflicts when downloading torchvision datasets by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1201
Add caveats about automatic gradient accumulation by @hanlint in https://github.com/mosaicml/composer/pull/1207
Remove the composer_train entrypoint; put it back in examples by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1211
Fix Composer staging dockerimages by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1210
Set SFTP Object Store Private Key Filepath from an Environ by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1212
[xs] Fix progress bars in get_file by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1216
Cleanup SFTP url parsing for StreamingDataset by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1217
Fix Symlinks on Non-Libcloud Object Stores by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1209
Fix the ObjectStoreLogger with Overwrite=True by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1208
Throughput metrics by @linden-li in https://github.com/mosaicml/composer/pull/1215
Fix module surgery for training resumptions with optimizers that save state by @dskhudia in https://github.com/mosaicml/composer/pull/1200
Update bert-base.yaml by @moinnadeem in https://github.com/mosaicml/composer/pull/1219
StreamingDataset: make remote optional, attempt to prettify docstrings. by @knighton in https://github.com/mosaicml/composer/pull/1220
Update vision-style StreamingDatasets to subclass VisionDataset by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1223
Improve docstrings. by @knighton in https://github.com/mosaicml/composer/pull/1222
shardwise zip streaming datasets by @milocress in https://github.com/mosaicml/composer/pull/1177
updated mosaic logos to composer logos in docs by @ejyuen in https://github.com/mosaicml/composer/pull/1221
Add COMPOSER_KNOWN_HOSTS_FILENAME for setting the sftp known hosts file environ by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1224
StreamingDataset: correctly handle exceptions in child download thread. by @knighton in https://github.com/mosaicml/composer/pull/1228
hot fix compression 404 by @milocress in https://github.com/mosaicml/composer/pull/1229
Treat any dropped SSH/SFTP connection as a transient error by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1225
refactor bert and gpt by @A-Jacobson in https://github.com/mosaicml/composer/pull/1130
Hotfix for S3 FileNotFoundError by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1233
Fix StreamingDataset compression with multi-rank by @milocress in https://github.com/mosaicml/composer/pull/1231
Refactor vision models by @Landanjs in https://github.com/mosaicml/composer/pull/1227
Update resnet50_medium.yaml by @lupesko in https://github.com/mosaicml/composer/pull/1235
Increase default timeout for StreamingC4 to 120s by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1234
Add Debug Log Statements; Fix Pyright by @hanlint in https://github.com/mosaicml/composer/pull/1218
Hotfix deeplabv3 by @Landanjs in https://github.com/mosaicml/composer/pull/1238
Add Tensorboard Logger by @eracah in https://github.com/mosaicml/composer/pull/1194
Move the model and optimizers to the device before Event.INIT by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1084
Fix bug in streaming iteration/downloading, refactor by @knighton in https://github.com/mosaicml/composer/pull/1239
Support sequence of losses in backwards pass by @Landanjs in https://github.com/mosaicml/composer/pull/1240
Add device_id param to DeviceGPU by @ishanashastri in https://github.com/mosaicml/composer/pull/1244
Update CutMix to work with segmentation style labels by @coryMosaicML in https://github.com/mosaicml/composer/pull/1230
Catching ChannelErrors on SFTP Failures by @moinnadeem in https://github.com/mosaicml/composer/pull/1245
Make StreamingDataset compression file easier to write/read by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1246
[XS] Updating console progress_bar logger to use max_duration units by @moinnadeem in https://github.com/mosaicml/composer/pull/1243
Catch botocore ClientError 403 by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1249
Tensorboard Notebook + Tutorial by @eracah in https://github.com/mosaicml/composer/pull/1250
Fix repeated words in event.py by @isaac0804 in https://github.com/mosaicml/composer/pull/1254
Make progressive resizing quieter by @coryMosaicML in https://github.com/mosaicml/composer/pull/1255
fix typo in example by @xloem in https://github.com/mosaicml/composer/pull/1259
Create a new boto3.Session() per S3ObjectStore instance by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1260
Fix recipe yamls for v0.8, add testing by @hanlint in https://github.com/mosaicml/composer/pull/1257
Automatic Stochastic depth on residual blocks by @dskhudia in https://github.com/mosaicml/composer/pull/1253
Sequence length warmup update and tests by @alextrott16 in https://github.com/mosaicml/composer/pull/1199
ProgressBarLogger UX Enhancements by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1264
Update to latest pytorch by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1262
Add packaging to meta.yaml; add py-cpuinfo max version by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1271
Fix Flaky Tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1272
Add callback for visualizing image inputs and outputs by @coryMosaicML in https://github.com/mosaicml/composer/pull/1266
Add scale_warmup argument to schedulers by @hanlint in https://github.com/mosaicml/composer/pull/1268
Switch Jenkins to r1z3 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1277
BERT and C4 updates by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1252
Default to allow_tf32=True for GPU Devices by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1275
Fix grad accum parsing in hparams by @hanlint in https://github.com/mosaicml/composer/pull/1256
Fix issue with doctest format in some docstring examples by @Landanjs in https://github.com/mosaicml/composer/pull/1269
Adds S3ObjectStore import to util init.py by @codestar12 in https://github.com/mosaicml/composer/pull/1274
Add tutorial on exporting for inference by @hanlint in https://github.com/mosaicml/composer/pull/1276
HTTPS downloads for streaming datasets by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1258
object stores for streaming datasets by @milocress in https://github.com/mosaicml/composer/pull/1248
Allow object name prefix for S3ObjectStore by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1278
Hotfix CO-658 by @milocress in https://github.com/mosaicml/composer/pull/1273
Fix S3 remote paths for StreamingDataset download by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1280
Add combo loss to DeepLabv3+ by @Landanjs in https://github.com/mosaicml/composer/pull/1265
Checkpoint backwards compatibility for ProgressBar by @hanlint in https://github.com/mosaicml/composer/pull/1287
Add missing callbacks by @hanlint in https://github.com/mosaicml/composer/pull/1286
Fix S3 prefix upload/download by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1288
Fix device inference in module surgery by @hanlint in https://github.com/mosaicml/composer/pull/1290
Actual fix to backwards compatibility by @hanlint in https://github.com/mosaicml/composer/pull/1289
Bugs in getting_started.ipynb by @rahulvigneswaran in https://github.com/mosaicml/composer/pull/1285
Add pytorch 1.12.0 docker image by @linden-li in https://github.com/mosaicml/composer/pull/1247
Fix TB Logger + ObjectStore quadratic complexity issue by doing 1 file per flush by @eracah in https://github.com/mosaicml/composer/pull/1283
Enable README Doctests with GPUs by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1279
Fix logging of hparams to object stores by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1297
[xs] Reformat the Composer Version String by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1301
Add monitored barrier for autograd accum by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1295
[xs] Notebook Fixes by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1299
[xs] Store the Composer version in one place. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1302
model export for inference. Functional API by @dskhudia in https://github.com/mosaicml/composer/pull/1294
Add a return_outputs flag to predict() by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1307
Integration Testing by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1305
Fix get_file_artifact in the WandBLogger to work on all ranks by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1304
Add documentation about run_name to Composer by @eracah in https://github.com/mosaicml/composer/pull/1298
Enforce FusedLayerNorm is ordered last by @alextrott16 in https://github.com/mosaicml/composer/pull/1309
Revert monitored barrier by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1311
[xs] Build the Composer Docker Image only on dev branch merges by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1308
Fix Notebook Progress Bars by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1313
Remove pytest-timeout by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1317
[Minor] Inference API parameter name change by @dskhudia in https://github.com/mosaicml/composer/pull/1315
Matthew/swa readme by @growlix in https://github.com/mosaicml/composer/pull/1292
Enable gloo backend by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1321
[xs] Fix pytest test filtering; Bump the minimum pytorch version to 1.10 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1320
revert gloo by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1324
Fix WandB load from checkpoint by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1326
ALiBi for BERT and ALiBi testing by @alextrott16 in https://github.com/mosaicml/composer/pull/1267
Update HF example with read of model eval accuracy by @lupesko in https://github.com/mosaicml/composer/pull/1332
Cleanup API Reference Titles by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1336
Fix a race condition in the object store logger by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1328
Auto Grad Accum Change to Warning by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1338
Add export for inference callback by @nik-mosaic in https://github.com/mosaicml/composer/pull/1323
Add save fine-tune model to HuggingFace example by @lupesko in https://github.com/mosaicml/composer/pull/1333
Update DWD optimizers by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1339
Cap Numpy Version by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1345
Update slack link by @hanlint in https://github.com/mosaicml/composer/pull/1344
Fix scheduler edge cases by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1350
Integration Tests for Object Stores and Loggers by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1322
Retry SFTP on Size Mismatch by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1300
[xs] Restore the dataloader and training properties in predict() by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1352
Add Precision Contexts by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1347
Update GLU logging strings by @moinnadeem in https://github.com/mosaicml/composer/pull/1348
Add domain-specific codeowners by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1354
fix marker by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1359
Fix the profiler on multi-node training by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1358
Glue Entrypoint by @ishanashastri in https://github.com/mosaicml/composer/pull/1263
Yahp v0.1.3 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1346
Move metrics to context by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1361
Refactor multiple losses to support dictionaries and fix discrepancies by @Landanjs in https://github.com/mosaicml/composer/pull/1349
Fix Coverage Reports on Jenkins by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1114
JSON Schemas by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1371
add filename extension by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1370
JSON Schemas pt 2 by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1373
Update Export for Inference methods by @nik-mosaic in https://github.com/mosaicml/composer/pull/1355
Fix default precision by @A-Jacobson in https://github.com/mosaicml/composer/pull/1369
Clean up unused exception by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1368
Revert "Clean up unused exception" by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1378
Remove Unused Exception by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1379
Auto Grad Accum Cache Clearing by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1380
Add ability to register algorithm passes by @hanlint in https://github.com/mosaicml/composer/pull/1377
Fix AMP resumption with grad scaler by @hanlint in https://github.com/mosaicml/composer/pull/1376
Update CUDA and remove NCCL downgrade from Dockerfile by @abhi-mosaic in https://github.com/mosaicml/composer/pull/1362
Add Notes on Artifact Logging by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1381
Print the microbatch size when using Adaptive Gradient Accumulation by @hanlint in https://github.com/mosaicml/composer/pull/1387
Cleaner API reference part 1: references with minimal import paths by @dblalock in https://github.com/mosaicml/composer/pull/1385
Add Event.BEFORE_DATALOADER by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1388
remove private s3 paths by @A-Jacobson in https://github.com/mosaicml/composer/pull/1389
Tutorial on training without Local Storage by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/1351
[inference] Update export_for_inference notebook with new APIs by @dskhudia in https://github.com/mosaicml/composer/pull/1360
Fix resnet warnings criteria by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1395
Fix hparams error by @mvpatel2000 in https://github.com/mosaicml/composer/pull/1394
Add knighton to codeowners for datasets by @knighton in https://github.com/mosaicml/composer/pull/1397
Fix ImagenetDatasetHparams bug by @nik-mosaic in https://github.com/mosaicml/composer/pull/1392
Decouple GLUE entry point saving and loading logic by @ishanashastri in https://github.com/mosaicml/composer/pull/1390
Glue example notebook by @ishanashastri in https://github.com/mosaicml/composer/pull/1383
Add informative error for infer batch size issues by @hanlint in https://github.com/mosaicml/composer/pull/1401
Only sync batchnorm statistics within a node for deeplab by @Landanjs in https://github.com/mosaicml/composer/pull/1391
Update DeepLabv3 pretrained weight interface to work with PyTorch 1.12 by @Landanjs in https://github.com/mosaicml/composer/pull/1399
tpu single core by @florescl in https://github.com/mosaicml/composer/pull/1400
Add support for Apple M chips by @hanlint in https://github.com/mosaicml/composer/pull/1405
[xs] Add mps and tpu device to Trainer docstrings by @hanlint in https://github.com/mosaicml/composer/pull/1410

Full Changelog: https://github.com/mosaicml/composer/compare/v0.8.2...v0.9.0

New Contributors

@vladd-i made their first contribution in https://github.com/mosaicml/composer/pull/1196
@linden-li made their first contribution in https://github.com/mosaicml/composer/pull/1203
@ejyuen made their first contribution in https://github.com/mosaicml/composer/pull/1221
@lupesko made their first contribution in https://github.com/mosaicml/composer/pull/1235
@isaac0804 made their first contribution in https://github.com/mosaicml/composer/pull/1254
@xloem made their first contribution in https://github.com/mosaicml/composer/pull/1259
@alextrott16 made their first contribution in https://github.com/mosaicml/composer/pull/1199
@codestar12 made their first contribution in https://github.com/mosaicml/composer/pull/1274
@rahulvigneswaran made their first contribution in https://github.com/mosaicml/composer/pull/1285
@nik-mosaic made their first contribution in https://github.com/mosaicml/composer/pull/1323

composer - v0.8.2

Published by bandish-shah about 2 years ago

🚀 Composer v0.8.2

Composer v0.8.2 is released! Install via pip:

pip install --upgrade mosaicml==0.8.2

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.8.2

🐛 Bug Fixes

Fixed Notebook Progress Bars in Colab

Fixes a bug introduced by #1264 which causes Composer running in Colab notebooks to error out with:
UnsupportedOperation: fileno.

Closes #1312. Fixed in PR #1314.

Changelog

https://github.com/mosaicml/composer/compare/v0.8.1...v0.8.2

composer - v0.8.1

Published by bandish-shah over 2 years ago

🚀 Composer v0.8.1

Composer v0.8.1 is released! Install via pip:

pip install --upgrade mosaicml==0.8.1

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.8.1

🎁 New Features

🖼️ Image Visualizer

The ImageVisualizer callback periodically logs the training and validation images when using the WandB logger. This is great for validating your dataloader pipeline, especially if extensive data augmentations are used. Also, when training on a semantic segmentation task, the callback can log the target segmentation mask and the predicted segmentation mask by setting the argument mode='segmentation'. See PR #1266 for more details. Here is an example of using the ImageVisualizer callback:
```
from composer import Trainer
from composer.callbacks import ImageVisualizer

# Callback to log 8 training images after every 100 batches
image_visualizer = ImageVisualizer()

# Construct trainer
trainer = Trainer(
    ...,
    callbacks=image_visualizer
)

# Train!
trainer.fit()
```
Here is an example visualization from the training set of ADE20k:
📶 TensorBoard Logging

You can now log metrics and losses from your Composer training runs with Tensorboard! See #1250 and #1283 for more details. All you have to do is create a TensorboardLogger object and add it
to the list of loggers in your Trainer object like so:
```
from composer import Trainer
from composer.loggers import TensorboardLogger

tb_logger = TensorboardLogger(log_dir="./my_tensorboard_logs")

trainer = Trainer(
    ...
    # Add your Tensorboard Logger to the trainer here.
    loggers=[tb_logger],
)

trainer.fit()
```
For more information, see this tutorial.
🔙 Multiple Losses

Adds support for multiple losses. If a model returns a tuple of losses, they are summed before the loss.backward() call. See #1240 for more details.

🌎️ Stream Datasets from HTTP URIs

You can now specify a HTTP URI for a Streaming Dataset remote. See #1258 for more detials. For example:

from composer.datasets.streaming import StreamingDataset
from torch.utils.data import DataLoader

# Construct the Dataset
dataset = StreamingDataset(
    ...,
    remote="https://example.com/dataset/",
)

# Construct the DataLoader
train_dl = DataLoader(dataset)

# Construct the Trainer
trainer = Trainer(
    ...,
    train_dataloader=train_dl,
)

# Train!
trainer.fit()

For more information on streaming datasets, see this tutorial.

🏄️ GPU Devices default to TF32 Matmuls

Beginning with PyTorch 1.12, the default behavior for computing FP32 matrix multiplies on NVIDIA Ampere devices was switched from TF32 to FP32. See PyTorch documentation here.

Since Composer is designed specifically for ML training with a focus on efficiency, we choose to preserve the old default of using TF32 on Ampere devices. This leads to significantly higher throughput when training in single precision, without impact training convergence. See PR #1275 for implementation details.

👋 Set the Device ID for GPU Devices

Specify the device ID within a DeviceGPU to train on when instantiating a Trainer object instead of using the local ID! For example,

from composer.trainer.devices.device_gpu import DeviceGPU

# Specify to use GPU 3 to train 
device = DeviceGPU(device_id=3)

# Construct the Trainer
trainer = Trainer(
    ...,
    device = device
)

# Train!
trainer.fit()

BERT and C4 Updates

We make some minor adjustments to our bert-base-uncased.yaml training config. In particular, we make the global train and eval batch sizes a power of 2. This maintains divisibility when using many GPUs in multi-node training. We also adjust the max_duration so that it converts cleanly to 70,000 batches.

We also upgrade our StreamingDataset C4 conversion script (scripts/mds/c4.py) to use a multi-threaded reader. On a 64-core machine we are able to convert the 770GB train split to .mds format in ~1.5hr.
📂 Set a prefix when using a S3ObjectStore

When using S3ObjectStore for applications like checkpointing, it can be useful to provide path prefixes, mimicking folder/subfolder directories like on a local filesystem. When prefix is provided, any objects uploaded with S3ObjectStore will be stored at f's3://{self.bucket}/{self.prefix}{object_name}'.
⚖️ Scale the Warmup Period of Composer Schedulers

Added a new flag scale_warmup to schedulers that will scale the warmup period when a scale schedule ratio is applied. Default is False to mirror default behavior. See #1268 for more detials.
🧊 Stochastic Depth on Residual Blocks

Residual blocks are detected automatically and replaced with stochastic versions. See #1253 for more details.

🐛 Bug Fixes

Fixed Progress Bars

Fixed a bug where the the Progress Bars jumped around and did not stream properly when tailing the terminal over the network. Fixed in #1264, #1287, and #1289.
Fixed S3ObjectStore in Multithreaded Environments

Fixed a bug where the boto3 crashed when creating the default session in multiple threads simultaniously (see https://github.com/boto/boto3/issues/1592). Fixed in #1260.
Retry on ChannelException errors in the SFTPObjectStore

Catch ChannelException SFTP transient error and retry. Fixed in #1245.
Treating S3 Permission Denied Errors as Not Found Errors

We update our handling of botocore 403 ClientErrors to interpret them as FileNotFoundErrors. We do this because of a situation that occurs when a user has no S3 credentials configured, and tries to read from a bucket with public files. For privacy, Amazon S3 raises 403 (Permission Denied) instead of 404 (Not Found) errors. As such, PR #1249 treats 403 ClientErrors as FileNotFoundErrors.
Fixed Parsing of grad_accum in the TrainerHparams

Fixes an error where the command line override --grad_accum lead to incorrect parsing. Fixed in #1256.
Fixed Example YAML Files

Our recipe configurations (YAML) are updated to the latest version, and a test was added to enforce correctness moving forward. Fixed in #1235 and #1257.

Changelog

https://github.com/mosaicml/composer/compare/v0.8.0...v0.8.1

composer - v0.8.0

Published by ravi-mosaicml over 2 years ago

🚀 Composer v0.8.0

Composer v0.8.0 is released! Install via pip:

pip install --upgrade mosaicml==0.8.0

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.8.0

New Features

🤗 HuggingFace ComposerModel

Train your HuggingFace models with Composer! We introduced a HuggingFaceModel that converts your existing 🤗 Transformers models into a ComposerModel.

For example:

import transformers
from composer.models import HuggingFaceModel

# Define the model
hf_model = transformers.AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Convert it into a ComposerModel
model = HuggingFaceModel(hf_model)

# Construct the trainer
trainer = Trainer(
    ...,
    model,
)

# Train!
trainer.fit()

For more information, see the example on fine-tuning a pretrained BERT with Composer.

🫕 Fused Layer Norm

Fused LayerNorm replaces implementations of torch.nn.LayerNorm with a apex.normalization.fused_layer_norm. The fused kernel provides increased GPU utilization.

For example:

from composer.trainer import Trainer
from composer.algorithms import FusedLayerNorm

# Initialize the algorithm
alg = FusedLayerNorm()

# Construct the trainer
trainer = Trainer(
    algorithms=alg,
)

# Train!
trainer.fit()

See the method card for more information.

💾 Ignore Checkpoint Parameters

If you have a checkpoint and don't want to restore some elements of the chceckpoint to the state, we added a load_ignore_keys parameter. Any specified (nested) keys will be ignored. Glob syntax is supported!

For example, to restore a checkpoint without the seed:
```
from composer import Trainer

trainer = Trainer(
    ...,
    load_path="path/to/my/checkpoint.pt",
    load_ignore_keys=["state/rank_zero_seed", "rng"],
)
```
See the Trainer API Reference for more information.

🪣 Object Stores

Composer v0.8.0 introduces an abstract Object Store API to support multiple object store drivers, such as boto3 (for Amazon S3) and Paramiko (for SFTP), in addition to the existing libcloud implementation.

For example, if you are training on AWS where credentials are available in the environment, here's how to to save checkpoints to a S3 object store via Boto3.

from composer import Trainer
from composer.loggers import ObjectStoreLogger
from composer.utils.object_store import S3ObjectStore

logger = ObjectStoreLogger(
    object_store_cls=S3ObjectStore,
    object_store_kwargs={
        # These arguments will be passed into the S3ObjectStore -- e.g.:
        # object_store = S3ObjectStore(**object_store_kwargs)
        # Refer to the S3ObjectStore class for documentation
        'bucket': 'my-bucket',
    },
)

trainer = Trainer(
    ...,
    loggers=logger,
)

# Train!
trainer.fit()

See the Object Store API Reference for more information.

🪨 Artifact Metadata

Composer automatically logs the epoch, batch, sample, and token counts as metadata when storing artifacts in Weights & Biases. See the API Reference for more information.

API Changes

✂️ Gradient Clipping is now an Algorithm

To clean up the Trainer, we moved gradient clipping into an Algorithm. The grad_clip_norm argument in the Trainer is deprecated and will be removed in a future version of Composer. Instead, use the Gradient Clipping algorithm:

For example:
```
from composer.algorithms import GradientClipping
from composer.trainer import Trainer

# Configure gradient clipping
gradient_clipping = GradientClipping()

# Configure the trainer
trainer = Trainer(
    ...,
    algorithms=gradient_clipping,
)

# Train!
trainer.fit()
```
See the method card for more information.
🕒️ Removed batch_num_samples and batch_num_tokens from the state.

State properties batch_num_samples and batch_num_tokens have been removed.
Instead, use State.timestamp for token and sample tracking.
🧑‍🤝‍🧑 DDP Sync Strategy

We changed the default DDP Sync Strategy to MULTI_AUTO_SYNC, as FORCED_SYNC doesn't work with all algorithms.
🏃 Moved the run_name into the State

The run_name has been added to the State object, so it is persisted with checkpoints. It has been removed from the Logger.

Bug Fixes

In the Object Store Logger, added in retries for credential validation, and validating credentials only on global rank zero. (#1144)
Fixed a bug in the speed monitor where it returned negative wall clock times. (#1123)
Fixed how block-wise Stochastic Depth could freeze the trainer. (#1087)
Fixed a bug in the [MLPerfCallback] where sample counts were incorrect on per-sharded datasets. (#1156)

Changelog

https://github.com/mosaicml/composer/compare/v0.7.1...v0.8.0

composer - v0.7.1

Published by ravi-mosaicml over 2 years ago

🚀 Composer v0.7.1

Composer v0.7.1 is released! Install via pip:

pip install --upgrade mosaicml==0.7.1

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.7.1

Bug Fixes

Upgraded wandb>=0.12.17, to fix incompatibility with protobuf >= 4 (https://github.com/wandb/client/pull/3709)

Changelog

https://github.com/mosaicml/composer/compare/v0.7.0...v0.7.1

composer - v0.7.0

Published by ravi-mosaicml over 2 years ago

🚀 Composer v0.7.0

Composer v0.7.0 is released! Install via pip:

pip install --upgrade mosaicml==0.7.0

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.7.0

New Features

🏎️ FFCV Integration

Composer supports FFCV, a fast dataloader for image datasets. We've found FFCV can speed up ResNet-56 training by 16%, in addition to existing speed-ups already supported by Composer! It's easy to use FFCV with any existing image dataset:

import ffcv
from ffcv.fields.decoders import IntDecoder, SimpleRGBImageDecoder
from torchvision.datasets import ImageFolder

from composer import Trainer
from composer.datasets.ffcv_utils import write_ffcv_dataset, ffcv_monkey_patches

# Convert the dataset to FFCV format
# This step needs to be done only once per dataset
dataset = ImageFolder(...)
ffcv_dataset_path = "my_ffcv_dataset.ffcv"
write_ffcv_dataset(dataset=dataset, write_path=ffcv_dataset_path)

# In FFCV v0.0.3, len(dataloader) is expensive. Fix that via a monkeypatch
ffcv_monkey_patches()

# Construct the train dataloader
train_dl = ffcv.Loader(
    ffcv_dataset_path,
    ...
)

# Construct the trainer
trainer = Trainer(
    train_dataloader=train_dl,
)

# Train using FFCV!
trainer.fit()

See our notebook on training with FFCV for a full example.

✅ Autoresume from Checkpoints

When setting autoresume=True, Composer can automatically resume from an existing checkpoint before starting a new training run. Specifically, the trainer will look in the save_folder (and any loggers that save artifacts) for the latest checkpoint; if none is found, then it'll start from the beginning.

This feature does not require a different entrypoint to distinguish between starting a new training run or automatically resuming from an existing one, making it easy to use Composer on spot preemptable cloud instances. Simply set autoresume=True, point the instance to your training script, and Composer will handle the rest!
```
from composer import Trainer

# When using `autoresume`, it is required to specify the
# `run_name`, so Composer will know which training run to
# resume
run_name = "my_autoresume_training_run"

trainer = Trainer(
    ...,
    run_name=run_name,
    # specify where to save checkpoints
    save_folder="./my_autoresume_training_run",
    autoresume=True,
)

# Train! Composer will handle loading an existing
# checkpoint or starting a new training run
trainer.fit()
```
See the Trainer API Reference for more information.

♻️ Reuse the Trainer

Want to train on multiple dataloaders sequentially? Each trainer object now supports multiple calls to Trainer.fit(), so you can continue training an existing model on a new dataloader, with new schedulers, all while using the same model and trainer object.

For example:

from torch.utils.data import DataLoader

from composer import Trainer

train_dl_1 = DataLoader(...)
trainer = Trainer(
    model=model,
    max_duration='5ep',
    train_dataloader=train_dl_1,
)

# Train once!
trainer.fit()

# Train again with a new dataloader for another 5 epochs
train_dl_2 = DataLoader(...)
trainer.fit(
    train_dataloader=train_dl_2,
    duration='5ep',
)

See the Trainer API Reference for more information.

⚖️ Eval or Predict Only? No Problem

You can evaluate or predict on an existing model, without having to supply a train dataloader or training duration argument -- they're now optional.


import torchmetrics
from torch.utils.data import DataLoader

from composer import Trainer

# Construct the trainer
trainer = Trainer(model=model)

# Evaluate!
eval_dl = DataLoader(...)
trainer.eval(
    dataloader=eval_dl,
    metrics=torchmetrics.Accuracy(),
)

# Examine evaluation metrics
print("Eval metrics", trainer.state.metrics['eval'])

# Or, predict!
predict_dl = DataLoader(...)
trainer.predict(dataloader=predict_dl)

See the Trainer API Reference for more information.

🛑 Early Stopper and Threshold Stopper Callbacks

The Early Stopper and Threshold Stopper callbacks end training early when the target metrics are met:

from composer.callbacks.early_stopper import EarlyStopper
from torchmetrics.classification.accuracy import Accuracy

# Construct the callback
early_stopper = EarlyStopper(
    monitor="Accuracy",
    dataloader_label="eval",
    patience=2,
)

# Construct the trainer
trainer = Trainer(
    ...,
    callbacks=early_stopper,
    max_duration="100ep",
)

# Train!
# Training will end early if the accuracy does not improve
# over two epochs
trainer.fit()

🪵 Load Checkpoints from Loggers

It's now possible to restore checkpoints from loggers that support file artifacts (such as the Weights & Baises Logger). No need to download your checkpoints manually anymore.

from composer import Trainer
from composer.loggers import WandBLogger

# Configure the W&B Logger
wandb_logger = WandBLogger(
    # set to True to capture artifacts, like checkpoints
    log_artifacts=True,
    init_params={
        'project': 'my-wandb-project-name',
    },
)

# Then, to train and save checkpoints to W&B:
trainer = Trainer(
    ...,
    loggers=wandb_logger,
    save_folder="/tmp/checkpoints",
    save_interval="1ep",
    save_artifact_name="epoch{epoch}.pt",
)

# Finally, to load checkpoints from W&B
trainer = Trainer(
    ...,
    load_object_store=wandb_logger,
    load_path="epoch1.pt:latest",
)

⌛ Wall Clock, Evaluation, and Prediction Time Tracking

The timestamp object measures wall clock time via three new fields: total_wct, epoch_wct, and batch_wct. These fields track the total elapsed training time, the elapsed training time of the current epoch, and the time to train the last batch. Read the wall clock time via a callback:
```
from composer import Callback, Trainer

class MyCallback(Callback):
    def batch_end(self, state, event):
        print(f"Total wct: {state.timetsamp.total_wct}")
        print(f"Epoch wct: {state.timetsamp.epoch_wct}")
        print(f"Batch wct: {state.timetsamp.batch_wct}")

# Construct the trainer with this callback
trainer = Trainer(
    ...,
    callbacks=MyCallback(),
)

# Train!
trainer.fit()
```
In addition, the training state object has two new fields for tracking time during evaluation and prediction: eval_timestamp and predict_timestamp. These fields, just like any others on the state object, are accessible to algorithms, callbacks, and loggers.
Training DeepLabv3+ on the ADE20k Dataset

DeepLabv3+ is a common baseline model for semantic segmentation tasks. We provide a ComposerModel implementation for DeepLabv3+ built using torchvision and mmsegmentation for the backbone and head, respectively.

We found the DeepLabv3+ baseline can be significantly improved using the new PyTorch pre-trained weights. Additional gains are made through a hyperparameter sweep.

We benchmark our DeepLabv3+ model on a single 8xA100 machine using ADE20k, a popular semantic segmentation dataset. The final results on ADE20k are:

Model mIoU Time-to-Train

Unoptimized DeepLabv3+ 44.17 +/- 0.14 6.39 hr

Optimized DeepLabv3+ 45.78 +/- 0.26 4.67 hr

Checkout our documentation for more info!

Model	mIoU	Time-to-Train
Unoptimized DeepLabv3+	44.17 +/- 0.14	6.39 hr
Optimized DeepLabv3+	45.78 +/- 0.26	4.67 hr

API Changes

🍪 Additional Batch Type Support

Composer v0.7.0 removed the BatchDict and BatchPair types, and now supports any batch type. We're updating our algorithms to support batches of custom formats.

🏎️ Simplified Profiling Arguments

To simplify the Trainer constructor, the profiling arguments were replaced with a single profiler argument, which takes an instance of the Profiler.

from composer.trainer import Trainer
from composer.profiler import PRofiler, JSONTraceHandler, cyclic_schedule

trainer = Trainer(
    ...,
    profiler=Profiler(
        trace_handlers=JSONTraceHandler(
            folder=composer_trace_dir,
            overwrite=True,
        ),
        schedule=cyclic_schedule(
            wait=0,
            warmup=1,
            active=4,
            repeat=1,
        ),
        torch_prof_folder=torch_trace_dir,
        torch_prof_overwrite=True,
        ...,
    )
)

See the profiling guide for additional information.

🚪 Event.FIT_END and Engine.close()

With support for reusing the trainer for multiple calls to Trainer.fit, callbacks and loggers are no longer closed at the end of a training run.

Instead, Event.FIT_END was added, which can be used by Callbacks for anything that should happen at the end of each invocation of Trainer.fit. See the Event Guide for aadditional inforrmation.

Finally, whenever the trainer is garbage collected or Trainer.close is called, Callback.close and Callback.post_close are invoked, ensuring that they will be called only once per trainer.
⌛ State.timesamp replaces State.timer

Removed State.timer and replaced it with State.timestamp, which is now a static Timestamp object. The training loop replaces State.timestamp with a new object on each batch. See the Time Guide for additional information.
💿 Data Configuration

Two new proerties, State.dataloader and State.dataloader_label, were added to the state. These properties track the currently active dataloader (e.g. the training dataloader when training; the evaluation dataloader when evaluating).

In adddition, State.subset_num_batches was renamed to State.dataloader_len to reflect the actual dataloader length that will be used for training and evaluation.

A helper method State.set_dataloader was added to ensure the dataloader properties are updated correctly.

⚖️ Removed the Deprecated Scale Schedule Algorithm

The scale schedule algorithm class, deprecated in v0.4.0, has been removed. Instead, use the scale_schedule_ratio argument when constructing the trainer.

from composer import Trainer
from composer.optim.scheduler import MultiStepScheduler

trainer = Trainer(
    ...,
    max_duration="20ep",
    schedulers=MultiStepScheduler(milestones=["10ep", "16ep"]),
    scale_schedule_ratio=0.5,
)

See the Scale Schedule Method Card for additional info.

Bug Fixes

Fixed an bug where Event.FIT_END was not being called in the training loop (#1054)
Fixed a bug where evaluation would not run at the end of training unless if it aligned with the eval_interval (#1045)
Fixed a bug where models trained with SWA could not be used with checkpoints (#1015)
Fixed a bug where the Speed Monitor included validation time in the training throughput measurements, resulting in slower reported throughput measurements (#1053)
Fixed a bug to make the ComposerClassifier compatible with TorchScript (#1036)
Fixed a bug where fractional Time Objects were being truncated instead of raising an exception (#1038)
Changed the defaults for Selective Backprop to not scale inputs, so the algorithm can work with non-vision workloads (#896)

New Contributors

@ofirpress made their first contribution in https://github.com/mosaicml/composer/pull/955
@QiyaoWei made their first contribution in https://github.com/mosaicml/composer/pull/866
@pavithranrao made their first contribution in https://github.com/mosaicml/composer/pull/879

Changelog

https://github.com/mosaicml/composer/compare/v0.6.1...v0.7.0

composer - v0.6.1

Published by ravi-mosaicml over 2 years ago

🚀 Composer v0.6.1

Composer v0.6.1 is released!

Go ahead and upgrade; it's fully backwards compatible with Composer v0.6.0.

Install via pip:

pip install --upgrade mosaicml==0.6.1

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.6.1

What's New?

📎 Adaptive Gradient Clipping (AGC)

Adaptive Gradient Clipping (AGC) clips gradients based on the ratio of their norms with weights' norms. This technique helps stabilize training with large batch sizes, especially for models without batchnorm layers.
🚚 Exponential Moving Average (EMA)

Exponential Moving Average (EMA) is a model averaging technique that maintains an exponentially weighted moving average of the model parameters during training. The averaged parameters are used for model evaluation. EMA typically results in less noisy validation metrics over the course of training, and sometimes increased generalization.

🪵 Logger is available in the ComposerModel

The Logger is bound to the ComposerModel via the self.logger attribute. It is available during training on all methods (other than __init__).

For example, to log hidden activation:

class Net(ComposerModel):

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        if self.logger:
            self.logger.data_batch({
                "hidden_activation_norm": x.norm(2).item(),
            })
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x)

🐛 Environment Collection Script

Composer v0.6.1 includes an environment collection script which generates a printout of your system configuration and python environment. If you run into a bug, the results from this script will help us debug the issue and fix Composer.

To collect your environment information:
```
$ pip install mosaicml  # if composer is not already installed
$ composer_collect_env
```
Then, include the output in your GitHub Issue.

What's Improved?

📜 TorchScriptable Algorithms

BlurPool, Ghost BatchNorm, and Stochastic Depth are now TorchScript-compatible. Try exporting your models with these algorithms enabled!
🏛️ ColOut on Segmentation

ColOut now supports segmentation-style models.

What's Fixed?

🚑️ Loggers capture the Traceback

We fixed a bug so the Loggers, such as the Weights & Biases Logger and the File Logger, will capture the traceback any exception that crashes the training process.
🏋️ Weights & Biases Logger Config

We fixed a bug where the the Weights & Biases Logger was not properly recording the configuration.

Full Changelog

https://github.com/mosaicml/composer/compare/v0.6.0...v0.6.1

composer - v0.6.0

Published by ravi-mosaicml over 2 years ago

🚀 Composer v0.6.0

Composer v0.6.0 is released! Install via pip:

pip install --upgrade mosaicml==0.6.0

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.6.0

Major Changes

🗃️ Automatic Gradient Accumulation

Composer v0.6.0 can automatically pick an appropriate value for gradient accumulation. The trainer will automatically catch
OutOfMemory exceptions and handle them gracefully. No need to manually tune this parameter for each model, batch size, and
hardware combination!

To use automatic gradient accumulation, set grad_accum='auto'. For example:
```
trainer = Trainer(
    ...,
    grad_accum='auto',
)
```
💾 Artifact Logging

Training on spot instances? Composer v0.6.0 introduces artifact logging, making it possible to store checkpoints and other artifacts directly to cloud storage. See the Object Store Logger and the Checkpointing Guide for more information.

Artifact Logging has replaced the run directory and the run directory uploader, which have been removed.
📊 Metric Values on the State

Composer v0.6.0 binds the computed metric values on the State. Go ahead and read these values from your own callbacks! We'll be releasing an early stopping callback in an upcoming Composer release.
⚠️ NoEffectWarning and NotIntendedUseWarning for Algorithms

Some algorithms, such as BlurPool, now emit a NoEffectWarning or a NotIntendedUseWarning when they're not being used appropriately.

Minor Improvements

🏃‍♀️ Training Run Names

We introduced a run_name parameter in the Trainer to help organize training runs.
```
trainer = Trainer(
    ...,
    run_name='awesome-traing-run',
)
```
We'll automatically pick one if the run name is not specified.
💈 Automatic Progress Bars

The ProgressBarLogger, formally called the TQDMLogger, is automatically enabled for all training runs.

To disable the progress bar, set progress_bar=False. For example:
```
trainer = Trainer(
    ...,
    progress_bar=False,
)
```
🪵 Logged Data in the Console

To print Logger calls to the console, set the log_to_console and the console_log_level arguments.
```
trainer = Trainer(
    ...,
    log_to_console=True,
    console_log_level="epoch",
)
```
By default, the console logger will only be enabled when progress_bar=False. The default console log level is epoch.
📃 Capturing stdout and stderr in Log Files

The FileLogger captures stdout and stderr by default now. Tracebacks will now be captured amongst other logging statements.
⬆️ PyTorch 1.11 Support

We've tested Composer on PyTorch 1.11. Go ahead and upgrade your dependencies!
✅ Checkpointing

We changed the checkpoint format to store the underlying model, not the DistributedDataParallel wrapped model. If you're using Composer to read checkpoints, there's nothing to change. But if you're reading Composer checkpoints manually, note that the module checkpoints will be formatted differently.

In addition, we changed the checkpointing argument names for the trainer.
- The new parameters save_artifact_name and save_latest_artifact_name allow checkpoints to be saved directly to artifact stores.
- The new parameter save_num_checkpoints_to_keep helps preserve local disk storage by automatically removing old checkpoints.
- load_path replaces load_path_format.
- save_name replaces save_path_format.
- save_latest_filename replaces save_latest_format.
🏎️ Profiling

We added support for custom scheduling functions and re-designed how the profiler saves traces. Each profiling cycle will now have its own trace file. Trace merging happens automatically throughout the training process. Long-running profiling is now possible without the long wait at the end of training for the trace merge.

As part of this refactor, the profiler arguments have changed:
- prof_trace_handlers replaces prof_event_handlers.
- prof_schedule replaces prof_skip_first, prof_wait, prof_warmup, prof_active, and prof_repeat. See the cyclic schedule function.
- torch_prof_folder replaces torch_profiler_trace_dir
- The new arguments torch_prof_filename, torch_prof_artifact_name, torch_prof_overwrite, and torch_prof_num_traces_to_keep allow for customization on how PyTorch Profiler traces are saved.
🏗️ TorchVision Model Architectures

We switched our vision models to use the TorchVision model architecture implementations where possible.

Bug Fixes

Fixed a bug with MixUp and gradient accumulation
Fixed numerous issues with the Composer launch script for distributed training. Composer v0.6.0 includes environment variable support, better defaults and warings, and proper handling of crashed processes.

Changelog

Update Migrating_from_PTL.ipynb by @moinnadeem in https://github.com/mosaicml/composer/pull/730
CodeQL Analysis by @Averylamp in https://github.com/mosaicml/composer/pull/723
Installing pyright via npm by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/735
Polish intro docs by @dblalock in https://github.com/mosaicml/composer/pull/721
Numerics docs page by @bandish-shah in https://github.com/mosaicml/composer/pull/725
Testing Niklas GH Docs Star w/ Dark Mode by @moinnadeem in https://github.com/mosaicml/composer/pull/742
[Artifact Logging PR1] Logger Refactoring by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/698
Update README.md by @moinnadeem in https://github.com/mosaicml/composer/pull/731
Updated the Method Cards by @hanlint in https://github.com/mosaicml/composer/pull/647
Using existing clone in conda meta.yaml by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/751
[Artifact Logging PR2] Logger Destination Cleanup by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/699
Shorten to minimal code snippets by @hanlint in https://github.com/mosaicml/composer/pull/752
Sample-wise Stochastic Depth Method Card by @Landanjs in https://github.com/mosaicml/composer/pull/749
Update algorithm yamls by @coryMosaicML in https://github.com/mosaicml/composer/pull/747
[Artifact Logging PR3] Add the run_name as a property of the Logger by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/700
[Artifact Logging PR4] Added log_file_artifact base method by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/701
Fix README.md by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/753
Less CodeQL by @Averylamp in https://github.com/mosaicml/composer/pull/762
Increase the timeout for test trainer equivalence by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/766
Port squeze excite method card to new format by @dblalock in https://github.com/mosaicml/composer/pull/764
Small fixes by @hanlint in https://github.com/mosaicml/composer/pull/765
Adding defaults to blurpool by @moinnadeem in https://github.com/mosaicml/composer/pull/756
Added maximum versions to dependencies by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/768
Update sequence length warmup documentation by @moinnadeem in https://github.com/mosaicml/composer/pull/770
Additional README fixes by @hanlint in https://github.com/mosaicml/composer/pull/769
Fix setup.py by @Averylamp in https://github.com/mosaicml/composer/pull/761
Increased the timeout for test_trainer.py by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/775
Remove plural types and aliases for native pytorch types by @Landanjs in https://github.com/mosaicml/composer/pull/677
[Artifact Logging PR5] Added the object store logger by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/706
[Artifact Logging PR6] Rename the TQDMLogger as the ProgressBarLogger; remove terminal logging from the file logger by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/708
[Artifact Logging PR7] Add stdout and stderr capture to the FileLogger by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/710
Update README.md by @vahidfazelrezai in https://github.com/mosaicml/composer/pull/781
URGENT: Fixing an incorrect number by @jfrankle in https://github.com/mosaicml/composer/pull/785
Add eval dataloader to the README.md by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/779
Readme code fix by @nqn in https://github.com/mosaicml/composer/pull/787
Set the random seed before each test. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/786
Docker file for vision applications with ffcv and deeplabv3 dependencies by @dskhudia in https://github.com/mosaicml/composer/pull/724
Update README.md by @murthyn in https://github.com/mosaicml/composer/pull/789
Chmod 644 all files by @Averylamp in https://github.com/mosaicml/composer/pull/760
Add Algorithm Warning for NoEffectWarning by @hanlint in https://github.com/mosaicml/composer/pull/720
Update dense label conversion and soft cross entropy to handle segmentation style labels by @coryMosaicML in https://github.com/mosaicml/composer/pull/763
added model card details comparing cifar to imagenet resnets by @growlix in https://github.com/mosaicml/composer/pull/792
Added codeowners file by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/797
ffcv integration for cifar10 dataset by @dskhudia in https://github.com/mosaicml/composer/pull/672
Add trainer link to README by @hanlint in https://github.com/mosaicml/composer/pull/804
ffcv integration for imagenet by @dskhudia in https://github.com/mosaicml/composer/pull/802
[XS] Consolidating NLP Import Message by @moinnadeem in https://github.com/mosaicml/composer/pull/795
Removed duplicate logger registry by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/808
Update docs on random seed by @hanlint in https://github.com/mosaicml/composer/pull/794
Remove the LoggerData and LoggerDataDict types by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/810
Rename composer/datasets/webdataset.py => composer/datasets/webdataset_utils.py by @dskhudia in https://github.com/mosaicml/composer/pull/813
More method card updates by @jfrankle in https://github.com/mosaicml/composer/pull/777
[Part 1] Adding Synthetic NLP Tokenizers, Models, Datasets w/o Integration by @moinnadeem in https://github.com/mosaicml/composer/pull/650
Update README by @moinnadeem in https://github.com/mosaicml/composer/pull/822
Updating setup.py with missing dependancies by @dlmgary in https://github.com/mosaicml/composer/pull/818
Fix submodule type errors when doing import composer by @dblalock in https://github.com/mosaicml/composer/pull/823
Update composer_model.rst by @moinnadeem in https://github.com/mosaicml/composer/pull/824
models cleanup - part 3: one model family per directory (cifar resnets) by @A-Jacobson in https://github.com/mosaicml/composer/pull/791
Support for webdatasets with ffcv by @dskhudia in https://github.com/mosaicml/composer/pull/815
Remove config from the logger base classes by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/811
models cleanup - part 2: metrics and loss by @A-Jacobson in https://github.com/mosaicml/composer/pull/790
Adding docstring for missing conditional imports by @moinnadeem in https://github.com/mosaicml/composer/pull/836
Filepath formatting helper utilities by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/827
Serialize model state without module. prefix when using DDP by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/829
models cleanup - part 1: composermodel tasks by @A-Jacobson in https://github.com/mosaicml/composer/pull/788
Remove Batch Types - Part 1: recursive to_device function by @A-Jacobson in https://github.com/mosaicml/composer/pull/727
Profiler Refactor for Artifact Logging by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/828
[Artifact Logging PR8]: Switch to artifact logging and remove the run directory. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/712
conditional imports use MissingConditionalImportError #814 by @IanWorley in https://github.com/mosaicml/composer/pull/835
Vision Tests + Jenkins Improvements by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/806
Fix the entrypoint and launch script by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/840
Remove a broken link to an old callback hparams tutorial. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/850
Remove no longer needed xfails by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/848
Ade20k streaming dataset yaml by @Landanjs in https://github.com/mosaicml/composer/pull/843
[Part 2] Integrating synthetic tokenizers, datasets, and models into our unit tests by @moinnadeem in https://github.com/mosaicml/composer/pull/652
'Second' typo by @nqn in https://github.com/mosaicml/composer/pull/852
[FFCV] webdataset from local + download only once by @dskhudia in https://github.com/mosaicml/composer/pull/849
Lowered Test Timeouts by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/851
Proofreading for docs "Getting Started" section by @mcneela in https://github.com/mosaicml/composer/pull/859
Dynamic Shrinking Microbatches by @mvpatel2000 in https://github.com/mosaicml/composer/pull/485
Proofreading for speedup methods section by @mcneela in https://github.com/mosaicml/composer/pull/861
LICENSE: copyright and cleanup by @kobindra in https://github.com/mosaicml/composer/pull/862
CLI Launcher supports environment variables and tells fewer lies by @jbloxham in https://github.com/mosaicml/composer/pull/860
Update MixUp to allow use of index labels by @coryMosaicML in https://github.com/mosaicml/composer/pull/825
Bert validation refactor by @anisehsani in https://github.com/mosaicml/composer/pull/478
Make wandb tags optional by @siriuslee in https://github.com/mosaicml/composer/pull/865
Fix validation in CLI launcher by @jbloxham in https://github.com/mosaicml/composer/pull/870
Fixing version number by @ajaysaini725 in https://github.com/mosaicml/composer/pull/871
PyTorch 1.11 Docker Image by @bandish-shah in https://github.com/mosaicml/composer/pull/868
Add missing ffcv dependency in pytorch_vision docker image by @dskhudia in https://github.com/mosaicml/composer/pull/867
Fixed webdatasest import bug by @ajaysaini725 in https://github.com/mosaicml/composer/pull/874
Proofread five sections of Trainer module docs by @mcneela in https://github.com/mosaicml/composer/pull/872
Switch mixup events to avoid grad accum issues by @coryMosaicML in https://github.com/mosaicml/composer/pull/875
Proofreading docs through "Callbacks" section by @mcneela in https://github.com/mosaicml/composer/pull/878
Initialize distributed before dataloaders are created by @dskhudia in https://github.com/mosaicml/composer/pull/869
Proofreading the remainder of the trainer section of docs by @mcneela in https://github.com/mosaicml/composer/pull/881
Add test for grad_accum > 2 to the asset tests by @hanlint in https://github.com/mosaicml/composer/pull/876
Remove Batch Types - Part 2: unify split batch by @A-Jacobson in https://github.com/mosaicml/composer/pull/833
Proofreading Methods section of docs through AugMix by @mcneela in https://github.com/mosaicml/composer/pull/883
Add ssh by @Averylamp in https://github.com/mosaicml/composer/pull/885
rename LICENSE_HEADER to fix GH license detection by @kobindra in https://github.com/mosaicml/composer/pull/863
Torch 1.11 pytorch_vision Docker image by @bandish-shah in https://github.com/mosaicml/composer/pull/886
Add full traceback to grad accum errors by @mvpatel2000 in https://github.com/mosaicml/composer/pull/892
Modify ResNet9 benchmark to enable channels_last and progressive_resizing by @coryMosaicML in https://github.com/mosaicml/composer/pull/889
Proofreading Methods section of docs through Cutout by @mcneela in https://github.com/mosaicml/composer/pull/890
Proofread Methods section of docs through MixUp by @mcneela in https://github.com/mosaicml/composer/pull/895
Fixes for ffcv integration by @dskhudia in https://github.com/mosaicml/composer/pull/844
Print the stdout/stderr of the crashing process by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/893
Change NLP yamls to use evaluators by @anisehsani in https://github.com/mosaicml/composer/pull/891
Fix loss logging with DeepSpeed by @abhi-mosaic in https://github.com/mosaicml/composer/pull/897
Add Computed Metrics to State by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/842
Proofread Methods section of docs through Squeeze-Excite by @mcneela in https://github.com/mosaicml/composer/pull/899
test whether resuming from a checkpoint changes algorithm effect by @growlix in https://github.com/mosaicml/composer/pull/816
Object store symlinks for graceful resumption by @mvpatel2000 in https://github.com/mosaicml/composer/pull/887
Console log level by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/900
Remove asdict in unet by @Landanjs in https://github.com/mosaicml/composer/pull/901
Cherry Pick #906 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/912
Release/v0.6.0 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/933

New Contributors

@vahidfazelrezai made their first contribution in https://github.com/mosaicml/composer/pull/781
@murthyn made their first contribution in https://github.com/mosaicml/composer/pull/789
@dlmgary made their first contribution in https://github.com/mosaicml/composer/pull/818
@IanWorley made their first contribution in https://github.com/mosaicml/composer/pull/835

Full Changelog: https://github.com/mosaicml/composer/compare/v0.5.0...v0.6.0

composer - Release version v0.5.0

Published by hanlint over 2 years ago

We are excited to share Composer v0.5, a library of speed-up methods for efficient neural network training. This release features:

Revamped checkpointing API based on community feedback
New baselines: ResNet34-SSD, GPT-3, and Vision Transformers
Additional improvements to our documentation
Support for bfloat16
Streaming dataset support
Unified functional API for our algorithms

Highlights

Checkpointing API

Checkpointing models are now a Callback, so that users can easily write and add their own callbacks. The callback is automatically appended if a save_folder is provided to the Trainer.

trainer = Trainer(
    model=model,
    algorithms=algorithms,
    save_folder="checkpoints",
    save_interval="1ep"
)

Alternatively, CheckpointSaver can be directly added as a callback:

trainer = Trainer(..., callbacks=[
    CheckpointSaver(
        save_folder='checkpoints',
        name_format="ep{epoch}-ba{batch}/rank_{rank}",
        save_latest_format="latest/rank_{rank}",
        save_interval="1ep",
        weights_only=False,
    )
])

Subclass from CheckpointSaver to add your own logic for saving the best model, or saving at specific intervals. Thanks to @mansheej @siriuslee and other users for their feedback.

bloat16

We've added experimental support for bfloat16, which can be provided via the precision argument to the Trainer:

trainer = Trainer(
    ...,
    precision="bfloat16"
)

Streaming datasets

We've added support for fast streaming datasets. For NLP-based datasets such as C4, we use the HuggingFace datasets backend, and add dataset-specific shuffling, tokenization , and grouping on-the-fly. To support data parallel training, we added specific sharding logic for efficiency. See C4Datasets for more details.

Vision streaming datasets are supported via a patched version of the webdatasets package, and added support for data sharding by workers for fast augmentations. See composer.datasets.webdataset for more details.

Baseline GPT-3, ResNet34-SSD, and Vision Transformer benchmarks

Configurations for GPT-3-like models ranging from 125m to 760m parameters are now released, and use DeepSpeed Zero Stage 0 for memory-efficient training.

We've also added the Single Shot Detection (SSD) model (Wei et al, 2016) with a ResNet34 backbone, based on the MLPerf reference implementation.

Our first Vision Transformer benchmark is the ViT-S/16 model from Touvron et al, 2021, and based on the vit-pytorch package.

See below for the full details:

What's Changed

Export Transforms in composer.algorithms by @ajaysaini725 in https://github.com/mosaicml/composer/pull/603
Make batchnorm default for UNet by @dskhudia in https://github.com/mosaicml/composer/pull/535
Fix no_op_model algorithm by @dskhudia in https://github.com/mosaicml/composer/pull/614
Pin pre-1.0 packages by @bandish-shah in https://github.com/mosaicml/composer/pull/595
Updated dark mode composer logo, and graph by @nqn in https://github.com/mosaicml/composer/pull/617
Jenkins + Docker Improvements by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/621
update README links by @hanlint in https://github.com/mosaicml/composer/pull/628
Remove all old timing calls by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/594
Remove state shorthand by @mvpatel2000 in https://github.com/mosaicml/composer/pull/629
add bfloat16 support by @nikhilsardana in https://github.com/mosaicml/composer/pull/433
v0.4.0 Hotfix: Docker documentation updates by @bandish-shah in https://github.com/mosaicml/composer/pull/631
Fix wrong icons in the method cards by @hanlint in https://github.com/mosaicml/composer/pull/636
fix autocast for pytorch < 1.10 by @nikhilsardana in https://github.com/mosaicml/composer/pull/639
Add tutorial notebooks to the README by @moinnadeem in https://github.com/mosaicml/composer/pull/630
Converted Stateless Schedulers to Classes by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/632
Jenkinsfile Fixes Part 2 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/627
Add C4 Streaming dataset by @abhi-mosaic in https://github.com/mosaicml/composer/pull/489
CONTRIBUTING.md additions by @kobindra in https://github.com/mosaicml/composer/pull/648
Hide showing object as a base class; fix skipping documentation of forward; fixed docutils dependency. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/643
Matthew/functional docstrings update by @growlix in https://github.com/mosaicml/composer/pull/622
docstrings improvements for core modules by @dskhudia in https://github.com/mosaicml/composer/pull/598
ssd-resnet34 on COCO map 0.23 by @florescl in https://github.com/mosaicml/composer/pull/646
Fix broken "best practices" link by @growlix in https://github.com/mosaicml/composer/pull/649
Update progressive resizing to work for semantic segmentation by @coryMosaicML in https://github.com/mosaicml/composer/pull/604
Let C4 Dataset overwrite num_workers if set incorrectly by @abhi-mosaic in https://github.com/mosaicml/composer/pull/655
Lazy imports for pycocotools by @abhi-mosaic in https://github.com/mosaicml/composer/pull/656
W&B excludes final eval metrics when plotted as a fxn of epoch or trainer/global_step by @growlix in https://github.com/mosaicml/composer/pull/633
Update GPT3-yamls for default 8xA100-40GB by @abhi-mosaic in https://github.com/mosaicml/composer/pull/663
Set WandB default to log rank zero only by @abhi-mosaic in https://github.com/mosaicml/composer/pull/461
Update schedulers guide by @hanlint in https://github.com/mosaicml/composer/pull/661
[XS] Fix a TQDM deserialization bug by @jbloxham in https://github.com/mosaicml/composer/pull/665
Add defaults to the docstrings for algorithms by @hanlint in https://github.com/mosaicml/composer/pull/662
Fix ZeRO config by @jbloxham in https://github.com/mosaicml/composer/pull/667
[XS] fix formatting for colout by @hanlint in https://github.com/mosaicml/composer/pull/666
Composer.core docstring touch-up by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/657
Add Uniform bounding box sampling option for CutOut and CutMix by @coryMosaicML in https://github.com/mosaicml/composer/pull/634
Update README.md by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/678
Fix bug in trainer test by @hanlint in https://github.com/mosaicml/composer/pull/651
InMemoryLogger has get_timeseries() method by @growlix in https://github.com/mosaicml/composer/pull/644
Batchwise resolution for SWA by @growlix in https://github.com/mosaicml/composer/pull/654
Fixed the conda build script so it runs on jenkins by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/676
Yahp version update to 0.1.0 by @Averylamp in https://github.com/mosaicml/composer/pull/674
Streaming vision datasets by @knighton in https://github.com/mosaicml/composer/pull/284
Fix DeepSpeed checkpointing by @jbloxham in https://github.com/mosaicml/composer/pull/686
Vit by @A-Jacobson in https://github.com/mosaicml/composer/pull/243
[S] cleanup tldr; standardize __all__ by @hanlint in https://github.com/mosaicml/composer/pull/688
Unify algorithms part 2: mixup, cutmix, label smoothing by @dblalock in https://github.com/mosaicml/composer/pull/658
composer.optim docstrings by @jbloxham in https://github.com/mosaicml/composer/pull/653
Fix DatasetHparams, WebDatasetHparams docstring by @growlix in https://github.com/mosaicml/composer/pull/697
Models docstrings by @A-Jacobson in https://github.com/mosaicml/composer/pull/469
docstrings improvements for composer.datasets by @dskhudia in https://github.com/mosaicml/composer/pull/694
Updated contributing.md and the style guide by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/670
Ability to retry ADE20k crop transform by @Landanjs in https://github.com/mosaicml/composer/pull/702
Add mmsegmentation DeepLabv3(+) by @Landanjs in https://github.com/mosaicml/composer/pull/684
Unify functional API part 3 by @dblalock in https://github.com/mosaicml/composer/pull/715
Update example notebooks by @coryMosaicML in https://github.com/mosaicml/composer/pull/707
[Checkpointing - PR1] Store the rank_zero_seed on state by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/680
[Checkpointing - PR2] Added in new Checkpointing Events by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/690
[Checkpointing - PR3] Clean up RNG and State serialization by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/692
[Checkpointing - PR4] Refactored the CheckpointLoader into a load_checkpoint function by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/693
Update {blurpool,factorize,ghostbn} method cards by @dblalock in https://github.com/mosaicml/composer/pull/711
[Checkpointing - PR 5] Move the CheckpointSaver to a callback. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/687
Update datasets docstrings by @growlix in https://github.com/mosaicml/composer/pull/709
add notebooks and functional api by @hanlint in https://github.com/mosaicml/composer/pull/714
Migrating from PTL notebook by @florescl in https://github.com/mosaicml/composer/pull/436
Docs 0.4.1: Profiler section and tutorials by @bandish-shah in https://github.com/mosaicml/composer/pull/696
Improve datasets docstrings by @knighton in https://github.com/mosaicml/composer/pull/695
Update C4Dataset to repeat, handle max_samples safely by @abhi-mosaic in https://github.com/mosaicml/composer/pull/722
Fix docs build by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/773
v0.5 Release by @hanlint in https://github.com/mosaicml/composer/pull/732

New Contributors

@nikhilsardana made their first contribution in https://github.com/mosaicml/composer/pull/433
@knighton made their first contribution in https://github.com/mosaicml/composer/pull/284

Full Changelog: https://github.com/mosaicml/composer/compare/v0.4.0...v0.5.0

composer - Release Version 0.4.0

Published by hanlint over 2 years ago

What's Changed

Release/0.3.0 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/102
Create dataloader on trainer init() by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/92
label smoothing will not work without alpha set by @A-Jacobson in https://github.com/mosaicml/composer/pull/100
Warmup and cosine annealing warm restarts combine sequentially by @jacobfulano in https://github.com/mosaicml/composer/pull/99
Moved device.prepare() to init by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/111
run_event for callbacks, removed deferred logging by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/85
Remove composer.trainer.ddp; replace with composer.utils.ddp by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/105
Running callbacks befor algorithms for the INIT event in the engine by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/113
Replaced atexit with cleanup methods by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/112
Deepspeed Integration by @jbloxham in https://github.com/mosaicml/composer/pull/109
Fix loss reporting by @jbloxham in https://github.com/mosaicml/composer/pull/130
Run Directory Uploader by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/101
Dataloader Upgrades by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/114
Synthetic Datasets and Subset Sampling by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/110
Remove argparse from setup.py by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/131
Fixed pickling of torch.memory_format objects by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/132
Fixed issue #135; rename total_batch_size to train_batch_size by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/137
Implement MosaicMLLoggerBackend by @ajaysaini725 in https://github.com/mosaicml/composer/pull/81
Add a linear learning rate decay by @moinnadeem in https://github.com/mosaicml/composer/pull/142
Apply channels last on init by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/147
Update Trainer checkpointing documentation by @moinnadeem in https://github.com/mosaicml/composer/pull/150
Address crashes with DDP + Checkpointing by @moinnadeem in https://github.com/mosaicml/composer/pull/151
Sudo in the dockerimage by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/152
Remove curriculum learning by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/164
Remove broken symlinks by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/163
Removed dataclass from state by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/153
Guard artifact uploading in wandb with ddp barriers by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/162
add CODE_OF_CONDUCT.md by @kobindra in https://github.com/mosaicml/composer/pull/160
[XS] Fix wandb logger by @jbloxham in https://github.com/mosaicml/composer/pull/172
Print help on run_mosaic_trainer.py, cleaned up verbosity. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/170
DeepSpeed ZeRO config options by @jbloxham in https://github.com/mosaicml/composer/pull/166
DDP Seeding Across Processes by @ajaysaini725 in https://github.com/mosaicml/composer/pull/173
Fixed the run directory uploader test by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/177
Fix broken gpu tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/181
Conditionally skip tests when installed with mosaicml[dev] by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/185
A yapf update broke some formatting...re-running the linter by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/188
Timer PR parts 1 and 2 from #146 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/174
Fixed pyright issues by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/198
Additional Tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/191
Propagate processes that were sigkilled by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/184
Add the ability to load a checkpoint without restoring state by @moinnadeem in https://github.com/mosaicml/composer/pull/169
Add ResNet-9 for CIFAR-10 by @dblalock in https://github.com/mosaicml/composer/pull/193
Added helper methods for torch.distributed.boradcast by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/189
Checkpointing & DeepSpeed by @jbloxham in https://github.com/mosaicml/composer/pull/199
Distinguish between dist and DDP by @jbloxham in https://github.com/mosaicml/composer/pull/201
DeepSpeed precision fixes for CV by @jbloxham in https://github.com/mosaicml/composer/pull/197
Fix deterministic mode (and use it for tests); simplify checkpointing tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/203
Load checkpoints from cloud storage by @ravirahman in https://github.com/mosaicml/composer/pull/200
Updated the DataSpec for the timing abstraction (#146) parts 3 and 4 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/178
Add larger GPT models by @jbloxham in https://github.com/mosaicml/composer/pull/213
Add BERT Base to Composer by @moinnadeem in https://github.com/mosaicml/composer/pull/195
Integrate the timer into the training loop by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/210
Dockerfile enhancements by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/182
Adding checkpointing at the end of training by @moinnadeem in https://github.com/mosaicml/composer/pull/219
Adding conditional branching on data_collator by @moinnadeem in https://github.com/mosaicml/composer/pull/220
Fixes apt sources bug fix by @Averylamp in https://github.com/mosaicml/composer/pull/231
Remove old timing calls from layer freezing by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/216
Require pip install -e be pip install --user -e when running as root by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/232
DeepLabv3 + ADE20k benchmark by @Landanjs in https://github.com/mosaicml/composer/pull/107
Remove old timing calls from selective backprop by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/221
Clean up the tests to make them work on jenkins by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/233
Make the run directory rank-local; fix checkpoints saving and restoring by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/215
Cleaned Up State by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/223
Fix the speed monitor by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/238
Fixed loggers and callbacks by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/240
Fix ade20k padding fill calculation by @Landanjs in https://github.com/mosaicml/composer/pull/250
Adding fix for NLP learning rates by @moinnadeem in https://github.com/mosaicml/composer/pull/235
Training Loop Profiler by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/97
WIP: Composer Jenkinsfile by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/82
Fix broken tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/257
Fix bug with AFTER_DATALOADER event; remove microbatches from state by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/258
Remove the DDP DataLoader by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/245
Fix Jenkins to work on PRs from Forks by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/267
add ability to specify custom run name, with rank auto-appended by @dblalock in https://github.com/mosaicml/composer/pull/264
Remove secrets from the yaml by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/261
Checkpoint logging and doc fixes by @ajaysaini725 in https://github.com/mosaicml/composer/pull/270
Remove custom W&B config changes by @siriuslee in https://github.com/mosaicml/composer/pull/236
Dramatically increase default dist_timeout by @jbloxham in https://github.com/mosaicml/composer/pull/272
Add factorization by @dblalock in https://github.com/mosaicml/composer/pull/53
Allow str and dict in Trainer init signature by @hanlint in https://github.com/mosaicml/composer/pull/277
Add kwargs back to the closure by @jbloxham in https://github.com/mosaicml/composer/pull/292
Default to num_classes=10 for CIFAR10_ResNet56 by @hanlint in https://github.com/mosaicml/composer/pull/293
Use tqdm.auto for notebooks by @hanlint in https://github.com/mosaicml/composer/pull/298
Added ResNet20 by @growlix in https://github.com/mosaicml/composer/pull/289
Optimizer Surgery by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/249
Don't init dist when world_size is 1 by @jbloxham in https://github.com/mosaicml/composer/pull/311
Scheduler defaults to step-wise instead of epoch-wise by @hanlint in https://github.com/mosaicml/composer/pull/312
Added the version to composer.init by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/315
Rename checkpoint API by @hanlint in https://github.com/mosaicml/composer/pull/281
Update setup.py by @Averylamp in https://github.com/mosaicml/composer/pull/321
Timm support by @A-Jacobson in https://github.com/mosaicml/composer/pull/262
[XS] use correct package name in error messages by @jbloxham in https://github.com/mosaicml/composer/pull/331
Multiple Evaluator Datasets by @anisehsani in https://github.com/mosaicml/composer/pull/120
Fixed all uses of textwrap.dedent by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/332
Remove explicit YAHP constructs from algorithms by @jbloxham in https://github.com/mosaicml/composer/pull/317
Configure DeepSpeed with an ordinary DeepSpeed config dict by @jbloxham in https://github.com/mosaicml/composer/pull/322
Run Event.BATCH_END and Event.EPOCH_END after the timer is increm… by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/310
Guard dist.barrier in the checkpointer with try/finally by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/334
Replace composer ResNet with torchvision ResNet by @Landanjs in https://github.com/mosaicml/composer/pull/314
Fail fast if any step fails by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/333
Replace most instances of "Mosaic" with "Composer" by @jbloxham in https://github.com/mosaicml/composer/pull/335
Ensure that the training dataloader does not have an active iterator. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/337
Fully flatten checkpoint params by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/325
Added Pylint and docformatter by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/339
Add compression flag by @mvpatel2000 in https://github.com/mosaicml/composer/pull/336
Fix cutmix and mixup reliance on num_classes model attribute by @Landanjs in https://github.com/mosaicml/composer/pull/348
Copy extra_init_params to get rid of recursive config dicts by @siriuslee in https://github.com/mosaicml/composer/pull/316
Composer Style Guide by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/319
Get rid of create_from_hparams by @jbloxham in https://github.com/mosaicml/composer/pull/351
Added In Memory Logger, Timestamp Object by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/352
Fix Checkpoints by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/359
Add channels last standalone function by @dblalock in https://github.com/mosaicml/composer/pull/356
Quick style guide typo fix by @ajaysaini725 in https://github.com/mosaicml/composer/pull/360
Removed template_default fields in hparams by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/369
removed byo_trainer by @anisehsani in https://github.com/mosaicml/composer/pull/374
Fix sample SD inference multiplication by @Landanjs in https://github.com/mosaicml/composer/pull/376
Support import composer.functional as cf by @dblalock in https://github.com/mosaicml/composer/pull/368
Fix composer.functional page no longer showing functions by @dblalock in https://github.com/mosaicml/composer/pull/379
Testing trainer.fit on each algorithm, callback, logger, and profiler by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/371
Functional API renaming part 1 by @dblalock in https://github.com/mosaicml/composer/pull/380
Updated add_dataset_transform() to have flexible insertion point by @growlix in https://github.com/mosaicml/composer/pull/320
Rename Event.TRAINING_START to Event.FIT; remove Event.TRAINING_END by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/263
Remove requirement for validation and metrics by @hanlint in https://github.com/mosaicml/composer/pull/378
Docs Refactor by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/386
Documentation Outline by @ajaysaini725 in https://github.com/mosaicml/composer/pull/302
Fix tests without DDP by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/389
Use Makefile instead of scripts; enable easier testing by @hanlint in https://github.com/mosaicml/composer/pull/387
Address Doc Fixes for Surgery and StochasticDepth by @ajaysaini725 in https://github.com/mosaicml/composer/pull/413
Cleanup conftest.py by @hanlint in https://github.com/mosaicml/composer/pull/390
Move world_size guard to trainer by @hanlint in https://github.com/mosaicml/composer/pull/392
Add defaults to functional API / share defaults across interfaces by @dblalock in https://github.com/mosaicml/composer/pull/377
Un-deprecate steps_per_epoch by @jbloxham in https://github.com/mosaicml/composer/pull/418
Remove the walkthrough section of the docs; replace with module-level docstrings by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/417
Rename Loggers by @hanlint in https://github.com/mosaicml/composer/pull/427
Alternative docs theme: furo by @nqn in https://github.com/mosaicml/composer/pull/341
Clarify DWD defaults by @abhi-mosaic in https://github.com/mosaicml/composer/pull/410
Added :ignore-module-all: to docs by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/431
Configured doctest by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/432
Functional API renaming part 2 by @dblalock in https://github.com/mosaicml/composer/pull/426
Pytest Refactor Part 1 by @hanlint in https://github.com/mosaicml/composer/pull/391
Deprecate scale scheduler algorithm and move to trainer by @jbloxham in https://github.com/mosaicml/composer/pull/438
Removed dead code from the public library; refactored some imports. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/437
Trainer test refactor (pytest refactor phase 2) by @hanlint in https://github.com/mosaicml/composer/pull/393
Skip saving of direct serialization fields by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/445
Hide gen_interpolation_lambda in mixup like in cutmix and augmix by @dblalock in https://github.com/mosaicml/composer/pull/449
Move all AlgorithmHparams classes to shared file by @dblalock in https://github.com/mosaicml/composer/pull/452
Trainer Docs + Param ordering + Alibi Export by @ajaysaini725 in https://github.com/mosaicml/composer/pull/419
Up and Running with Composer and Speedup Algorithms Demo Notebook by @growlix in https://github.com/mosaicml/composer/pull/340
Add NLP tutorial notebook by @Landanjs in https://github.com/mosaicml/composer/pull/370
add kaggle notebook by @A-Jacobson in https://github.com/mosaicml/composer/pull/381
Refactor Profiler init() by @bandish-shah in https://github.com/mosaicml/composer/pull/422
Random doc fixes by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/456
support integer arguments to Trainer by @hanlint in https://github.com/mosaicml/composer/pull/458
Make algorithm functions either public or prefixed with "_" by @dblalock in https://github.com/mosaicml/composer/pull/460
bug in train metrics by @A-Jacobson in https://github.com/mosaicml/composer/pull/466
Fixes empty log lines if no algorithms are run by @siriuslee in https://github.com/mosaicml/composer/pull/462
Add default hparam values for cutout by @dblalock in https://github.com/mosaicml/composer/pull/459
Docstrings for composer.utils by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/439
notebook tests by @hanlint in https://github.com/mosaicml/composer/pull/468
resize_targets set to False by default by @siriuslee in https://github.com/mosaicml/composer/pull/475
Remove dist warnings by @hanlint in https://github.com/mosaicml/composer/pull/474
Add missing defaults for one function by @dblalock in https://github.com/mosaicml/composer/pull/476
Store metadata in json files for algorithms by @hanlint in https://github.com/mosaicml/composer/pull/471
Davis/algos intrafile organization by @dblalock in https://github.com/mosaicml/composer/pull/465
Get functional API running enough for notebook by @dblalock in https://github.com/mosaicml/composer/pull/479
Remove colons from run directory timestamps by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/486
Add custom methods notebook by @coryMosaicML in https://github.com/mosaicml/composer/pull/330
Move the clean notebooks script to the scripts folder by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/487
Checkpoint Usability Initial Changes by @ajaysaini725 in https://github.com/mosaicml/composer/pull/455
Removing HF XFail on model registry by @moinnadeem in https://github.com/mosaicml/composer/pull/490
Clean up Imports and Tests by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/482
Ravi/docs cleanup 2 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/488
Matthew/docstrings update by @growlix in https://github.com/mosaicml/composer/pull/457
No autodoc of forward by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/494
Update init.py by @growlix in https://github.com/mosaicml/composer/pull/493
allow from composer import ComposerModel by @hanlint in https://github.com/mosaicml/composer/pull/496
Methods landing page by @nqn in https://github.com/mosaicml/composer/pull/454
Small docs change to include timing reference by @anisehsani in https://github.com/mosaicml/composer/pull/500
docstring for callbacks by @dskhudia in https://github.com/mosaicml/composer/pull/470
Docs cleanup #3 by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/502
Adding network fixes for the Run Directory Uploader by @moinnadeem in https://github.com/mosaicml/composer/pull/505
Adding network retries for downloading GLUE by @moinnadeem in https://github.com/mosaicml/composer/pull/506
Matthew/loggers docstrings by @growlix in https://github.com/mosaicml/composer/pull/499
Fix Sphinx Warnings by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/520
Anaconda configuration by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/507
Update docstrings for Colout, CutOut, CutMix, Layer Freezing, Mixup, Label Smoothing, Progressive Resizing by @coryMosaicML in https://github.com/mosaicml/composer/pull/483
Stateless schedulers by @jbloxham in https://github.com/mosaicml/composer/pull/463
Rename selective_backprop to select_using_loss by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/532
Update new README by @hanlint in https://github.com/mosaicml/composer/pull/540
Fix dark mode by @nqn in https://github.com/mosaicml/composer/pull/573
Fix the run directory uploader when use_procs=True and not using the … by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/547
Console font too bright by @nqn in https://github.com/mosaicml/composer/pull/574
Fix pil_image_collate by @Landanjs in https://github.com/mosaicml/composer/pull/514
ADE20k DeepLabv3 optimized benchmark yaml by @Landanjs in https://github.com/mosaicml/composer/pull/579
separate hparams in module docstrings by @hanlint in https://github.com/mosaicml/composer/pull/558
Fix DataloaderHparam docs by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/534
per #224, update function to use Timer and Time by @jzf2101 in https://github.com/mosaicml/composer/pull/583
Clean up Transformer models init function by @moinnadeem in https://github.com/mosaicml/composer/pull/587
Docstrings for composer.trainer by @ajaysaini725 in https://github.com/mosaicml/composer/pull/522
Additional updates to the loggers docstrings by @growlix in https://github.com/mosaicml/composer/pull/544
Profiler docstrings by @bandish-shah in https://github.com/mosaicml/composer/pull/473
Updated Model Cards by @ajaysaini725 in https://github.com/mosaicml/composer/pull/375
Unify augmentation API part 1 by @dblalock in https://github.com/mosaicml/composer/pull/524
Docstrings improvements for core.algorithm, core.callback, etc. by @dskhudia in https://github.com/mosaicml/composer/pull/516
Skip ResNet50 + DeepSpeed tests that are timing out by @hanlint in https://github.com/mosaicml/composer/pull/601
Make the default split_batch method a no-op if grad_accum is 1. by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/592
Add functional/standalone API tutorial notebook by @dblalock in https://github.com/mosaicml/composer/pull/326
Merge v0.4 fixes by @hanlint in https://github.com/mosaicml/composer/pull/606
updated docstring examples by @growlix in https://github.com/mosaicml/composer/pull/600
[v0.4rc] Documentation Guides by @hanlint in https://github.com/mosaicml/composer/pull/531
Method cards by @jfrankle in https://github.com/mosaicml/composer/pull/589
Improved docstring for surgery algorithms by @dblalock in https://github.com/mosaicml/composer/pull/602
Fix Lint by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/611
Fix Lint by @ravi-mosaicml in https://github.com/mosaicml/composer/pull/612
Updated 'Up and Running with Composer' by @growlix in https://github.com/mosaicml/composer/pull/619
Release v0.4.0 by @hanlint in https://github.com/mosaicml/composer/pull/609

New Contributors

@A-Jacobson made their first contribution in https://github.com/mosaicml/composer/pull/100
@jacobfulano made their first contribution in https://github.com/mosaicml/composer/pull/99
@kobindra made their first contribution in https://github.com/mosaicml/composer/pull/160
@ravirahman made their first contribution in https://github.com/mosaicml/composer/pull/200
@Landanjs made their first contribution in https://github.com/mosaicml/composer/pull/107
@siriuslee made their first contribution in https://github.com/mosaicml/composer/pull/236
@mvpatel2000 made their first contribution in https://github.com/mosaicml/composer/pull/336
@abhi-mosaic made their first contribution in https://github.com/mosaicml/composer/pull/410
@jzf2101 made their first contribution in https://github.com/mosaicml/composer/pull/583
@jfrankle made their first contribution in https://github.com/mosaicml/composer/pull/589

Full Changelog: https://github.com/mosaicml/composer/compare/v0.3.1...v0.4.0

composer - Release Version 0.3.1

Published by Averylamp almost 3 years ago

Hotfix

Hotfix to fix installation of the composer package

composer - Release Version 0.3.0

Published by Averylamp almost 3 years ago

Release PR

Major Changes

Python 3.7 Compatibility
Adds CutMix Method
New Pre-Fork DDP entrypoint
- Change PR
- composer Entrypoint for DDP forking prior to script start
- Documentation on Usage

Minor Changes

Lazy-Loading of dependencies
General Docs updates for readability and correctness
DDP Port auto-selection by default (no more conflicting ports upon reuse of trainer)
Small bug fixes for YAHP inheritance

Notes

Google Colab may have issues installing composer with !pip install mosaicml
- Known workaround: Install through git with !pip install git+https://github.com/mosaicml/composer@main

Package Rankings

Top 2.02% on Pypi.org

Related Projects

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating poin...

20 Sep 2022 1,482

machine-learning-curriculum

Learn to make machines learn so that you don't have to struggle to program them; The ultimate list

12 Dec 2016 1,084

llm-foundry

LLM training code for Databricks foundation models

28 Apr 2023 3,899

mmpose

OpenMMLab Pose Estimation Toolbox and Benchmark.

08 Jul 2020 5,674

Machine-Learning-Guide

Machine learning Guide. Learn all about Machine Learning Tools, Libraries, Frameworks, Large Lang...

17 Oct 2020 442

the-incredible-pytorch

The Incredible PyTorch: a curated list of tutorials, papers, projects, communities and more relat...

11 Feb 2017 11,389

mmengine

OpenMMLab Foundational Library for Training Deep Learning Models

08 Feb 2022 1,093

mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark

09 Jul 2020 3,194

serve

Serve, optimize and scale PyTorch models in production

03 Oct 2019 4,177

mmdeploy

OpenMMLab Model Deployment Framework

24 Dec 2021 2,729

torchtune

A Native-PyTorch Library for LLM Fine-tuning

20 Oct 2023 3,775

data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for (...

01 Aug 2023 2,315

mmyolo

OpenMMLab YOLO series toolbox and benchmark. Implemented RTMDet, RTMDet-Rotated,YOLOv5, YOLOv6, Y...

18 Sep 2022 2,945

mmgeneration

MMGeneration is a powerful toolkit for generative models, based on PyTorch and MMCV.

14 Apr 2021 1,885

mmagic

OpenMMLab Multimodal Advanced, Generative, and Intelligent Creation Toolbox. Unlock the magic 🪄: ...

23 Aug 2019 6,566