Unifying Variational Autoencoder (VAE) implementations in Pytorch (NeurIPS 2022)
Apache-2.0 License
New features
- Support for pydantic=2.* (#105)
- Updates to `BaseTrainer` thanks to @liamchalcroft (#90)

Minor changes
- `predict` method in `RHVAE` thanks to @soumickmj (#80)
- `SVAE` model updated for stability thanks to @soumickmj (#79)

Published by clementchadebec over 1 year ago
New features
- `TrainHistoryCallback` that stores the training metrics during training, in #71 by @VolodyaCO

```python
>>> from pythae.trainers.training_callbacks import TrainHistoryCallback
>>> train_history = TrainHistoryCallback()
>>> callbacks = [train_history]
>>> pipeline(
...     train_data=train_dataset,
...     eval_data=eval_dataset,
...     callbacks=callbacks
... )
>>> train_history.history
{
    'train_loss': [58.51896972363562, 42.15931177749049, 40.583426756017346],
    'eval_loss': [43.39408182034827, 41.45351771943888, 39.77221281209569]
}
```
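The returned history is a plain dict, so it can be post-processed directly. A small sketch using the values shown above (the post-processing logic is illustrative, not part of the release):

```python
# History dict as returned by train_history.history
# (values taken from the example above)
history = {
    "train_loss": [58.51896972363562, 42.15931177749049, 40.583426756017346],
    "eval_loss": [43.39408182034827, 41.45351771943888, 39.77221281209569],
}

# Index of the epoch with the lowest evaluation loss
best_epoch = min(range(len(history["eval_loss"])), key=history["eval_loss"].__getitem__)
print(best_epoch)
```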
- `predict` method that encodes and decodes input data without loss computation, in #75 by @soumickmj and @ravih18

```python
>>> out = model.predict(eval_dataset[:3])
>>> out.embedding.shape, out.recon_x.shape
(torch.Size([3, 16]), torch.Size([3, 1, 28, 28]))
```
- `embed` method that returns the latent representations of the input data, in #76 by @tbouchik

```python
>>> out = model.embed(eval_dataset[:3].to(device))
>>> out.shape
torch.Size([3, 16])
```
Published by clementchadebec over 1 year ago
New features
`Pythae` now supports distributed training (built on top of PyTorch DDP). A distributed training run can be launched using a training script in which all of the distributed environment variables are passed to a `BaseTrainerConfig` instance as follows:

```python
training_config = BaseTrainerConfig(
    num_epochs=10,
    learning_rate=1e-3,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    dist_backend="nccl",  # distributed backend
    world_size=8,  # number of gpus to use (n_nodes x n_gpus_per_node)
    rank=0,  # process/gpu id
    local_rank=1,  # node id
    master_addr="localhost",  # master address
    master_port="12345",  # master port
)
```
The script can then be launched using a launcher such as `srun`. This module was tested in both mono-node multi-gpu and multi-node multi-gpu settings.
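As an illustration of how such a launch script might gather those variables, here is a minimal sketch; the SLURM-style variable names (`SLURM_NTASKS`, `SLURM_PROCID`, `SLURM_LOCALID`) and the fallback defaults are assumptions, not part of the release:

```python
import os

# Hypothetical helper: collect distributed settings from SLURM-style
# environment variables, falling back to single-process defaults when
# they are absent.
def dist_settings_from_env():
    return {
        "dist_backend": "nccl",
        "world_size": int(os.environ.get("SLURM_NTASKS", "1")),
        "rank": int(os.environ.get("SLURM_PROCID", "0")),
        "local_rank": int(os.environ.get("SLURM_LOCALID", "0")),
        "master_addr": os.environ.get("MASTER_ADDR", "localhost"),
        "master_port": os.environ.get("MASTER_PORT", "12345"),
    }

settings = dist_settings_from_env()
print(settings["world_size"], settings["master_addr"])
```

The resulting dictionary could then be unpacked into the config, e.g. `BaseTrainerConfig(num_epochs=10, **dist_settings_from_env())`.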
- `MSSSIM_VAE` now supports 3D images

Major Changes
The way the `optimizers` and `schedulers` are handled has changed. It is no longer needed to build the `optimizer` (resp. `scheduler`) and pass it to the `Trainer`. As of v0.1.0, the choice and parameters of the `optimizers` and `schedulers` can be passed directly to the `TrainerConfig`. See the changes below:

As of v0.1.0
```python
my_model = VAE(model_config=model_config)

# Specify optimizer/scheduler choices and params directly in the Trainer config
training_config = BaseTrainerConfig(
    ...,
    optimizer_cls="AdamW",
    optimizer_params={"betas": (0.91, 0.995)},
    scheduler_cls="MultiStepLR",
    scheduler_params={"milestones": [10, 20, 30], "gamma": 10**(-1/5)}
)

trainer = BaseTrainer(
    model=my_model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    training_config=training_config
)

# Launch training
trainer.train()
```
Before v0.1.0

```python
my_model = VAE(model_config=model_config)
training_config = BaseTrainerConfig(...)

### Optimizer
optimizer = torch.optim.AdamW(my_model.parameters(), lr=training_config.learning_rate, betas=(0.91, 0.995))

### Scheduler
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10, 20, 30], gamma=10**(-1/5))

# Pass the built instances to the Trainer
trainer = BaseTrainer(
    model=my_model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    training_config=training_config,
    optimizer=optimizer,
    scheduler=scheduler
)

# Launch training
trainer.train()
```
The `batch_size` key is no longer available in the `Trainer` configurations. It is replaced by the keys `per_device_train_batch_size` and `per_device_eval_batch_size`, which specify the batch size per device. Please note that if you are in a distributed setting with, for instance, 4 GPUs and specify `per_device_train_batch_size=64`, this is equivalent to training on a single GPU with a batch size of 4*64 = 256.

Minor changes
- Dataloader `num_workers` can now be set in the `Trainer` configuration under the keys `train_dataloader_num_workers` and `eval_dataloader_num_workers`
- Reworked the `__init__` of the `Trainers` and moved sanity checks from the `train` method to `__init__`
- Handling of the `optimizers` and `schedulers` moved to the `TrainerConfig`'s `__post_init_post_parse__`
Published by clementchadebec about 2 years ago
New features
- Support for `comet_ml` through the `CometCallback` training callback, further to #55

Bugs fixed
- `pickle5` compatibility with `python>=3.8`
- Updated the `conda-forge` feedstock with correct requirements (https://github.com/conda-forge/pythae-feedstock/pull/11)

Published by clementchadebec about 2 years ago
New Features:
- `MLFlowCallback` in `TrainingCallbacks`, further to #44
- Any `Dataset` inheriting from `torch.utils.data.Dataset` can now be passed as input to the `training_pipeline`, further to #35:

```python
def __call__(
    self,
    train_data: Union[np.ndarray, torch.Tensor, torch.utils.data.Dataset],
    eval_data: Union[np.ndarray, torch.Tensor, torch.utils.data.Dataset] = None,
    callbacks: List[TrainingCallback] = None,
):
```
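A minimal sketch of the new input type, assuming a dict sample format with a `data` key (the dataset class and its contents are illustrative, not taken from the release notes):

```python
import torch
from torch.utils.data import Dataset

# Minimal custom dataset; any subclass of torch.utils.data.Dataset can
# now be passed as train_data / eval_data to the pipeline.
class ToyDataset(Dataset):
    def __init__(self, data: torch.Tensor):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Returning a dict with a "data" key is an assumption about the
        # expected sample format.
        return {"data": self.data[idx]}

train_data = ToyDataset(torch.randn(10, 1, 28, 28))
print(len(train_data))
```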
- `MIWAE`, `PIWAE` and `CIWAE` models (https://arxiv.org/abs/1802.04537)

Minor changes
- Unified `FactorVAE` with the other models (half of the batch is used for reconstruction and the other half for the factorial representation)
- Updated the `trainers` (use loaders in checks instead of datasets)
- Updated `CoupledOptimizerTrainer` and tests

Published by clementchadebec about 2 years ago
New features
- `PoincareVAE` model and `PoincareDiskSampler` implementation following https://arxiv.org/abs/1901.06033

Published by clementchadebec over 2 years ago
New features
- `interpolate` method allowing to interpolate linearly between given inputs in the latent space of any of the `pythae.models` (further to #34)
- `reconstruct` method allowing to easily reconstruct given input data with any of the `pythae.models`.
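A NumPy sketch of what linear interpolation between two latent codes computes (16-dimensional latents assumed to match the examples above; this is not the pythae implementation itself):

```python
import numpy as np

# Blend two latent codes z_0 and z_1 with weights t in [0, 1];
# "granularity" is the number of interpolation steps.
def linear_interpolate(z_0: np.ndarray, z_1: np.ndarray, granularity: int = 10) -> np.ndarray:
    ts = np.linspace(0.0, 1.0, granularity)
    return np.stack([(1.0 - t) * z_0 + t * z_1 for t in ts])

path = linear_interpolate(np.zeros(16), np.ones(16), granularity=5)
print(path.shape)  # (5, 16)
```

The model's `interpolate` method applies this idea to the latent encodings of the given inputs and decodes the intermediate points.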
Published by clementchadebec over 2 years ago
Bug
- Fix HF Hub Model cards
Published by clementchadebec over 2 years ago
Changes
- `python3.7+` is now required; `python3.6` is no longer supported

Published by clementchadebec over 2 years ago
New features
- `push_to_hf_hub` method allowing to push `pythae.models` instances to the HuggingFace Hub
- `load_from_hf_hub` method allowing to download pre-trained models from the Hub
- `wandb` callbacks

Published by clementchadebec over 2 years ago
First release on PyPI