Deep Semi-Supervised Learning with Holistic methods (SSLH)

Unofficial PyTorch and PyTorch-Lightning implementations of Deep Semi-Supervised Learning methods for audio tagging.

There are 4 SSL methods:

FixMatch (FM) [1]
MixMatch (MM) [2]
ReMixMatch (RMM) [3]
Unsupervised Data Augmentation (UDA) [4]

For the following datasets :

With 3 models :

IMPORTANT NOTE: The implementations of Mean Teacher (MT), Deep Co-Training (DCT) and Pseudo-Labeling (PL) are present in this repository but are not fully tested.

You can find a more stable version of MT and DCT at https://github.com/Labbeti/semi-supervised. The datasets AudioSet and FSD50K are not officially supported.

If you run into problems when running experiments, you can contact me at [email protected].

Installation

Download & setup

git clone https://github.com/Labbeti/SSLH
conda env create -n env_sslh -f SSLH/environment.yaml
conda activate env_sslh
pip install -e SSLH --no-dependencies

Alternatives

As a Python package:

pip install git+https://github.com/Labbeti/SSLH

The dependencies will be installed automatically with pip instead of conda, which means the build versions can differ slightly.

The project also contains an environment.yaml and a requirements.txt for installing the packages with conda or pip, respectively.

With the conda environment file:

conda env create -n env_sslh -f environment.yaml
conda activate env_sslh
pip install -e . --no-dependencies

With the pip requirements file:

pip install -r requirements.txt
pip install -e . --no-dependencies

Datasets

CIFAR10, ESC10, GoogleSpeechCommands and FSD50K can be downloaded and installed automatically. For UrbanSound8k, please read the "Prepare the dataset" section of the leocances README. AudioSet (ADS) and Primate Vocalize Corpus (PVC) cannot be installed automatically for now.

To download a dataset, you can use the data.dm.download=true option.

Usage

This code uses Hydra for parsing arguments. The syntax for setting an argument is "name=value" instead of "--name value".
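To illustrate the "name=value" syntax, the sketch below turns a list of dotted overrides into a nested dictionary using only the standard library. It is not Hydra's parser: Hydra also handles config group selection (such as data=ssl_esc10), the aug@train_aug=weak package syntax, and much more, none of which is covered here. The names parse_overrides and _convert and the naive value conversion are assumptions made for illustration.

```python
from typing import Any, Dict, List


def _convert(value: str) -> Any:
    """Naively convert a string to bool, int or float, else keep it as str."""
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_overrides(overrides: List[str]) -> Dict[str, Any]:
    """Build a nested dict from dotted "name=value" overrides (illustrative only)."""
    config: Dict[str, Any] = {}
    for override in overrides:
        name, sep, value = override.partition("=")
        if not sep or not name:
            raise ValueError(f"Expected 'name=value', got {override!r}")
        # "data.dm.bsize" -> parents ["data", "dm"], leaf "bsize"
        *parents, leaf = name.split(".")
        node = config
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = _convert(value)
    return config


if __name__ == "__main__":
    print(parse_overrides(["data.dm.bsize=256", "epochs=300", "data.dm.download=true"]))
```

Each dot in the name descends one level in the configuration, so data.dm.bsize=256 sets the bsize field of the dm node inside data.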

Example 1 : MixMatch on ESC10

python -m sslh.mixmatch data=ssl_esc10 data.dm.download=true

Example 2 : Supervised+Weak on GSC

python -m sslh.supervised data=sup_gsc aug@train_aug=weak data.dm.bsize=256 epochs=300 data.dm.download=true

Example 3 : FixMatch+MixUp on UBS8K

python -m sslh.fixmatch data=ssl_ubs8K pl=fixmatch_mixup data.dm.bsize_s=128 data.dm.bsize_u=128 epochs=300 data.dm.download=true

Example 4 : ReMixMatch on CIFAR-10

python -m sslh.remixmatch data=ssl_cifar10 model.n_input_channels=3 aug@weak_aug=img_weak aug@strong_aug=img_strong data.dm.download=true

List of main arguments

Name	Description	Values	Default
data	Dataset used	(sup|ssl)_(ads|…), e.g. ssl_esc10, sup_gsc, ssl_cifar10
pl	PyTorch Lightning training method (experiment) used	(depends on the Python script, see the filenames in the config/pl/ folder)	(depends on the Python script)
model	PyTorch model to use	mobilenetv1, mobilenetv2, vgg, wideresnet28	wideresnet28
optim	Optimizer used	adam, sgd	adam
sched	Learning rate scheduler	cosine, softcosine, none	softcosine
epochs	Number of training epochs	int	1
bsize	Batch size in SUP methods	int	60
ratio	Ratio of the training data used in SUP methods	float in [0, 1]	1.0
bsize_s	Batch size of supervised part in SSL methods	int	30
bsize_u	Batch size of unsupervised part in SSL methods	int	30
ratio_s	Ratio of the supervised training data used in SSL methods	float in [0, 1]	0.1
ratio_u	Ratio of the unsupervised training data used in SSL methods	float in [0, 1]	0.9
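The ratio_s and ratio_u arguments can be read as the fractions of the training set assigned to the labeled and unlabeled subsets. The sketch below shows one way such a split could be computed on sample indices. It is an assumption for illustration, not the code used in sslh/datamodules, which may for instance stratify the split by class or allow overlapping subsets. The function name split_indices and the seed parameter are hypothetical.

```python
import random
from typing import List, Tuple


def split_indices(
    n_samples: int,
    ratio_s: float = 0.1,
    ratio_u: float = 0.9,
    seed: int = 1234,
) -> Tuple[List[int], List[int]]:
    """Split range(n_samples) into disjoint labeled/unlabeled index lists."""
    if not (0.0 <= ratio_s <= 1.0 and 0.0 <= ratio_u <= 1.0):
        raise ValueError("Ratios must be in [0, 1].")
    if ratio_s + ratio_u > 1.0 + 1e-9:
        raise ValueError("ratio_s + ratio_u must not exceed 1.")
    indices = list(range(n_samples))
    # A dedicated Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(indices)
    n_s = round(n_samples * ratio_s)
    n_u = min(round(n_samples * ratio_u), n_samples - n_s)
    return indices[:n_s], indices[n_s:n_s + n_u]


if __name__ == "__main__":
    idx_s, idx_u = split_indices(400, ratio_s=0.1, ratio_u=0.9)
    print(len(idx_s), len(idx_u))
```

With the default ratios, a training set of 400 samples yields 40 labeled and 360 unlabeled indices in this sketch.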

SSLH Package overview

sslh
├── callbacks
├── datamodules
│     ├── supervised
│     └── semi_supervised
├── datasets
├── pl_modules
│     ├── deep_co_training
│     ├── fixmatch
│     ├── mean_teacher
│     ├── mixmatch
│     ├── mixup
│     ├── pseudo_labeling
│     ├── remixmatch
│     ├── supervised
│     └── uda
├── metrics
├── models
├── transforms
│     ├── get
│     ├── image
│     ├── other
│     ├── pools
│     ├── self_transforms
│     ├── spectrogram
│     └── waveform
└── utils

Authors

This repository was created by Etienne Labbé (Labbeti on GitHub).

It also contains some code from the following authors:

Léo Cancès (leocances on GitHub)
- For AudioSet, ESC10, GSC, PVC and UBS8K datasets base code.
Qiuqiang Kong (qiuqiangkong on Github)
- For MobileNetV1 & V2 model implementation from PANN.

Additional notes

This project was developed with Ubuntu 20.04 and Python 3.8.5.

Glossary

Acronym	Description
activation	Activation function
ADS	AudioSet
aug, augm, augment	Augmentation
ce	Cross-Entropy
expt	Experiment
fm	FixMatch
fn, func	Function
GSC	Google Speech Commands dataset (with 35 classes)
GSC12	Google Speech Commands dataset (with 10 classes from GSC, 1 unknown class and 1 silence class)
hparams	Hyperparameters
js	Jensen-Shannon
kl	Kullback-Leibler
loc	Localisation
lr	Learning Rate
mm	MixMatch
mse	Mean Squared Error
pred	Prediction
PVC	Primate Vocalize Corpus dataset
rmm	ReMixMatch
_s	Supervised
sched	Scheduler
SSL	Semi-Supervised Learning
SUP	Supervised Learning
_u	Unsupervised
UBS8K	UrbanSound8K dataset
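The GSC12 entry above describes a 12-class label set: 10 classes from GSC, 1 unknown class and 1 silence class. The sketch below shows how such a mapping could look. The list of 10 words, the class order, and the use of None to mark a silence clip are all assumptions made for illustration and may differ from what this repository does. The names CORE_WORDS, GSC12_CLASSES and to_gsc12_index are hypothetical.

```python
from typing import Optional

# Assumption: the 10 command words commonly used for the 12-class task.
CORE_WORDS = ("yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go")
GSC12_CLASSES = CORE_WORDS + ("unknown", "silence")


def to_gsc12_index(word: Optional[str]) -> int:
    """Map a GSC word to a GSC12 class index; None stands for a silence clip."""
    if word is None:
        return GSC12_CLASSES.index("silence")
    if word in CORE_WORDS:
        return CORE_WORDS.index(word)
    # Every other GSC word is folded into the single "unknown" class.
    return GSC12_CLASSES.index("unknown")


if __name__ == "__main__":
    for word in ("yes", "go", "marvin", None):
        print(word, "->", GSC12_CLASSES[to_gsc12_index(word)])
```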

References

[1] K. Sohn, D. Berthelot, C.-L. Li, Z. Zhang, N. Carlini, E. D. Cubuk, A. Kurakin, H. Zhang, and C. Raffel, “FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence,” p. 21.

[2] D. Berthelot, N. Carlini, I. Goodfellow, N. Papernot, A. Oliver, and C. Raffel, “MixMatch: A Holistic Approach to Semi-Supervised Learning,” Oct. 2019, arXiv:1905.02249 [cs, stat]. [Online]. Available: http://arxiv.org/abs/1905.02249

[3] D. Berthelot, N. Carlini, E. D. Cubuk, A. Kurakin, K. Sohn, H. Zhang, and C. Raffel, “ReMixMatch: Semi-Supervised Learning with Distribution Alignment and Augmentation Anchoring,” Feb. 2020, arXiv:1911.09785 [cs, stat]. [Online]. Available: http://arxiv.org/abs/1911.09785

[4] Q. Xie, Z. Dai, E. Hovy, M.-T. Luong, and Q. V. Le, “Unsupervised Data Augmentation for Consistency Training,” Nov. 2020, arXiv:1904.12848 [cs, stat]. [Online]. Available: http://arxiv.org/abs/1904.12848

Cite this repository

If you use this code, you can cite the associated paper:

@article{cances_comparison_2022,
	title        = {Comparison of semi-supervised deep learning algorithms for audio classification},
	author       = {Cances, Léo and Labbé, Etienne and Pellegrini, Thomas},
	year         = 2022,
	month        = sep,
	journal      = {EURASIP Journal on Audio, Speech, and Music Processing},
	volume       = 2022,
	number       = 1,
	pages        = 23,
	doi          = {10.1186/s13636-022-00255-6},
	issn         = {1687-4722},
	url          = {https://doi.org/10.1186/s13636-022-00255-6},
	abstract     = {In this article, we adapted five recent SSL methods to the task of audio classification. The first two methods, namely Deep Co-Training (DCT) and Mean Teacher (MT), involve two collaborative neural networks. The three other algorithms, called MixMatch (MM), ReMixMatch (RMM), and FixMatch (FM), are single-model methods that rely primarily on data augmentation strategies. Using the Wide-ResNet-28-2 architecture in all our experiments, 10\% of labeled data and the remaining 90\% as unlabeled data for training, we first compare the error rates of the five methods on three standard benchmark audio datasets: Environmental Sound Classification (ESC-10), UrbanSound8K (UBS8K), and Google Speech Commands (GSC). In all but one cases, MM, RMM, and FM outperformed MT and DCT significantly, MM and RMM being the best methods in most experiments. On UBS8K and GSC, MM achieved 18.02\% and 3.25\% error rate (ER), respectively, outperforming models trained with 100\% of the available labeled data, which reached 23.29\% and 4.94\%, respectively. RMM achieved the best results on ESC-10 (12.00\% ER), followed by FM which reached 13.33\%. Second, we explored adding the mixup augmentation, used in MM and RMM, to DCT, MT, and FM. In almost all cases, mixup brought consistent gains. For instance, on GSC, FM reached 4.44\% and 3.31\% ER without and with mixup. Our PyTorch code will be made available upon paper acceptance at https://github.com/Labbeti/SSLH.}
}