My solution to TUM's Machine Learning MNIST challenge 2016-2017 [winner]
MIT License
This contest was offered within TU Munich's course Machine Learning (IN2064). The goal was to implement k-NN, Neural Network, Logistic Regression and Gaussian Process Classifier in Python from scratch and achieve the minimal average test error among these classifiers on the well-known MNIST dataset, without ensemble learning.
Algorithm | Description | Test Error, % |
---|---|---|
k-NN | 3-NN, Euclidean distance, uniform weights. Preprocessing: feature vectors extracted from NN. | 1.13 |
k-NN2 | 3-NN, Euclidean distance, uniform weights. Preprocessing: augment (training) data (×9) using random rotations, shifts, Gaussian blur and dropout of pixels; PCA-35 whitening and multiplying each feature vector by exp(11.6 · s), where s is the normalized explained variance of the respective principal axis (equivalent to applying PCA whitening with an accordingly weighted Euclidean distance). | 2.06 |
NN | MLP 784-1337-D(0.05)-911-D(0.1)-666-333-128-10 (D = dropout); hidden activations: LeakyReLU(0.01); output: softmax; loss: categorical cross-entropy; 1024 batches; 42 epochs; optimizer: Adam (learning rate 5 · 10⁻⁵, rest are defaults from the paper). Preprocessing: augment (training) data (×5) using random rotations, shifts and Gaussian blur. | 1.04 |
LogReg | 32 batches; 91 epochs; L2 penalty, λ = 3.16 · 10⁻⁴; optimizer: Adam (learning rate 10⁻³, rest are defaults from the paper). Preprocessing: feature vectors extracted from NN. | 1.01 |
GPC | 794 random data points used for training; σ_n = 0; RBF kernel (σ_f = 0.4217, γ = 1/(2l²) = 0.0008511); Newton iterations for the Laplace approximation until ΔLog-Marginal-Likelihood ≤ 10⁻⁷; linear systems solved iteratively using CG with 10⁻⁷ tolerance; for prediction, 2000 samples generated per test point. Preprocessing: feature vectors extracted from NN. | 1.59 |
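The k-NN2 preprocessing above (PCA-35 whitening followed by weighting each axis by exp(11.6 · s)) can be sketched in plain NumPy. This is an illustrative sketch, not code from the repository: the function name is made up, and whether s is normalized over all axes or only the kept ones is an assumption here.

```python
import numpy as np

def pca_weighted_features(X, n_components=35, alpha=11.6):
    # Center the data
    Xc = X - X.mean(axis=0)
    # PCA via SVD of the centered data matrix
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Explained variance per principal axis, normalized to sum to 1
    var = S ** 2 / (X.shape[0] - 1)
    s = var / var.sum()
    # Project onto the first n_components axes and whiten (unit variance)
    Z = Xc @ Vt[:n_components].T / S[:n_components] * np.sqrt(X.shape[0] - 1)
    # Stretch each whitened axis by exp(alpha * s), as in the table above
    return Z * np.exp(alpha * s[:n_components])
```

With alpha = 0 this reduces to ordinary PCA whitening; the exponential weighting then stretches the Euclidean metric along high-variance axes before the 3-NN search.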
More results are available in experiments/plots/.
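The data augmentation used for k-NN2 and NN (random rotations, shifts and Gaussian blur) can be approximated with scipy.ndimage. The parameter ranges below are assumptions for illustration, not the values used in the repository:

```python
import numpy as np
from scipy.ndimage import rotate, shift, gaussian_filter

def augment(img, rng):
    # Random rotation; the +/-15 degree range is an assumption
    out = rotate(img, angle=rng.uniform(-15, 15), reshape=False, mode='constant')
    # Random shift of up to 2 pixels along each axis (assumed range)
    out = shift(out, shift=rng.uniform(-2, 2, size=2), mode='constant')
    # Mild Gaussian blur with a random sigma (assumed range)
    out = gaussian_filter(out, sigma=rng.uniform(0.0, 1.0))
    return out
```

Applying such a function several times per training image (with fresh random parameters) yields the ×9 and ×5 augmented training sets mentioned in the table.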
```
git clone https://github.com/yell/mnist-challenge
cd mnist-challenge/
pip install -r requirements.txt
```
After installation, tests can be run with:

```
make test
```
Check main.py to reproduce training and testing of the final models:

```
usage: main.py [-h] [--load-nn] model

positional arguments:
  model       which model to run, {'gp', 'knn', 'knn-without-nn', 'logreg', 'nn'}

optional arguments:
  -h, --help  show this help message and exit
  --load-nn   whether to use a pretrained neural network; ignored if
              'knn-without-nn' is used (default: False)
```
Also check this notebook to see what I've tried.
Note: the RBM + LogReg approach gave at most 91.8% test accuracy, since the RBM takes too long to train with the given pure-Python code and was therefore trained only on a small subset of the data (and still underfitted). However, with an RBM properly trained on the whole training set, this approach can give 1.83% test error (see my Boltzmann machines project).
All computations and time measurements were made on a laptop with an i7-5500U CPU @ 2.40GHz × 4 and 12GB RAM.
Here is a list of what could also be tried with these particular 4 ML algorithms (I didn't have time to check it, or it was forbidden by the rules, e.g. ensemble learning):

* other distance metrics (see scipy.spatial.distance) for k-NN;
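As an illustration of that first idea, swapping the distance metric in k-NN is a one-line change when distances go through scipy.spatial.distance.cdist. This helper is a hypothetical sketch, not code from the repository:

```python
import numpy as np
from scipy.spatial.distance import cdist

def knn_predict(X_train, y_train, X_test, k=3, metric='euclidean'):
    # Pairwise distances between test and training points under any cdist metric
    D = cdist(X_test, X_train, metric=metric)
    # Indices of the k nearest neighbours for each test point
    nn = np.argsort(D, axis=1)[:, :k]
    # Majority vote with uniform weights over the neighbours' labels
    votes = y_train[nn]
    return np.array([np.bincount(row).argmax() for row in votes])
```

Passing metric='cosine', 'cityblock', 'chebyshev', etc. selects any of the alternative metrics cdist supports, without touching the rest of the classifier.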