Szilard Pafka

physics PhD, chief (data) scientist, meetup organizer, (visiting) professor, machine learning benchmarks

Ecosystems: R, Python, Apache Spark, Linux

Projects

benchm-ml

A minimal benchmark of the scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).

R - Released: 28 Mar 2015 - 1,870

benchm-databases

A minimal benchmark of various tools (statistical software, databases etc.) for working with tabular data of moderately large sizes (interactive data analysis).

R - Released: 25 Feb 2015 - 90

benchm-dl

Playing with various deep learning tools and network architectures

Python - Released: 23 Oct 2016 - 69

xgboost-adv-workshop-LA

Advanced workshop on XGBoost with Tianqi Chen in Santa Monica, June 2, 2016

R - Released: 20 May 2016 - 26

ML-scoring

Compare the scoring speed of several open source machine learning libraries.

R - Released: 23 May 2017 - 20

kaggle-scripts-R-pydata

Kaggle scripts: R vs. PyData, plus the most popular R and Python packages for machine learning

R - Released: 22 Jan 2016 - 11

dscomp-winstab

Winner stability in data science competitions

R - Released: 21 Feb 2020 - 8

ml-algos-perf

Performance of machine learning algorithms: a playground for experiments to understand how their performance depends on the attributes of the datasets used for training

Python - Released: 19 Aug 2016 - 7