Distributed File System written in Python
Apache-2.0 License
An open-source, scalable, decentralized, robust, heterogeneous file storage solution which is fau...
A command-line tool for analyzing big biological data sets on distributed HPC clusters.
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Python clone of Spark, a MapReduce-like framework
A Machine Learning API with native Redis caching and export + import using S3. Analyze entire dat...
♃ Debian packaging of JupyterHub, a multi-user server for Jupyter notebooks
Dask tutorials for Big Data Analysis and Machine Learning as Jupyter notebooks
Quickly set up and simulate a multi-node Spark cluster using Docker and docker-compose.
API and command line interface for HDFS
A travel agency app with a distributed database implemented from scratch!