Koalas: pandas API on Apache Spark
Apache-2.0 License
A command-line tool for analyzing big biological data sets on distributed HPC clusters.
Monitor the stability of a Pandas or Spark dataframe ⚙︎
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Ka...
A free tutorial for Apache Spark.
A repository to keep track of all the code that I end up writing for my blog posts.
Apache Spark Connector for Azure Cosmos DB
Data Lakehouse local stack with PySpark, Trino, and MinIO. Includes an example to process Raygun ...
Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
This project gives Kotlin bindings and several extensions for Apache Spark. We are looking to ha...
Apache Spark - A unified analytics engine for large-scale data processing
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Simple and Distributed Machine Learning
Type System for Data Analysis in Python
Pandas and Spark DataFrame comparison for humans and more!
This project integrates real-time data processing and analytics using Apache NiFi, Kafka, Spark, ...