My notebook on using Python with Jupyter Notebook, PySpark, etc.
MIT License
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Project to compare write efficiency and memory efficiency of CSV and Parquet files
PySpark RDD, DataFrame, and Dataset examples in Python
Fundamentals of Spark with Python (using PySpark), with code examples
Apache Spark - A unified analytics engine for large-scale data processing
ETL pipeline using PySpark (Spark with Python)
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
pronounced sUrplus as it's simply better if not best!
A repository to keep track of all the code that I end up writing for my blog posts.
Complete Roadmap For Data Science
This repo contains implementations of PySpark for real-world use cases for batch data processing,...
Turn any collection of files into a dataset
Apache Spark Machine Learning project using MLlib and Linear Regression on Databricks!