Pandas and Spark DataFrame comparison for humans and more!
Apache-2.0 License
Apache Spark - A unified analytics engine for large-scale data processing
A repository to keep track of all the code that I end up writing for my blog posts.
Data quality profiling and exploratory data analysis for Pandas and Spark DataFrames in one line of code.
pronounced sUrplus as it's simply better if not best!
Fundamentals of Spark with Python (using PySpark), with code examples
Type System for Data Analysis in Python
Visualize and compare datasets, target values and associations, with one line of code.
Spark Monitoring
Flexible and powerful data analysis / manipulation library for Python, providing labeled data str...
This is a guide to PySpark code style presenting common situations and the associated best practi...
Python Helper library for Jupyter Notebooks
Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython /...