Data engineering meets software engineering
This is a guide to PySpark code style presenting common situations and the associated best practi...
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
High-level wrapper around BCP for high performance data transfers between pandas and SQL Server. ...
A framework for data piping in python
Apache Spark Machine Learning project using MLlib and Linear Regression on Databricks!
Apache Spark - A unified analytics engine for large-scale data processing
Complete Roadmap For Data Science
TODS: An Automated Time-series Outlier Detection System
Fundamentals of Spark with Python (using PySpark), code examples
a pandas sklearn-inspired pipeline data processor
Default risk prediction for Home Credit competition - Fast, scalable and maintainable SQL-based f...
Productivity Utilities for Data Science with Python Notebooks
ETL pipeline using pyspark (Spark - Python)
🌳 WALD Stack Demo 🏎️