A library developed to ease the data ETL development process.
A free tutorial for Apache Spark.
Jupyter magics and kernels for working with remote Spark clusters
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars cod...
Implementing best practices for PySpark ETL jobs and applications.
Sparglim✨ makes PySpark apps configurable and deploying a Spark Connect server easier!
Data Lakehouse local stack with PySpark, Trino, and Minio. Includes an example to process Raygun ...
R interface for Apache Spark
A playground to experience Gravitino
Apache Spark - A unified analytics engine for large-scale data processing
Visualize column-level data lineage in Spark SQL
A whitespace formatter for different query languages
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark...
A Flexible, Fast, Federated (3F) SQL Analysis Middleware for Multiple Data Sources
Simple and Distributed Machine Learning
A simple Spark-powered ETL framework that just works 🍺