Apache Arrow Ballista Distributed Query Engine
Apache-2.0 License
Distributed SQL Engine in Python using Dask
Visualize column-level data lineage in Spark SQL
ETL pipeline using PySpark (Spark with Python)
Apache Spark - A unified analytics engine for large-scale data processing
Apache DataFusion SQL Query Engine
Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python...
Fundamentals of Spark with Python (using PySpark), with code examples
BlazingSQL is a lightweight, GPU-accelerated SQL engine for Python. Built on RAPIDS cuDF.
Making data lakes work for time series
Apache Beam is a unified programming model for Batch and Streaming data processing.