Use SQL to build ELT pipelines on a data lakehouse.
APACHE-2.0 License
More than 2,000 data engineer interview questions.
An open-source storage framework that enables building a Lakehouse architecture with compute engi...
This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. T...
lakeFS - Data version control for your data lake | Git for data
Nessie: Transactional Catalog for Data Lakes with Git-like semantics
LakeSoul is an end-to-end, real-time, cloud-native Lakehouse framework with fast data ingestion...
An open-source framework for building data analytics applications.
Apache Iceberg - Go
Data Lakehouse local stack with PySpark, Trino, and MinIO. Includes an example to process Raygun ...
Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
Smart Automation Tool for building modern Data Lakes and Data Pipelines
📚 Awesome list for Data Lake
Visualize column-level data lineage in Spark SQL
Web-based notebook that enables data-driven, interactive data analytics and collaborative documen...
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars cod...