An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Apache-2.0 License
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitate...
A simple Spark-powered ETL framework that just works 🍺
Smart Automation Tool for building modern Data Lakes and Data Pipelines
📚 Awesome list for Data Lake
This project gives Kotlin bindings and several extensions for Apache Spark. We are looking to ha...
lakeFS - Data version control for your data lake | Git for data
A collection of test cases and related reference material gathered while using Scala and Spark
A free tutorial for Apache Spark.
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a ric...
Basic framework utilities to quickly start writing production-ready Apache Spark applications
SparkSQL.jl enables Julia programs to work with Apache Spark data using just SQL.
Data Lakehouse local stack with PySpark, Trino, and Minio. Includes an example to process Raygun ...
Mirror of Apache DeltaSpike
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion...
High performance data store solution