Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Apache-2.0 License
A tutorial on how to use pulsar-spark-connector
This is the development repository for sparkMeasure, a tool and library designed for efficient an...
Haskell on Apache Spark.
Apache Nemo (Incubating) - Data Processing System for Flexible Employment With Different Deployme...
Low-level helpers for Apache Spark libraries and tests
A tool for monitoring and tuning Spark jobs for efficiency.
E-commerce user behavior analysis big data platform
A simple Spark-powered ETL framework that just works 🍺
REST job server for Apache Spark
Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its ...
Apache Spark - A unified analytics engine for large-scale data processing
Spark Connector to read and write with Pulsar
This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. T...
Apache Accumulo Testing
Extensible streaming ingestion pipeline on top of Apache Spark