Mirror of Apache DataFu
Apache-2.0 License
Data Lakehouse local stack with PySpark, Trino, and MinIO. Includes an example to process Raygun ...
Apache FreeMarker
Java Sketch Characterization Code.
Apache Beam starter repo for Java
Apache JDO project
Apache Maven Sources
A distributed data integration framework that simplifies common aspects of big data integration s...
Mirror of Apache Kudu
REST job server for Apache Spark
Apache Nutch is an extensible and scalable web crawler
Apache Spark - A unified analytics engine for large-scale data processing
Spark Connector to read and write with Pulsar
High performance native memory access for Java.
Sketch adaptors for Hive.
A simple Spark-powered ETL framework that just works 🍺