Study project for big data (Hadoop, ZooKeeper, Kafka, Flink, Spark)
MIT License
Apache Software Foundation Parent POM
Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Stre...
Data Lakehouse local stack with PySpark, Trino, and MinIO. Includes an example to process Raygun ...
Assorted test cases and related reference material collected while using Scala and Spark
Sample analysis of the latest Yelp dataset using Spark
The project aims to automate content classification and knowledge retrieval, as well as to perfor...
Dockerizing an Apache Spark Standalone Cluster
50+ DockerHub public images for Docker & Kubernetes - DevOps, CI/CD, GitHub Actions, CircleCI, Je...
Apache Spark - A unified analytics engine for large-scale data processing
Experiment tracking server focused on speed and scalability
Streaming data processing using Hadoop HDFS, Spark, Kafka, MinIO, Elasticsearch
Master's thesis on Big Data
An open source framework for building data analytic applications.
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitate...