Apache Tez
APACHE-2.0 License
Mirror of Apache Oozie
Welcome to StreamlineDE, an end-to-end data engineering project designed to demonstrate real-time...
Utilizing my background and love for Apache Airflow and Data to build a real-time data streaming ...
More than 2,000 data engineer interview questions.
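[Big data interview questions] A collection of big data interview questions I gathered from around the web, along with my own summarized answers. Currently covers interview topics for the Hadoop/Hive/Spark/Flink/HBase/Kafka/ZooKeeper frameworks.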
Mirror of Apache HCatalog
Apache Atlas
Data analytics pipeline built with Apache Spark and Hadoop for processing and analyzing large-sca...
Learn how to use Spark SQL and HSpark connector package to create / query data tables that reside...
Fundamentals of Spark with Python (using PySpark), with code examples
Streaming data processing using Hadoop HDFS, Spark, Kafka, MinIO, and Elasticsearch
Mirror of Apache Pig
ETL pipeline using PySpark (Spark with Python)
An open source framework for building data analytic applications.
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage...