StreamlineDE: Real-time Data Streaming & Processing Pipeline

StreamlineDE is your one-stop solution for building a scalable, end-to-end data engineering pipeline that streams, processes, and stores data in real time. Containerized with Docker, it's easy to deploy and scale across environments!

Table of Contents

  • Introduction
  • System Architecture
  • What You'll Learn
  • Technologies Used
  • Getting Started
  • Related Projects

Introduction

StreamlineDE is a hands-on project aimed at demonstrating real-time data streaming and processing using state-of-the-art tools like Apache Kafka, Apache Spark, Apache Airflow, and Cassandra. Learn how to orchestrate a complex pipeline, process streaming data, and store the processed information in distributed databases. Best of all, it's all containerized for effortless deployment!


System Architecture

Key Components:

  1. **Data Source**: The randomuser.me API simulates a continuous, real-world data flow.
  2. **Apache Airflow**: Orchestrates the pipeline, fetching data from the API and storing it in PostgreSQL.
  3. **Apache Kafka**: Streams data from PostgreSQL to the processing engine.
  4. **Apache Zookeeper**: Coordinates the Kafka cluster.
  5. **Apache Spark**: Processes the streamed data in real time.
  6. **Cassandra**: Stores the processed data in a distributed NoSQL database.
  7. **Docker**: Containerizes the entire architecture for ease of deployment.
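To make the flow concrete, here is a minimal sketch of the kind of transform the processing stage applies before writing to Cassandra: flattening a raw randomuser.me record into a single-level row. The function and field names are illustrative assumptions, not the project's actual schema.

```python
# Hypothetical flattening step, assuming the randomuser.me response shape
# (one entry from its "results" array). Field names are illustrative only.

def flatten_user(raw: dict) -> dict:
    """Flatten one nested randomuser.me record into a flat row."""
    name = raw["name"]
    loc = raw["location"]
    return {
        "id": raw["login"]["uuid"],
        "first_name": name["first"],
        "last_name": name["last"],
        "email": raw["email"],
        "address": f'{loc["street"]["number"]} {loc["street"]["name"]}, {loc["city"]}',
    }


sample = {
    "login": {"uuid": "abc-123"},
    "name": {"title": "Ms", "first": "Ada", "last": "Lovelace"},
    "location": {"street": {"number": 12, "name": "Main St"}, "city": "London"},
    "email": "ada@example.com",
}
print(flatten_user(sample)["address"])  # 12 Main St, London
```

A flat row like this maps directly onto a Cassandra table, which is why the flattening happens before the write rather than after.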

What You'll Learn

  • Build a real-time data pipeline with Apache Airflow.
  • Handle data streaming using Apache Kafka.
  • Use Apache Spark for real-time data processing.
  • Store processed data in Cassandra and relational data in PostgreSQL.
  • Containerize a full data pipeline using Docker.
  • Monitor and manage Kafka streams using Control Center and Schema Registry.
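For the Cassandra storage step above, the processed rows land in a table along these lines. This DDL is a hedged sketch: the keyspace name, table name, and columns are assumptions for illustration, not the project's actual schema.

```sql
-- Illustrative CQL only; keyspace, table, and column names are assumptions.
CREATE KEYSPACE IF NOT EXISTS spark_streams
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE IF NOT EXISTS spark_streams.created_users (
    id TEXT PRIMARY KEY,
    first_name TEXT,
    last_name TEXT,
    email TEXT,
    address TEXT
);
```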

Technologies Used

  • Apache Airflow
  • Apache Kafka
  • Apache Zookeeper
  • Apache Spark
  • Cassandra
  • PostgreSQL
  • Confluent Control Center and Schema Registry
  • Docker

Getting Started

To get started with StreamlineDE, follow these steps:

Prerequisites

Ensure you have the following installed:

  • Docker and Docker Compose
  • Git

Installation

  1. Clone the repository:
    git clone https://github.com/badhanhitesh/StreamlineDE.git

  2. Navigate to the project directory:
    cd StreamlineDE

  3. Spin up the services with Docker Compose:
    docker-compose up

  4. Access the web interfaces:
    Airflow: http://localhost:8080
    Kafka Control Center: http://localhost:9021
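For orientation, the docker-compose.yml for a stack like this typically wires the services together along these lines. This is an illustrative excerpt, not the project's actual file; the service names, image tags, and ports are assumptions.

```yaml
# Illustrative excerpt only -- service names, images, and ports are assumptions.
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.4.0
    ports:
      - "2181:2181"

  broker:
    image: confluentinc/cp-kafka:7.4.0
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"

  spark-master:
    image: bitnami/spark:latest
    ports:
      - "8081:8080"

  cassandra:
    image: cassandra:latest
    ports:
      - "9042:9042"
```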
Related Projects