Data_Processing_using_Spark_Flink

This project demonstrates data cleaning and processing with Apache Spark and Apache Flink, both locally and on AWS EMR.


Ecosystems: Apache Spark, Amazon Web Services

Commit Statistics

Avg. Commits Per Committer: 2.0

Issue Statistics

Time to Close Issues: N/A

Related Projects

fraud-detector

Project FraudCatch leverages AI to predict and prevent financial fraud in real-time. It uses Apac...

21 Jul 2024 1

spark-Jupyter-AWS

A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support

25 Nov 2016 262

Udacity-Data-Engineering-Projects

A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, ...

20 Jan 2020 1,480

Udacity-Data-Engineering

Udacity Data Engineering Nano Degree (DEND)

16 Apr 2019 183

emr-bootstrap-pyspark

Quickstart PySpark with Anaconda on AWS/EMR

06 Nov 2016 53

aws-big-data-study

Study Guide for AWS Big Data Speciality Certification

11 May 2019 16

Yahoo-finances-data-event

A data engineering training project to build an end-to-end pipeline for real-time processing of ...

10 May 2024 0

aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, Qu...

26 Feb 2019 3,791

data-engineering-interview-questions

More than 2,000 data engineer interview questions.

08 Aug 2021 1,060

xonai-dashboard

A Grafana-based application to assist Big Data infrastructure optimization initiatives where Spar...

22 Mar 2024 12

Data-engineering-nanodegree

Projects done in the Data Engineering Nanodegree by Udacity.com

19 Apr 2019 267

Covid-Data-Process

This project integrates real-time data processing and analytics using Apache NiFi, Kafka, Spark, ...

18 Aug 2024 5

pyspark-on-aws-emr

The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances...

18 Feb 2021 24

Spark-Processing-AWS

👷🌇 Set up and build a big data processing pipeline with Apache Spark, 📦 AWS services (S3, EMR, EC...

18 Jun 2024 2

cdk-emrserverless-with-delta-lake

This construct builds some elements for you to quickly launch an EMR Serverless application. Afte...

03 Jun 2022 8