Explore the capabilities of Amazon EMR Serverless by processing semi-structured review data with Apache Spark, showcasing efficient big data analysis without managing clusters.
MIT License
This project showcases the utilization of Amazon EMR Serverless for running a sample Spark job to process semi-structured review data. The goal is to demonstrate the capabilities of Amazon EMR Serverless in efficiently processing and analyzing big data workloads. Overview Amazon EMR (Elastic MapReduce) Serverless is a serverless big data processing service that enables you to run Apache Spark applications without managing clusters. In this demonstration, we leverage EMR Serverless to process semi-structured review data stored in JSON format and derive insights from the analysis.
1. Scripts:
2. Sample Dataset:
1. Setup Amazon EMR Serverless:
2. Run Spark Job:
3. Analyze Data with Amazon Athena: