This project aims to predict smartphone prices using a combination of batch and stream processing techniques in a Big Data environment. The architecture follows the Lambda Architecture pattern, providing both real-time and batch processing capabilities to users.
The project architecture consists of five main layers: the ingestion layer, the batch layer, the stream layer, the serving layer and the visualization layer.
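As a rough illustration of the Lambda Architecture idea, the sketch below sends every ingested record down both a batch path and a stream path. It is a toy model, not the project's code: the class name `LambdaSketch`, the record fields (`id`, `brand`, `ram_gb`, `price`), and the `predict` callable are all hypothetical, and in-memory containers stand in for the real storage systems.

```python
from collections import defaultdict


class LambdaSketch:
    """Toy Lambda Architecture: each record feeds a batch path and a stream path."""

    def __init__(self, predict):
        self.master_dataset = []   # stand-in for the raw history kept for batch jobs
        self.batch_view = {}       # stand-in for the batch layer's output store
        self.realtime_view = {}    # stand-in for the stream layer's output store
        self.predict = predict     # hypothetical price model: record -> number

    def ingest(self, record):
        # Ingestion layer: the same record goes down both paths.
        self.master_dataset.append(record)
        # Stream layer: score the record as soon as it arrives.
        self.realtime_view[record["id"]] = self.predict(record)

    def run_batch(self):
        # Batch layer: recompute the view from the full history.
        prices = defaultdict(list)
        for record in self.master_dataset:
            prices[record["brand"]].append(record["price"])
        self.batch_view = {b: sum(p) / len(p) for b, p in prices.items()}
```

The stream path answers quickly for each new record, while the batch path periodically rebuilds an aggregate view from everything collected so far.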
The repository is organized as follows:
Big-Data-Project:.
| README.md
|
+---images
| architecture.png
| dashboard_phone.png
| run_web_app.png
| spring_boot_web_app.png
|
\---Main
| commands.sh
| Dashboard.pbix
|
+---.idea
| workspace.xml
|
+---Lambda
| | docker-compose.yaml
| | producer.py
| | transform.py
| |
| +---.idea
| | | .gitignore
| | | .name
| | | misc.xml
| | | modules.xml
| | | price prediction (big data envirnment).iml
| | | vcs.xml
| | | workspace.xml
| | |
| | \---inspectionProfiles
| | profiles_settings.xml
| |
| +---Batch_layer
| | | batch_layer.py
| | | batch_pipeline.py
| | | HDFS_consumer.py
| | | put_data_hdfs.py
| | | save_data_postgresql.py
| | | spark_tranformation.py
| | | __init__.py
| | |
| | +---dags
| | | syc_with_Airflow.py
| | | __init__.py
| | |
| | \---__pycache__
| | batch_layer.cpython-310.pyc
| | HDFS_consumer.cpython-310.pyc
| | put_data_hdfs.cpython-310.pyc
| | save_data_postgresql.cpython-310.pyc
| | spark_tranformation.cpython-310.pyc
| | __init__.cpython-310.pyc
| |
| +---ML_operations
| | | xgb_model.pkl
| | |
| | \---__pycache__
| +---real_time_web_app(Flask)
| | | app.py
| | | get_Data_from_hbase.py
| | |
| | +---static
| | | +---css
| | | | style.css
| | | |
| | | \---js
| | | script.js
| | |
| | +---templates
| | | index.html
| | |
| | \---__pycache__
| | get_Data_from_hbase.cpython-310.pyc
| |
| +---Stream_data
| | | stream_data.csv
| | | stream_data.py
| | |
| | \---__pycache__
| +---Stream_layer
| | insert_data_hbase.py
| | ML_consumer.py
| | stream_pipeline.py
| | __init__.py
| |
| \---__pycache__
| producer.cpython-310.pyc
| transform.cpython-310.pyc
|
\---real_time_app(Spring boot)
| .classpath
| .gitignore
| .project
| HELP.md
| mvnw
| mvnw.cmd
| pom.xml
|
+---.mvn
| \---wrapper
| maven-wrapper.jar
| maven-wrapper.properties
|
+---.settings
| org.eclipse.core.resources.prefs
| org.eclipse.jdt.core.prefs
| org.eclipse.m2e.core.prefs
|
+---src
| +---main
| | +---java
| | | \---com
| | | \---example
| | | \---demo
| | | | RealTimeAppApplication.java
| | | |
| | | +---controller
| | | | IndexController.java
| | | |
| | | \---service
| | | HbaseService.java
| | |
| | \---resources
| | | application.properties
| | |
| | +---static
| | | +---css
| | | | style.css
| | | |
| | | \---js
| | | script.js
| | |
| | \---templates
| | index.html
| |
| \---test
| \---java
| \---com
| \---example
| \---demo
| RealTimeAppApplicationTests.java
|
\---target
+---classes
| | application.properties
| |
| +---com
| | \---example
| | \---demo
| | | RealTimeAppApplication.class
| | |
| | +---controller
| | | IndexController.class
| | |
| | \---service
| | HbaseService.class
| |
| +---META-INF
| | | MANIFEST.MF
| | |
| | \---maven
| | \---com.example
| | \---real_time_app
| | pom.properties
| | pom.xml
| |
| +---static
| | +---css
| | | style.css
| | |
| | \---js
| | script.js
| |
| \---templates
| index.html
|
\---test-classes
\---com
\---example
\---demo
RealTimeAppApplicationTests.class
This project requires the following software to be installed and configured on your system:
Big Data Stack:
Programming Languages and Frameworks:
Machine Learning Library:
Additional Tools:
By installing and configuring these tools, you will have the necessary environment to run this project and leverage its real-time and batch processing capabilities for smartphone price prediction and analysis.
To set up and run the project locally, follow these steps:
git clone https://github.com/aymane-maghouti/Big-Data-Project
zookeeper-server-start.bat C:/kafka_2.13_2.6.0/config/zookeeper.properties
kafka-server-start.bat C:/kafka_2.13_2.6.0/config/server.properties
kafka-topics.bat --create --topic smartphoneTopic --bootstrap-server localhost:9092
kafka-console-producer.bat --topic smartphoneTopic --bootstrap-server localhost:9092
kafka-console-consumer.bat --topic smartphoneTopic --from-beginning --bootstrap-server localhost:9092
start-all
start-hbase
hbase thrift start
After all this, run the stream_pipeline.py script.
Then open the Spring Boot application in your IDE and run it (you can access the web app locally at localhost:8081/).
Note that there is another version of the web app developed using the Flask micro-framework (watch the demo video for more details).
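The streaming steps above can be pictured with a minimal sketch like the following. It assumes messages on `smartphoneTopic` are UTF-8 JSON objects and that the third-party `kafka-python` client is used; neither is confirmed by this README. The functions `handle_message` and `consume_forever`, the `predict` and `store` callables, and the `predicted_price` field are hypothetical names chosen for illustration.

```python
import json


def handle_message(raw_value, predict):
    """Decode one message payload and attach a predicted price."""
    record = json.loads(raw_value.decode("utf-8"))
    record["predicted_price"] = float(predict(record))
    return record


def consume_forever(predict, store):
    """Read the topic and pass each scored record to `store` (e.g. an HBase writer)."""
    # Assumption: the kafka-python package is installed; imported here so the
    # helper above stays usable without it.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer("smartphoneTopic", bootstrap_servers="localhost:9092")
    for message in consumer:
        store(handle_message(message.value, predict))
```

Keeping the decoding and scoring in `handle_message` separate from the consumer loop makes that step easy to check without a running broker.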
docker-compose up -d
Access the Apache Airflow web UI (localhost:8080) and run the DAG
spark-shell
zookeeper-server-start.bat C:/kafka_2.13_2.6.0/config/zookeeper.properties
kafka-server-start.bat C:/kafka_2.13_2.6.0/config/server.properties
kafka-console-producer.bat --topic smartphoneTopic --bootstrap-server localhost:9092
kafka-console-consumer.bat --topic smartphoneTopic --from-beginning --bootstrap-server localhost:9092
start-all
Open the dashboard.pbix file attached with this project.
After all this, run the syc_with_Airflow.py script.
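One way to think about the batch run is as a chain of stages executed in order. The sketch below is only an assumption about that shape: the helper `run_pipeline` and the stage names are hypothetical, loosely inspired by the file names in Batch_layer, and the real ordering and logic live in the project's own scripts.

```python
def run_pipeline(stages, payload=None):
    """Run (name, function) stages in order, feeding each result to the next."""
    for name, stage in stages:
        try:
            payload = stage(payload)
        except Exception as exc:
            # Report which stage broke so a failed run is easier to diagnose.
            raise RuntimeError(f"batch stage '{name}' failed") from exc
    return payload


# Stub stages standing in for the real work (consume, transform, save).
saved = []
example_stages = [
    ("consume", lambda _: [{"price": "100"}, {"price": "300"}]),
    ("transform", lambda rows: [float(r["price"]) for r in rows]),
    ("save", lambda prices: saved.append(prices) or len(prices)),
]
rows_saved = run_pipeline(example_stages)
```

A scheduler such as Airflow plays a similar role at a larger scale, running each task only after the previous one has succeeded.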
This project utilizes two dashboards to visualize smartphone price predictions and historical data:
Here is the UI of the Spring Boot web application:
Here is the Dashboard created in Power BI:
Python, Kafka, HDFS, Spark, HBase, Spring Boot and Airflow
You can watch the demo video here
For any inquiries or further information, please contact: