Real-time financial data processing using Apache Kafka, Spark, MySQL, and Grafana, orchestrated with Docker. This pipeline fetches, processes, stores, and visualises stock data.
MIT License
This project demonstrates a real-time financial data processing pipeline using Apache Kafka, Apache Spark, MySQL, and Grafana, all orchestrated with Docker. The pipeline fetches stock data from the Financial Modeling Prep API, processes it using Spark, stores the processed data in MySQL, and visualises it using Grafana.
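The project consists of the following components: a Kafka producer (kafka/kafka_producer.py) that fetches stock data from the Financial Modeling Prep API and publishes it to Kafka, which runs alongside ZooKeeper; a Spark script (spark/process_data.py) that processes the data; a MySQL database that stores the processed results; and Grafana, which visualises them. All components are orchestrated with Docker.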
The project uses requirements.txt files to manage the Python dependencies for the Kafka producer and Spark processing scripts. The dependencies are installed within the respective Docker containers during the build process.
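When running the scripts outside Docker, it can help to confirm that the packages listed in a requirements.txt file are actually installed in the active environment. The following is a minimal sketch using only the standard library. It assumes simple lines such as `pandas==2.2.0` or `requests>=2.31` and skips option lines such as `-r`. The function names are hypothetical and are not part of the project.

```python
import re
from importlib import metadata


def parse_requirements(text: str) -> list[str]:
    """Return the package names from requirements.txt content.

    Handles simple lines such as 'pandas==2.2.0' or 'requests>=2.31';
    blank lines, comments and option lines (starting with '-') are skipped.
    """
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline and full-line comments
        if not line or line.startswith("-"):
            continue
        # The name ends at the first version specifier, extras bracket or marker.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names: list[str]) -> list[str]:
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, `missing_packages(parse_requirements(open(path).read()))` returns the names that still need installing. Here `path` is whichever requirements.txt you want to check; its location in the repository is not stated here.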
Clone the project repository:
git clone https://github.com/hawa1222/real-time-data-processing.git
Navigate to the project directory:
cd real-time-data-processing
Set up your environment:
Make the setup script executable (if it's not already):
chmod +x setup_environment.sh
Then run the setup_environment.sh script from the root directory of the project to create a virtual environment and install all necessary packages:
./setup_environment.sh
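This README does not show the contents of setup_environment.sh. The sketch below is a rough illustration of the two steps the script is described as performing, creating a virtual environment and installing packages, using the standard library's venv module. The environment directory and the requirements file paths you pass in are assumptions, as are the helper names.

```python
import os
import subprocess
import venv
from pathlib import Path


def env_python(env_dir: Path) -> Path:
    """Path to the interpreter inside a virtual environment."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def create_env(env_dir: Path, requirements: list[Path], with_pip: bool = True) -> Path:
    """Create a virtual environment and install each requirements file into it.

    Returns the path to the environment's Python interpreter.
    """
    venv.create(env_dir, with_pip=with_pip)
    python = env_python(env_dir)
    for req in requirements:
        # Call the environment's own pip so packages land inside env_dir.
        subprocess.check_call([str(python), "-m", "pip", "install", "-r", str(req)])
    return python
```

For example, `create_env(Path("venv"), [Path("kafka/requirements.txt"), Path("spark/requirements.txt")])` would approximate the described behaviour, assuming those requirements files exist at those paths.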
Create a .env file in the project root directory and provide the environment variables as specified in .env_template.
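The required variable names are defined in .env_template, which is not reproduced here. A small check can catch a .env file that is missing any of them before the containers are started. The following is a minimal sketch that assumes simple `KEY=VALUE` lines. The variable names used in the test data are hypothetical.

```python
from pathlib import Path


def read_env_keys(text: str) -> set[str]:
    """Collect variable names from KEY=VALUE lines, ignoring comments and blanks."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key.startswith("export "):  # tolerate shell-style 'export KEY=VALUE'
            key = key[len("export "):].strip()
        keys.add(key)
    return keys


def missing_env_keys(template_text: str, env_text: str) -> list[str]:
    """Return, sorted, the keys defined in the template but absent from .env."""
    return sorted(read_env_keys(template_text) - read_env_keys(env_text))


def check_env_files(root: Path = Path(".")) -> list[str]:
    """Compare .env against .env_template in the given project root."""
    template = (root / ".env_template").read_text(encoding="utf-8")
    env = (root / ".env").read_text(encoding="utf-8")
    return missing_env_keys(template, env)
```

An empty list from `check_env_files()` means every variable named in the template has at least been set in .env. The check does not validate the values.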
Build and run the Docker containers:
docker-compose up --build
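After `docker-compose up --build`, the services may need some time before they accept connections. The following is a minimal sketch, using only the standard library, that polls a TCP port until it becomes reachable. Port 3000 for Grafana comes from this README. Ports 3306 (MySQL) and 9092 (Kafka) are the usual defaults and are assumptions about this project's compose file.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused, unreachable or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Assumed ports: 3000 is stated in this README; 3306 and 9092 are common defaults.
SERVICES = {"grafana": 3000, "mysql": 3306, "kafka": 9092}
```

For example, `all(wait_for_port("localhost", p) for p in SERVICES.values())` returns True once every listed port is accepting connections. An open port only shows that the service is listening, not that it is fully initialised.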
If you wish to run the Kafka and Spark Python scripts individually without Docker, activate the virtual environment created by setup_environment.sh, start ZooKeeper and Kafka locally, and then run the scripts from the command line.
For Kafka:
python kafka/kafka_producer.py
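This README does not show the producer script itself. The sketch below illustrates one plausible design: fetch quotes from the Financial Modeling Prep API, serialise each quote as JSON, and pass it to a `send` callable. The endpoint path, the topic name `stock_data`, and all function names are assumptions. Injecting `send` lets you test the logic without a broker. With the kafka-python package, you could pass `KafkaProducer(bootstrap_servers="localhost:9092").send` as the callable.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterable

# Assumed endpoint; check the Financial Modeling Prep docs for the one the project uses.
FMP_QUOTE_URL = "https://financialmodelingprep.com/api/v3/quote/{symbols}?{query}"


def build_quote_url(symbols: Iterable[str], api_key: str) -> str:
    """Build a quote request URL for one or more ticker symbols (comma-separated)."""
    joined = ",".join(s.upper() for s in symbols)
    return FMP_QUOTE_URL.format(
        symbols=urllib.parse.quote(joined, safe=","),
        query=urllib.parse.urlencode({"apikey": api_key}),
    )


def fetch_quotes(url: str) -> list[dict]:
    """Download and decode the JSON list of quotes (performs a network request)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def publish_quotes(quotes: list[dict], send: Callable[[str, bytes], object],
                   topic: str = "stock_data") -> int:
    """Serialise each quote as UTF-8 JSON, pass it to send(topic, value) and return the count."""
    for quote in quotes:
        send(topic, json.dumps(quote).encode("utf-8"))
    return len(quotes)
```

In a live run, `publish_quotes(fetch_quotes(build_quote_url(["AAPL"], api_key)), producer.send)` would publish one message per quote. A call to `producer.flush()` would then make sure the messages are delivered before the script exits.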
For Spark:
python spark/process_data.py
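This README does not describe what process_data.py computes. As a plain-Python stand-in for the per-batch work a Spark job might do, the sketch below parses JSON messages, drops malformed ones, keeps a few fields, and writes the resulting rows to a table. It uses sqlite3 only so the example runs without a MySQL server. The field names (`symbol`, `price`, `timestamp`) and the `stock_data` table are hypothetical.

```python
import json
import sqlite3
from typing import Iterable

# Hypothetical schema; the project's real MySQL table and columns may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS stock_data (
    symbol    TEXT    NOT NULL,
    price     REAL    NOT NULL,
    timestamp INTEGER NOT NULL
)
"""


def parse_messages(messages: Iterable[bytes]) -> list[tuple[str, float, int]]:
    """Decode JSON messages into (symbol, price, timestamp) rows, skipping bad ones."""
    rows = []
    for raw in messages:
        try:
            record = json.loads(raw)
            rows.append((str(record["symbol"]), float(record["price"]), int(record["timestamp"])))
        except (ValueError, KeyError, TypeError):
            continue  # malformed JSON, missing field or wrong type
    return rows


def store_rows(conn: sqlite3.Connection, rows: list[tuple[str, float, int]]) -> None:
    """Insert all rows in a single transaction."""
    with conn:
        conn.executemany("INSERT INTO stock_data VALUES (?, ?, ?)", rows)
```

In PySpark, the same steps would typically be expressed as follows. You would read the topic with `spark.readStream.format("kafka")`, parse the message value with `from_json`, and write each micro-batch to MySQL through the JDBC data source inside `foreachBatch`.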
Access the Grafana dashboard:
Open your web browser and visit http://localhost:3000. Log in using the admin credentials you provided in the .env file.
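If the dashboard page does not load, you can query Grafana's HTTP API to check whether the server is up. The following is a minimal standard-library sketch that calls Grafana's /api/health endpoint, which returns a small JSON document including a `database` field. The base URL matches the address above. The function name is hypothetical.

```python
import json
import urllib.request


def grafana_is_healthy(base_url: str = "http://localhost:3000", timeout: float = 5.0) -> bool:
    """Return True if Grafana's /api/health reports its database as 'ok'."""
    url = f"{base_url.rstrip('/')}/api/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = json.load(resp)
    except (OSError, ValueError):
        # URLError/HTTPError are OSError subclasses; ValueError covers non-JSON bodies.
        return False
    return isinstance(body, dict) and body.get("database") == "ok"
```

A False result while `docker-compose up` is still starting usually just means Grafana is not ready yet. Retrying after a short wait is reasonable.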
The Grafana data source is defined in the datasource.yml file, and the default dashboard is defined in the stock_data_dashboard.json file.
Edit the .env file in the root directory to change the MySQL connection details if required.
Edit the stock_data_dashboard.json file in the grafana/ directory to modify the default Grafana dashboard.
This project is licensed under the MIT License; see the LICENSE.txt file for details.