Building a video analytics framework for large-scale applications using Big Data technologies.
The architecture consists of the following components:
Spin up Kafka containers for two servers (listening on ports 9093 and 9095) and Zookeeper (listening on port 2181) using Docker Compose:
docker-compose up -d
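Before starting the producer, it can help to confirm that the containers are accepting connections. The following is a minimal sketch using only the Python standard library. The helper names `port_is_open` and `check_services` and the service labels are hypothetical; the ports are the ones listed above. An open port only shows that something is listening, not that the broker has finished starting.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host: str = "localhost") -> dict:
    """Check the two Kafka brokers and Zookeeper described in this README."""
    # The labels are illustrative; only the port numbers come from the setup above.
    services = {"kafka-1": 9093, "kafka-2": 9095, "zookeeper": 2181}
    return {name: port_is_open(host, port) for name, port in services.items()}
```

Calling `check_services()` returns a dictionary mapping each label to `True` or `False`, which is enough to spot a container that did not start.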
Kafka:
Spark:
Python Libraries:
Create a Conda environment and install the required libraries from requirements.txt:
pip install -r requirements.txt
Start the Kafka Producer:
python confluentKafkaProducer.py
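The contents of the producer script are not shown here. As a rough sketch only, assuming it uses the `confluent-kafka` client and publishes already-encoded frames as message values, a producer could look like the following. The topic name `video-frames`, the helper names, and the choice of raw bytes as the payload are all assumptions, not the project's actual code.

```python
from typing import Iterable

# Broker ports come from the Docker Compose setup; the topic name is hypothetical.
BROKER_PORTS = (9093, 9095)
TOPIC = "video-frames"


def producer_config(host: str = "localhost", ports=BROKER_PORTS) -> dict:
    """Build a confluent-kafka configuration dict for the local brokers."""
    servers = ",".join(f"{host}:{port}" for port in ports)
    return {"bootstrap.servers": servers}


def send_frames(frames: Iterable[bytes], topic: str = TOPIC) -> None:
    """Publish each frame (already encoded as bytes) to the given topic."""
    # Imported here so producer_config() can be used without the client installed.
    from confluent_kafka import Producer

    producer = Producer(producer_config())

    def on_delivery(err, msg):
        # Invoked once per message while poll() or flush() is running.
        if err is not None:
            print(f"Delivery failed: {err}")

    for index, frame in enumerate(frames):
        producer.produce(topic, key=str(index), value=frame, on_delivery=on_delivery)
        producer.poll(0)  # serve pending delivery callbacks without blocking
    producer.flush()  # wait for outstanding messages before returning
```

Listing both brokers in `bootstrap.servers` lets the client connect even if one of them is down at startup.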
Source the bash profile:
source ~/.bash_profile
Run Spark with the following command:
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1 sparkConsumer.py
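The `spark-sql-kafka-0-10` package supplies the Kafka source for Spark Structured Streaming. `sparkConsumer.py` itself is not shown here, so the following is only a sketch of what such a consumer might contain. The topic name `video-frames`, the function names, and the summary columns are assumptions.

```python
def kafka_source_options(topic: str = "video-frames", host: str = "localhost") -> dict:
    """Options for Spark's Kafka source, pointing at the two local brokers."""
    return {
        "kafka.bootstrap.servers": f"{host}:9093,{host}:9095",
        "subscribe": topic,  # hypothetical topic name
        "startingOffsets": "latest",
    }


def run_stream() -> None:
    """Read the topic as a stream and print a per-message summary to the console."""
    # Imported here so the options helper can be used without PySpark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, length

    spark = SparkSession.builder.appName("VideoAnalyticsSketch").getOrCreate()
    raw = spark.readStream.format("kafka").options(**kafka_source_options()).load()

    # The Kafka source delivers key and value as binary columns.
    summary = raw.select(
        col("key").cast("string").alias("frame_id"),
        length(col("value")).alias("payload_bytes"),
        col("timestamp"),
    )

    query = summary.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()
```

A real consumer would replace the summary step with frame decoding and analysis; the console sink is used here only to make the stream visible.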
Start the Kafka Consumer:
python kafkaConsumer.py
To ensure Spark can access the libraries in the Conda environment, set these environment variables (with the environment activated) before running spark-submit:
export PYSPARK_PYTHON=$(which python)
export PYSPARK_DRIVER_PYTHON=$(which python)
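The same variables can also be set from Python, which may be convenient when a script is started with `python` directly rather than through `spark-submit`. This is a minimal sketch; the function name is hypothetical, and it must run before the SparkSession is created. Under `spark-submit` the driver interpreter is chosen before the script starts, so the shell exports remain the reliable route there.

```python
import os
import sys


def use_current_interpreter_for_spark() -> str:
    """Point PySpark's driver and workers at the interpreter running this script."""
    # sys.executable is the Python binary of the currently active environment.
    os.environ["PYSPARK_PYTHON"] = sys.executable
    os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
    return sys.executable
```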
To list Kafka topics (replace PORT with a broker port, such as 9093 or 9095):
bin/kafka-topics.sh --list --bootstrap-server localhost:PORT
To delete a Kafka topic (adjust the port to match your broker, for example 9093 or 9095 in the Docker Compose setup):
kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic your_topic_name
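Topics can also be listed and deleted from Python, which avoids needing the Kafka shell scripts on the host. The sketch below assumes the `confluent-kafka` package is installed and a broker is reachable on port 9093; the function names are hypothetical.

```python
def admin_config(port: int = 9093, host: str = "localhost") -> dict:
    """Configuration for an admin client talking to one local broker."""
    return {"bootstrap.servers": f"{host}:{port}"}


def list_topics(port: int = 9093) -> list:
    """Return the topic names known to the cluster, sorted alphabetically."""
    # Imported here so admin_config() can be used without the client installed.
    from confluent_kafka.admin import AdminClient

    admin = AdminClient(admin_config(port))
    metadata = admin.list_topics(timeout=10)
    return sorted(metadata.topics)


def delete_topic(name: str, port: int = 9093) -> None:
    """Request deletion of one topic and wait for the broker's answer."""
    from confluent_kafka.admin import AdminClient

    admin = AdminClient(admin_config(port))
    futures = admin.delete_topics([name])
    for topic, future in futures.items():
        future.result()  # raises an exception if the deletion failed
        print(f"Deleted topic {topic}")
```

Deletion only takes effect if the brokers allow it (`delete.topic.enable`), so a failure here may point to broker configuration rather than to the client.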
Some suggestions: