BigVideoAnalytics

Building a video analytics framework for large-scale applications using Big Data tools.

Ecosystems: Apache Spark, Docker, Docker Compose, Python

Architecture of the project

The architecture consists of the following components:

Producer: Reads frames from video files or live streams and publishes them to a Kafka server. Each frame is sent to a topic corresponding to the video file name.
Kafka Server: Stores frames in their respective topics.
Spark Consumer: Consumes frames from Kafka, applies a user-defined function (UDF), such as a face detector, and pushes processed frames to a second Kafka server.
Final Kafka Consumer: Consumes the processed frames from the second Kafka server, groups them by topic name, and saves each processed video to the output folder.
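Because the producer names each topic after its video file, the file name has to be a legal Kafka topic name. Kafka only accepts ASCII letters, digits, `.`, `_` and `-`, up to 249 characters. The helper below is a minimal sketch of one way to sanitize a path; `topic_for_video` and its "replace illegal characters with `_`" rule are hypothetical, not taken from the repo.

```python
import re
from pathlib import Path

# Kafka topic names may only contain these characters, up to 249 of them.
_ILLEGAL = re.compile(r"[^a-zA-Z0-9._-]")
_MAX_TOPIC_LEN = 249


def topic_for_video(path: str) -> str:
    """Hypothetical helper: derive a legal Kafka topic name from a video path."""
    stem = Path(path).stem                  # "videos/cam 1.mp4" -> "cam 1"
    topic = _ILLEGAL.sub("_", stem)[:_MAX_TOPIC_LEN]
    if topic in ("", ".", ".."):            # names Kafka rejects outright
        raise ValueError(f"cannot derive a topic name from {path!r}")
    return topic


print(topic_for_video("videos/cam 1.mp4"))  # -> cam_1
```

Note that two different file names can map to the same topic (for example `cam 1.mp4` and `cam_1.mp4`), so a real deployment may want a uniqueness check as well.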

Installation

Using Docker Compose

Spin up Kafka containers for two servers (listening on ports 9093 and 9095) and Zookeeper (listening on port 2181) using Docker Compose:

```
docker-compose up -d
```
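Once the containers are up, a quick way to confirm that both brokers and ZooKeeper are reachable is to open a TCP connection to each port. This is a minimal sketch using only the standard library; the service labels in `SERVICES` are hypothetical, while the ports are the ones listed above.

```python
import socket

# Ports from the Docker Compose setup; the labels are illustrative only.
SERVICES = {"kafka-1": 9093, "kafka-2": 9095, "zookeeper": 2181}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


def check_services(host: str = "localhost") -> dict:
    """Map each service label to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name:10} {'up' if ok else 'DOWN'}")
```

An open port only shows that something is listening; it does not prove the broker has finished joining the cluster.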

Manual Installation

Kafka:
- Run Kafka using kafka_start.sh.
- Create two separate server.properties files in the conf directory (one per broker), each with its own broker ID and listening port.
Spark:
- Download and install Spark from Apache Spark Downloads.
- Alternatively, use the provided Dockerfile for Spark installation.
Python Libraries:
- Create a Conda environment and install the required libraries from requirements.txt:
```
pip install -r requirements.txt
```
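For the manual Kafka setup, the two broker configurations differ only in a few keys. The sketch below writes both files from one template; the file names `server-1.properties`/`server-2.properties`, the `/tmp` log directories, and the shared values are assumptions for a single-machine setup, while the property keys themselves are standard Kafka broker settings. Two brokers on one host also need different `log.dirs`, not just different IDs and ports.

```python
from pathlib import Path

# Settings shared by both brokers (assumed values for a local two-broker setup).
COMMON = {
    "zookeeper.connect": "localhost:2181",
    "num.partitions": "1",
    "default.replication.factor": "2",
    # Kafka's default here is 3, which is more brokers than this setup has.
    "offsets.topic.replication.factor": "2",
}

# (broker.id, listener port) per broker; ports match the Docker Compose setup.
BROKERS = [(1, 9093), (2, 9095)]


def write_broker_configs(conf_dir: str) -> list:
    """Write one hypothetical server-<id>.properties file per broker."""
    out = Path(conf_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for broker_id, port in BROKERS:
        props = {
            "broker.id": str(broker_id),
            "listeners": f"PLAINTEXT://localhost:{port}",
            "log.dirs": f"/tmp/kafka-logs-{broker_id}",  # must differ per broker
            **COMMON,
        }
        path = out / f"server-{broker_id}.properties"
        path.write_text("".join(f"{k}={v}\n" for k, v in props.items()))
        written.append(path)
    return written
```

Each file can then be passed to Kafka's own `bin/kafka-server-start.sh`, or to whatever the repo's `kafka_start.sh` expects.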

Running the program

Start the Producer:
```
python confluentKafkaProducer.py
```
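The core of a producer like this is a loop that sends one message per encoded frame. The sketch below assumes a producer object with the `produce`/`poll`/`flush` methods of `confluent_kafka.Producer` (for example `Producer({"bootstrap.servers": "localhost:9093"})`), and frames already encoded to bytes, e.g. as JPEG via OpenCV's `cv2.imencode`. Encoding matters because raw frames can easily exceed Kafka's default message size limit of roughly 1 MB. Keying each message by frame index is an assumption made here, not necessarily what the repo does.

```python
from typing import Iterable


def publish_frames(producer, topic: str, frames: Iterable[bytes]) -> int:
    """Send encoded frames to `topic`, keyed by frame index; return the count.

    `producer` is assumed to behave like confluent_kafka.Producer.
    """
    count = 0
    for index, frame in enumerate(frames):
        # The key carries the frame index so a consumer can restore order.
        producer.produce(topic, value=frame, key=str(index).encode())
        producer.poll(0)  # serve delivery callbacks without blocking
        count += 1
    producer.flush()      # block until all queued messages are delivered
    return count
```

Passing the producer in, rather than constructing it inside the function, keeps the loop testable with a stand-in object.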
Start the Spark Consumer:

Source the bash profile:
```
source ~/.bash_profile
```

Run Spark with the following command:

```
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1 sparkConsumer.py
```
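The `--packages` coordinate for Spark's Kafka connector encodes two versions: the Scala version after the underscore (`2.12`) and the Spark version at the end (`3.5.1`). Both should match your Spark installation, which `spark-submit --version` reports. The small helper below is a hypothetical convenience, not part of the repo, for building the coordinate when your versions differ.

```python
def kafka_connector_package(spark_version: str, scala_version: str = "2.12") -> str:
    """Build the --packages coordinate for Spark's Kafka source/sink.

    The artifact suffix is the Scala version Spark was built with, and the
    artifact version is the Spark version itself.
    """
    return f"org.apache.spark:spark-sql-kafka-0-10_{scala_version}:{spark_version}"


print(kafka_connector_package("3.5.1"))
# -> org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1
```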

Start the Kafka Consumer:
```
python kafkaConsumer.py
```
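The final consumer has to turn a stream of processed messages back into one output per video. The sketch below shows that grouping step with the standard library only. It assumes each message arrives as a `(topic, key, value)` tuple (with `confluent_kafka`, `msg.topic()`, `msg.key()` and `msg.value()`), that the key is the frame index as ASCII bytes (a hypothetical convention), and it writes numbered JPEG files per topic instead of encoding a video file, which a real implementation would typically do with something like OpenCV's `cv2.VideoWriter`. Sorting by index matters once a topic has more than one partition, because Kafka only guarantees ordering within a partition.

```python
from collections import defaultdict
from pathlib import Path


def group_frames(messages):
    """Group (topic, key, value) messages by topic, ordered by frame index."""
    by_topic = defaultdict(list)
    for topic, key, value in messages:
        by_topic[topic].append((int(key.decode("ascii")), value))
    return {
        topic: [value for _, value in sorted(frames, key=lambda f: f[0])]
        for topic, frames in by_topic.items()
    }


def write_frames(grouped, output_dir: str) -> None:
    """Write each topic's frames as numbered files under output_dir/<topic>/."""
    for topic, frames in grouped.items():
        folder = Path(output_dir) / topic
        folder.mkdir(parents=True, exist_ok=True)
        for i, frame in enumerate(frames):
            (folder / f"{i:06d}.jpg").write_bytes(frame)
```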

Useful Tips

To make Spark use the Python interpreter, and therefore the libraries, from your Conda environment, set these environment variables before running spark-submit:
```
export PYSPARK_PYTHON=$(which python)
export PYSPARK_DRIVER_PYTHON=$(which python)
```
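To confirm the variable took effect, you can compare `PYSPARK_PYTHON` with the interpreter actually running your code. This is a minimal sketch; it resolves both paths with `os.path.realpath` because `which python` inside a Conda environment may return a symlink.

```python
import os
import sys


def pyspark_python_matches(env=None) -> bool:
    """True if PYSPARK_PYTHON points at the interpreter running this code."""
    env = os.environ if env is None else env
    configured = env.get("PYSPARK_PYTHON")
    if not configured:
        return False
    return os.path.realpath(configured) == os.path.realpath(sys.executable)


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("PYSPARK_PYTHON matches:", pyspark_python_matches())
```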

To list the topics on a broker (replace PORT with the broker's port, 9093 or 9095 in this setup):

```
bin/kafka-topics.sh --list --bootstrap-server localhost:PORT
```

To delete a Kafka topic:

```
bin/kafka-topics.sh --bootstrap-server localhost:PORT --delete --topic your_topic_name
```
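With two brokers it can be handy to script these commands. The sketch below only builds the argument lists for `kafka-topics.sh` and runs the list command via `subprocess`; the `bin/kafka-topics.sh` path assumes you run it from the Kafka installation directory, and the function names are hypothetical.

```python
import subprocess
from typing import List, Optional

KAFKA_TOPICS = "bin/kafka-topics.sh"  # assumed path, relative to the Kafka install


def topics_cmd(port: int, delete: Optional[str] = None) -> List[str]:
    """Build the kafka-topics.sh arguments to list topics or delete one."""
    cmd = [KAFKA_TOPICS, "--bootstrap-server", f"localhost:{port}"]
    return cmd + (["--delete", "--topic", delete] if delete else ["--list"])


def list_topics(port: int) -> List[str]:
    """Run kafka-topics.sh --list and return the topic names it prints."""
    out = subprocess.run(topics_cmd(port), capture_output=True, text=True, check=True)
    return [line for line in out.stdout.splitlines() if line.strip()]
```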

Some suggestions:
- I have used only two brokers with a replication factor of 2; you can change both to suit your requirements.
- Each topic has only one partition; you can add partitions for faster, parallel processing.
- You can use the Kafka Streams API instead of Spark to process frames.
- You can add tracking of objects across frames; basic code for this is already in the repo.
- I am using a .csv file to read camera metadata; you can use a database to store camera details instead.
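As a starting point for moving camera metadata out of a .csv file, the sketch below copies rows into SQLite using only the standard library. The column names (`camera_id`, `location`, `stream_url`) and the sample rows are hypothetical, since the repo's CSV layout is not described here; the same pattern carries over to other databases through their Python drivers.

```python
import csv
import io
import sqlite3

# Hypothetical CSV layout and sample rows, for illustration only.
SAMPLE_CSV = """camera_id,location,stream_url
cam_1,Gate A,rtsp://example.com/cam1
cam_2,Parking,rtsp://example.com/cam2
"""


def load_cameras(csv_text: str, conn: sqlite3.Connection) -> int:
    """Copy camera rows from CSV text into a `cameras` table; return row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cameras ("
        "camera_id TEXT PRIMARY KEY, location TEXT, stream_url TEXT)"
    )
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn.executemany(
        "INSERT OR REPLACE INTO cameras VALUES (:camera_id, :location, :stream_url)",
        rows,
    )
    conn.commit()
    return len(rows)


def camera_url(conn: sqlite3.Connection, camera_id: str):
    """Look up a camera's stream URL, or None if the camera is unknown."""
    row = conn.execute(
        "SELECT stream_url FROM cameras WHERE camera_id = ?", (camera_id,)
    ).fetchone()
    return row[0] if row else None
```

Usage: `conn = sqlite3.connect("cameras.db")`, then `load_cameras(open("cameras.csv").read(), conn)`, where both file names are placeholders.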
