🎥 Surveillance Video Summarizer: AI-Powered Video Analysis and Summarization

Checked on 13.09.2024 ✅ (This project was developed and tested on the Lightning AI platform, running on an L40 GPU.)

Surveillance Video Summarizer is an AI-driven system that processes surveillance videos, extracts key frames, and generates detailed annotations. Powered by a Florence-2 Vision-Language Model (VLM) fine-tuned on the SPHAR dataset, it highlights notable events, actions, and objects within video footage and logs them for easy review and further analysis.

The fine-tuned model can be found at: kndrvitja/florence-SPHAR-finetune-2.

See the tool in action below!

🎥 Demo Video

Features

AI-Powered Video Summarization: Automatically extract frames from surveillance videos and generate annotations that capture actions, interactions, objects, and unusual events. The annotations are stored in a SQLite database for easy retrieval.
Real-Time Frame Processing: Asynchronous threading lets the system process video frames efficiently, allowing real-time analysis while minimizing performance bottlenecks. It logs every second, which makes debugging and verification easier.
Fine-Tuned Florence-2 VLM for SPHAR Dataset: The summarization process is powered by a Florence-2 VLM fine-tuned on the SPHAR dataset. This model is optimized to detect and describe surveillance-specific events with higher accuracy.
Gradio-Powered Interactive Interface: Interact with the surveillance logs through a Gradio-based web interface. You specify a time range, and the system retrieves the annotated logs for that period and uses the OpenAI API to summarize and analyze them. This functionality can be extended to other models such as Gemini, whose longer context would allow summarization of longer videos and extended timeframes.
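The asynchronous frame processing described above can be pictured as a producer/consumer pair: one thread supplies frames while another annotates them. The following is only a minimal sketch under stated assumptions, not the project's actual code. The names `annotate` and `run_pipeline` are hypothetical, and `annotate` is a stand-in for the Florence-2 call.

```python
import queue
import threading


def annotate(frame):
    # Hypothetical stand-in for the vision-language model call.
    return f"annotation for frame {frame}"


def run_pipeline(frames, max_pending=8):
    """Annotate frames on a worker thread while a producer thread feeds them."""
    pending = queue.Queue(maxsize=max_pending)  # bounded, so memory use stays flat
    results = []
    sentinel = object()  # marks the end of the stream

    def producer():
        for frame in frames:
            pending.put(frame)  # blocks while the queue is full
        pending.put(sentinel)

    def consumer():
        while True:
            frame = pending.get()
            if frame is sentinel:
                break
            results.append(annotate(frame))

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

The bounded queue means frame reading pauses when annotation falls behind, rather than buffering frames without limit.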

📣 How it Works

Frame Extraction: Frames are extracted at regular intervals from surveillance video files using OpenCV.
AI-Powered Annotation: Each frame is analyzed by the fine-tuned Florence-2 Vision-Language Model, generating insightful annotations about the scene.
Data Storage: Annotations and their associated frame data are stored in a SQLite database, ready for future analysis.
Gradio Interface: Users query the surveillance logs by providing a time range and a prompt. The system retrieves the annotations for that period, then summarizes and analyzes them to give concise insights.
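As an illustration of the frame extraction step, the sketch below samples one frame per interval with OpenCV. It is a hedged example and not the repository's implementation: the function names, the one-second default interval, and the seek-by-index approach are all assumptions.

```python
def sample_frame_indices(total_frames, fps, interval_seconds=1.0):
    """Return the indices of frames spaced interval_seconds apart."""
    if fps <= 0 or interval_seconds <= 0:
        raise ValueError("fps and interval_seconds must be positive")
    step = max(1, round(fps * interval_seconds))
    return list(range(0, total_frames, step))


def extract_frames(video_path, interval_seconds=1.0):
    """Yield (timestamp_seconds, frame) pairs from a video file."""
    import cv2  # imported here so the index helper works without OpenCV installed

    capture = cv2.VideoCapture(video_path)
    try:
        fps = capture.get(cv2.CAP_PROP_FPS)
        total = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
        for index in sample_frame_indices(total, fps, interval_seconds):
            capture.set(cv2.CAP_PROP_POS_FRAMES, index)  # seek to the sampled frame
            ok, frame = capture.read()
            if not ok:
                break
            yield index / fps, frame
    finally:
        capture.release()
```

For a 25 fps video with a one-second interval, every 25th frame is selected. Each yielded frame would then be passed to the model for annotation.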

Installation

Clone the repository:

git clone https://github.com/Ravi-Teja-konda/Surveillance_Video_Summarizer.git

Navigate to the project directory:

cd Surveillance_Video_Summarizer

Install the required Python libraries:

pip install -r requirements.txt

Configuration

Model and Processor

The system utilizes the Florence-2 Vision-Language Model fine-tuned for the SPHAR dataset. The fine-tuned model can be found at kndrvitja/florence-SPHAR-finetune-2.
Ensure you have your OpenAI API key stored in a .env file as required.
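For reference, a .env file is a plain list of KEY=value lines. The sketch below shows one way such a file could be read using only the standard library. It assumes the variable is named OPENAI_API_KEY, which this README does not state. The project itself may well rely on a helper library for this, so treat the functions `parse_env` and `load_api_key` as hypothetical.

```python
import os


def parse_env(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_api_key(env_path=".env", name="OPENAI_API_KEY"):
    """Prefer the process environment, then fall back to the .env file."""
    if name in os.environ:
        return os.environ[name]
    with open(env_path, encoding="utf-8") as handle:
        values = parse_env(handle.read())
    if name not in values:
        raise KeyError(f"{name} not found in {env_path}")
    return values[name]
```

Keep the .env file out of version control so the key is not committed by accident.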

Database Path

The default SQLite database for storing frame data is located at /teamspace/studios/Florence_2_video_analytics/Florence_2_video_analytics.db. You can modify this path.
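To make the storage layer concrete, here is a minimal sketch of how annotations could be written to SQLite and read back by time range. The table name `frame_logs`, its columns, and the ISO 8601 text timestamps are assumptions for illustration. The project's real schema may differ.

```python
import sqlite3


def open_database(path=":memory:"):
    """Open (or create) the database and make sure the log table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS frame_logs (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               captured_at TEXT NOT NULL,   -- ISO 8601, sorts chronologically as text
               annotation TEXT NOT NULL
           )"""
    )
    return conn


def log_annotation(conn, captured_at, annotation):
    """Insert one annotation; the with-block commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO frame_logs (captured_at, annotation) VALUES (?, ?)",
            (captured_at, annotation),
        )


def fetch_logs(conn, start, end):
    """Return (captured_at, annotation) rows within the inclusive time range."""
    cursor = conn.execute(
        "SELECT captured_at, annotation FROM frame_logs "
        "WHERE captured_at BETWEEN ? AND ? ORDER BY captured_at",
        (start, end),
    )
    return cursor.fetchall()
```

Passing a file path instead of `":memory:"` to `open_database` persists the logs between runs.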

Usage

First, run the frame extraction:

python surveillance_video_summarizer.py

Next, interact with the Gradio interface for log analysis:

python surveillance_log_analyzer_with_gradio.py

From here, you can use the Gradio interface to query specific periods of video footage and retrieve annotated summaries. You can ask about specific actions, notable events, or general activity. Provide the time range and your query prompt, and the system will return the relevant logs.
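One way to picture the query step: the log rows for the chosen time range are combined with the user's prompt into a single text that is then sent to the language model. The sketch below covers only that assembly step. The function name and the prompt wording are hypothetical, and the actual API call is deliberately left out.

```python
def build_summary_prompt(logs, start, end, question):
    """Combine annotated log rows and a user question into one prompt string."""
    if not logs:
        return f"No surveillance logs were found between {start} and {end}."
    header = (
        f"The following surveillance annotations were recorded between {start} and {end}.\n"
        f"Answer this request using only these annotations: {question}\n"
    )
    # One line per log entry, prefixed with its timestamp.
    body = "\n".join(f"[{timestamp}] {annotation}" for timestamp, annotation in logs)
    return header + "\n" + body
```

Here `logs` is assumed to be a list of (timestamp, annotation) pairs, such as the rows returned by a database query for the selected range.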

🚀 Future Enhancements

Advanced Event Detection

We plan to enhance the model’s capability to detect more complex events such as traffic violations, suspicious behavior, and other nuanced surveillance scenarios by training Florence-2 with more data.

Real-Time Streaming

In the future, we plan to support real-time video streams, so that frames are extracted and analyzed as the video is being captured.

Contributing

Contributions are welcome! Feel free to submit a pull request.

❤️ Support the Project

If you find this project useful, consider starring it on GitHub to help others discover it!

📚 References

Inspired by advances in Vision-Language models like Florence-2.

License

This project is licensed under the Apache License 2.0.
