fraud-detector

Project FraudCatch uses AI to predict and prevent financial fraud in real time. It streams data with Apache Kafka, processes it with Apache Spark, and applies differentially private machine learning models to protect user data. Our Streamlit application gives businesses real-time fraud analysis, notifications, and preventive measures.

MIT License

Ecosystems: Streamlit, Apache Spark, Python, Amazon Web Services

Project FraudCatch `alpha` 😊

Info Edge Ventures AI Hackathon 2024 - Team Scamslayers💀💀✨

Team Details

Team Name: Scamslayers💀💀✨
Team Members: Sampriti Mitra and Soumyadeep Bose

Problem Statement

Use of AI to predict and prevent financial fraud.

Project Overview

For the Info Edge Ventures AI Hackathon 2024, we built a production-level fraud detection and prevention system for financial transactions. The system combines a real-time data pipeline, distributed model inference, differential privacy, and continual incremental learning.
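
To make the continual incremental learning idea concrete, here is a minimal sketch, not the project's actual training code. It assumes a tiny logistic-regression scorer that is updated one labelled transaction at a time with stochastic gradient descent. The class name `OnlineFraudScorer`, the two features, and the learning rate are all hypothetical.

```python
import math


class OnlineFraudScorer:
    """Hypothetical online logistic regression, for illustration only."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, features):
        # Sigmoid of the linear score gives a fraud probability in (0, 1).
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, label):
        # One SGD step on the log loss: the gradient is (prediction - label) * x.
        error = self.predict_proba(features) - label
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error


scorer = OnlineFraudScorer(n_features=2)
flagged = [0.9, 0.2]  # hypothetical scaled features of a flagged transaction
before = scorer.predict_proba(flagged)

# A user reports the alert as a false positive, so the true label is 0.
for _ in range(20):
    scorer.update(flagged, 0)
after = scorer.predict_proba(flagged)
```

After the reported false positive is fed back, `after` is lower than `before`, so the same transaction is less likely to be flagged again. A real deployment would batch such reports and validate the updated model before using it.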

Solution Components

Data Ingestion:
- Real-time data streaming using Kafka from the company’s API.
Model Inference:
- Distributed model inference using Apache Spark to predict fraudulent transactions.
- Ensemble of several models to increase accuracy.
Data Storage:
- After successful inference, storing the data in an S3 bucket with differential privacy applied.
Analysis and Dashboard:
- Analyzing stored data and displaying results on a Streamlit dashboard.
User Interface:
- A Streamlit website for businesses to sign up, connect their API, access fraud data analysis, and receive fraud prevention measures.
- Real-time notifications for fraudulent transactions.
Security and Privacy:
- Using differential privacy techniques for data protection and equitable analysis.
- Reporting false positives to continually train the models.
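
The ensemble step under Model Inference can be sketched as soft voting: each model returns a fraud score and the scores are averaged. This is only an illustration under stated assumptions. The three rule-based "models", the transaction fields (`amount`, `hour`, `country`, `home_country`), and the 0.5 threshold are hypothetical stand-ins for the trained models, and the Kafka consumer loop and Spark distribution are omitted so the example runs on its own.

```python
import json
from statistics import mean


def amount_model(txn):
    """Hypothetical model: larger amounts look riskier (capped at 1.0)."""
    return min(txn["amount"] / 10_000, 1.0)


def night_model(txn):
    """Hypothetical model: transactions before 06:00 look riskier."""
    return 0.8 if txn["hour"] < 6 else 0.1


def foreign_model(txn):
    """Hypothetical model: spending outside the home country looks riskier."""
    return 0.7 if txn["country"] != txn["home_country"] else 0.05


MODELS = [amount_model, night_model, foreign_model]


def ensemble_score(txn, models=MODELS):
    # Soft voting: average the per-model fraud scores.
    return mean(model(txn) for model in models)


def is_fraud(raw_message, threshold=0.5):
    # raw_message stands in for the JSON payload of one streamed record.
    txn = json.loads(raw_message)
    return ensemble_score(txn) >= threshold


suspicious = json.dumps(
    {"amount": 9500, "hour": 3, "country": "XX", "home_country": "IN"}
)
routine = json.dumps(
    {"amount": 40, "hour": 14, "country": "IN", "home_country": "IN"}
)
```

With these made-up inputs, `is_fraud(suspicious)` is `True` and `is_fraud(routine)` is `False`. Averaging is one of several ways to combine models; weighted voting or stacking would fit the same interface.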

Key Features

Real-Time Anomaly Detection: Immediate detection and response to fraudulent activities using Kafka.
Distributed Data Processing: Using Spark for distributed processing and inference.
Streamlit Application: Users can upload data, connect their API, and access an analysis dashboard.
Real-Time Notifications: Email alerts to security teams upon detection of fraud.
Scalability and Flexibility: The system is designed to handle large volumes of data and adapt to various fraud detection needs.
Differentially Private Models: Ensuring user data remains confidential while maintaining high accuracy in fraud detection.
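
As a rough illustration of the differential privacy idea, the sketch below applies the Laplace mechanism to a count of flagged transactions. It is not necessarily the mechanism the project's models use (the project lists Diffprivlib). The function names, the sample flags, and the choice of `epsilon` are assumptions made for the example, and only the standard library is used.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(flags, epsilon, rng=None):
    """Return a noisy count of fraud flags (hypothetical helper)."""
    rng = rng or random.Random()
    # Adding or removing one transaction changes the count by at most 1.
    sensitivity = 1.0
    return sum(flags) + laplace_noise(sensitivity / epsilon, rng)


flags = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]  # hypothetical per-transaction fraud flags
noisy = private_count(flags, epsilon=1.0, rng=random.Random(7))
```

A smaller `epsilon` adds more noise and gives stronger privacy, at the cost of a less accurate count. The true count here is 4, and the noisy value varies around it from run to run.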

Technologies Used

Data Streaming: Confluent Kafka, Confluent ZooKeeper
Data Processing: Bitnami Spark Master, Bitnami Spark Worker
Storage: AWS S3
Machine Learning: TensorFlow, PyTorch, scikit-learn, Diffprivlib
Web Framework: Streamlit
Programming Languages: Python
Other Tools: MLflow, Docker, Pandas, Matplotlib

Performance Metrics

Ensemble Model Accuracy: 94% on the test set.
Average Spark Processing Delay: Less than 1 second.
Kafka Latency: 50-80 milliseconds.
Data Retrieval Time: 100 milliseconds per record.
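
The figures above can be combined into a rough worst-case latency budget for one transaction. This is a back-of-the-envelope sketch that assumes the three stages run one after another with no overlap, which the source does not state; the function name is hypothetical.

```python
KAFKA_LATENCY_MS = (50, 80)    # reported range
SPARK_DELAY_MAX_MS = 1000      # reported as "less than 1 second"
RETRIEVAL_MS_PER_RECORD = 100  # reported per record


def worst_case_latency_ms(records=1):
    # Assumption: stages are strictly sequential, so their upper bounds add up.
    return (
        KAFKA_LATENCY_MS[1]
        + SPARK_DELAY_MAX_MS
        + RETRIEVAL_MS_PER_RECORD * records
    )


budget = worst_case_latency_ms()  # 80 + 1000 + 100 = 1180
```

Under that assumption, a single record takes under about 1.2 seconds from ingestion to retrieval.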

Future Enhancements

Implementing homomorphic encryption or secure multi-party computation.
Integrating blockchain technology for logging transactions and model updates.
Extending fraud detection capabilities beyond credit card fraud.
Adding user activity tracking with Google Firebase.
Deploying the system through AWS ECR for scalability.
Developing explainability features for real-time fraud detection insights.

GitHub & Demo Video

GitHub Repository: fraud-detector
Demo Video: Watch Demo

Project Architecture

This project was developed for the Info Edge Ventures AI Hackathon 2024 by Team Scamslayers.
