Fraud Warden

Overview

Fraud Warden is a credit card fraud detection system that uses machine learning to predict whether a transaction is fraudulent. It uses a Random Forest Classifier to make predictions from features of the transaction.

Technology Stack

Programming Language: Python

Libraries:

streamlit for building the web application
pandas for data manipulation
plotly.express and seaborn for data visualization
scikit-learn for machine learning
pickle for model serialization

Installation Instructions

Clone the Repository:
- git clone https://github.com/yourusername/fraud-warden.git
- cd fraud-warden
Create a Virtual Environment:
- python -m venv venv
- source venv/bin/activate # On Windows use venv\Scripts\activate
Install Dependencies:
- pip install -r requirements.txt
Run the Application:
- streamlit run app.py

How It Works

Data Preprocessing:
- The application preprocesses the uploaded CSV file by removing unnecessary columns and converting date columns to datetime objects.
  Additional features such as time_of_day and age are derived from existing columns.
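
The preprocessing step can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the column names `trans_date_trans_time`, `dob`, `first`, `last`, and `street` are assumptions, and `time_of_day` is assumed here to mean the hour of the transaction.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop unused columns, parse dates, and derive time_of_day and age."""
    out = df.copy()

    # Hypothetical columns assumed to carry no predictive signal.
    out = out.drop(columns=["first", "last", "street"], errors="ignore")

    # Convert the (assumed) date columns from strings to datetime objects.
    out["trans_date_trans_time"] = pd.to_datetime(out["trans_date_trans_time"])
    out["dob"] = pd.to_datetime(out["dob"])

    t = out["trans_date_trans_time"]
    b = out["dob"]

    # time_of_day: hour of the transaction (0-23).
    out["time_of_day"] = t.dt.hour

    # age: whole years between date of birth and the transaction date.
    before_birthday = (t.dt.month < b.dt.month) | (
        (t.dt.month == b.dt.month) & (t.dt.day < b.dt.day)
    )
    out["age"] = t.dt.year - b.dt.year - before_birthday.astype(int)
    return out
```

Computing age relative to the transaction date rather than today's date keeps the feature stable no matter when the file is scored.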
Feature Engineering:
- Categorical features are encoded into numerical values.
  The data is reindexed to ensure all required columns are present.
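
One way to encode the categorical features and align the columns is shown below. It is a sketch under assumptions: one-hot encoding is only one possible encoding (the application may use label encoding instead), and the column names and the `TRAINING_COLUMNS` list are hypothetical.

```python
import pandas as pd

# Hypothetical list of the columns the model saw during training.
TRAINING_COLUMNS = [
    "amt",
    "category_grocery",
    "category_travel",
    "gender_F",
    "gender_M",
]


def encode_and_align(df: pd.DataFrame, training_columns: list) -> pd.DataFrame:
    """One-hot encode categorical columns, then match the training layout."""
    # Turn each category value into its own 0/1 column.
    encoded = pd.get_dummies(df, columns=["category", "gender"], dtype=int)
    # Add any missing training columns (filled with 0), drop unknown ones,
    # and put the columns in the order the model expects.
    return encoded.reindex(columns=training_columns, fill_value=0)
```

Reindexing matters because an uploaded file may not contain every category seen in training; without it the model would receive a different number of columns than it was fitted on.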
Oversampling:
- The application uses the Synthetic Minority Over-sampling Technique (SMOTE) to balance the dataset by generating synthetic examples of the minority class.
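
The core idea of SMOTE is to interpolate between a minority-class sample and one of its nearest minority-class neighbours. The NumPy sketch below illustrates that idea only; `smote_sketch` is a hypothetical helper, not the application's code. In practice a library implementation such as `SMOTE` from imbalanced-learn is the usual choice, and which one this project uses is not stated here.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows from minority-class samples X_min.

    Needs at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise Euclidean distances; a sample is never its own neighbour.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbors = np.argsort(dists, axis=1)[:, :k]

    # Pick a random base sample and one of its k nearest neighbours.
    base = rng.integers(0, n, size=n_new)
    picked = neighbors[base, rng.integers(0, k, size=n_new)]

    # Place each synthetic point at a random position on the segment
    # between the base sample and the chosen neighbour.
    gaps = rng.random((n_new, 1))
    return X_min[base] + gaps * (X_min[picked] - X_min[base])
```

Because every synthetic row lies on a line segment between two real minority samples, the new points stay within the region the minority class already occupies.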
Model Prediction:
- The preprocessed data is fed into a pre-trained Random Forest Classifier model.
  The model predicts whether a transaction is fraudulent based on the input features.
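
Loading a pickled model and scoring a batch of transactions might look like the sketch below. The toy training data, the `model.pkl` file name, and the `is_fraud` and `fraud_probability` output columns are assumptions made so the example runs on its own; the real model file and feature set will differ.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def load_model(path):
    """Load a pickled model. Only unpickle files from a trusted source."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict_fraud(model, features: pd.DataFrame) -> pd.DataFrame:
    """Return the features with a predicted label and fraud probability."""
    result = features.copy()
    result["is_fraud"] = model.predict(features)
    # Column 1 holds the probability of class 1 for a model trained on 0/1.
    result["fraud_probability"] = model.predict_proba(features)[:, 1]
    return result


# Stand-in for the shipped model: train a toy forest and pickle it.
toy_X = pd.DataFrame(
    {"amt": [5.0, 8.0, 900.0, 1200.0], "time_of_day": [12, 14, 2, 3]}
)
toy_y = [0, 0, 1, 1]
model_path = Path(tempfile.mkdtemp()) / "model.pkl"
with open(model_path, "wb") as f:
    toy_model = RandomForestClassifier(n_estimators=10, random_state=0)
    pickle.dump(toy_model.fit(toy_X, toy_y), f)

model = load_model(model_path)
scored = predict_fraud(model, toy_X)
```

The feature columns passed to `predict_fraud` must match the columns used at training time, in both name and order.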
Visualization:
- The application provides various visualizations such as histograms, bar charts, and correlation heatmaps to help users understand the data.

Features

Upload CSV: Users can upload a CSV file containing transaction data.
Data Preview: Displays a preview of the uploaded data.
Basic Statistics: Shows basic statistics of the dataset.
Data Types: Displays the data types of each column.
Missing Values: Shows the count of missing values in each column.
Distribution of Numerical Columns: Visualizes the distribution of numerical columns.
Counts of Categorical Columns: Visualizes the counts of categorical columns.
Correlation Heatmap: Displays a heatmap of the correlation between numerical features.
SMOTE Sampling: Balances the dataset using SMOTE oversampling.
Fraud Prediction: Predicts whether a transaction is fraudulent based on user input.
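
The data preview, basic statistics, data types, and missing-value counts listed above map directly onto standard pandas calls. The sketch below shows one way to collect them; `summarize` and the sample data are hypothetical, not taken from the application.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Collect the tabular summaries shown for an uploaded dataset."""
    return {
        "preview": df.head(),           # first five rows
        "statistics": df.describe(),    # count, mean, std, min, quartiles, max
        "dtypes": df.dtypes,            # data type of each column
        "missing": df.isna().sum(),     # missing values per column
    }


# Made-up sample standing in for an uploaded CSV.
sample = pd.DataFrame(
    {"amt": [10.0, None, 30.0], "category": ["grocery", "travel", None]}
)
summary = summarize(sample)
```

Note that `describe()` covers only numerical columns by default, so `category` does not appear in the statistics table here.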