Fraud Warden
Overview
Fraud Warden is a credit card fraud detection system that uses machine learning to predict whether a transaction is fraudulent. It uses a Random Forest classifier to make predictions from features of the transaction.
Technology Stack
Programming Language: Python
Libraries:
- streamlit for building the web application
- pandas for data manipulation
- plotly.express and seaborn for data visualization
- scikit-learn for machine learning
- pickle for model serialization
Installation Instructions
- Clone the Repository:
  - git clone https://github.com/yourusername/fraud-warden.git
  - cd fraud-warden
- Create a Virtual Environment:
  - python -m venv venv
  - source venv/bin/activate (on Windows, use venv\Scripts\activate)
- Install Dependencies:
  - pip install -r requirements.txt
- Run the Application:
How It Works
- Data Preprocessing: The application preprocesses the uploaded CSV file by removing unnecessary columns and converting date columns to datetime objects. Additional features such as time_of_day and age are derived from existing columns.
- Feature Engineering: Categorical features are encoded into numerical values. The data is reindexed to ensure all required columns are present.
- Oversampling: The application uses Synthetic Minority Over-sampling Technique (SMOTE) to balance the dataset.
- Model Prediction: The preprocessed data is fed into a pre-trained Random Forest Classifier model. The model predicts whether a transaction is fraudulent based on the input features.
- Visualization: The application provides various visualizations such as histograms, bar charts, and correlation heatmaps to help users understand the data.
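The preprocessing, feature engineering, and prediction steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names (first, last, trans_date_trans_time, dob, amt, category), the REQUIRED_COLUMNS list, and the choice of one-hot encoding are all assumptions, and a tiny stand-in model is trained inline because the real pickled model is not available here.

```python
import pickle

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature list; the real model's expected columns may differ.
REQUIRED_COLUMNS = ["amt", "time_of_day", "age", "category_grocery", "category_travel"]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Turn raw transactions into the numeric columns the model expects."""
    df = df.copy()
    # Remove columns the model does not use (assumed names).
    df = df.drop(columns=["first", "last"], errors="ignore")
    # Convert date columns to datetime objects.
    df["trans_date_trans_time"] = pd.to_datetime(df["trans_date_trans_time"])
    df["dob"] = pd.to_datetime(df["dob"])
    # Derive time_of_day (hour of the transaction) and age (approximate whole years).
    df["time_of_day"] = df["trans_date_trans_time"].dt.hour
    df["age"] = (df["trans_date_trans_time"] - df["dob"]).dt.days // 365
    df = df.drop(columns=["trans_date_trans_time", "dob"])
    # Encode categorical features numerically (one-hot is an assumption).
    df = pd.get_dummies(df, columns=["category"])
    # Reindex so every required column is present; missing ones become 0.
    return df.reindex(columns=REQUIRED_COLUMNS, fill_value=0).astype(float)


# Stand-in for the pre-trained model: fitted on four made-up rows, then
# round-tripped through pickle the way a saved model would be loaded.
train_X = pd.DataFrame(
    [[10.0, 9, 40, 1, 0], [25.0, 14, 33, 1, 0],
     [900.0, 23, 52, 0, 1], [1200.0, 2, 47, 0, 1]],
    columns=REQUIRED_COLUMNS,
)
train_y = [0, 0, 1, 1]  # 1 = fraudulent
trained = RandomForestClassifier(n_estimators=10, random_state=0).fit(train_X, train_y)
model = pickle.loads(pickle.dumps(trained))

# Two made-up transactions standing in for an uploaded CSV.
raw = pd.DataFrame({
    "first": ["Ann", "Bob"],
    "last": ["Lee", "Ray"],
    "trans_date_trans_time": ["2020-06-21 12:14:25", "2020-06-21 23:02:10"],
    "dob": ["1990-06-21", "1975-01-01"],
    "amt": [12.5, 980.0],
    "category": ["grocery", "grocery"],
})

features = preprocess(raw)
predictions = model.predict(features)  # one 0/1 label per transaction
```

Reindexing against a fixed column list is what keeps the input aligned with the model when an uploaded file lacks a category seen during training: the missing one-hot column is simply filled with zeros.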
Features
- Upload CSV: Users can upload a CSV file containing transaction data.
- Data Preview: Displays a preview of the uploaded data.
- Basic Statistics: Shows basic statistics of the dataset.
- Data Types: Displays the data types of each column.
- Missing Values: Shows the count of missing values in each column.
- Distribution of Numerical Columns: Visualizes the distribution of numerical columns.
- Counts of Categorical Columns: Visualizes the counts of categorical columns.
- Correlation Heatmap: Displays a heatmap of the correlation between numerical features.
- SMOTE Sampling: Balances the dataset using SMOTE sampling.
- Fraud Prediction: Predicts whether a transaction is fraudulent based on user input.
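To give a sense of what the SMOTE sampling feature does, the sketch below hand-rolls the core idea in NumPy: each synthetic minority-class row is a random point on the line segment between an existing minority row and one of its nearest minority neighbours. This is only an illustration of the technique, not the application's code and not a library implementation; the function name smote_like_samples, its parameters, and the sample values are hypothetical.

```python
import numpy as np


def smote_like_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority row.
        i = rng.integers(len(minority))
        # Find its k nearest minority neighbours (index 0 is the row itself).
        distances = np.linalg.norm(minority - minority[i], axis=1)
        neighbours = np.argsort(distances)[1:k + 1]
        j = rng.choice(neighbours)
        # Place the new row at a random position between the two rows.
        gap = rng.random()
        synthetic.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(synthetic)


# Made-up fraudulent rows: (amount, hour of day).
fraud_rows = np.array([[900.0, 23.0], [1200.0, 2.0], [1000.0, 22.0]])
new_rows = smote_like_samples(fraud_rows, n_new=5)
```

Because every synthetic row is a blend of two real minority rows, the new points stay within the range already covered by the fraudulent examples instead of duplicating them exactly.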
Resources Used