
Fraudulent Credit Transaction Detection System Using SMOTE, Random Forest Classifier, and Streamlit


Fraud Warden is a credit card fraud detection system that uses machine learning to predict whether a transaction is fraudulent. It uses a Random Forest Classifier to make predictions from features of the transaction.

Technology Stack

Programming Language: Python


  • streamlit for building the web application
  • pandas for data manipulation
  • seaborn for data visualization
  • scikit-learn for machine learning
  • pickle for model serialization
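
To illustrate how two of the libraries listed above can fit together, here is a minimal sketch of serializing a scikit-learn Random Forest with pickle and loading it back. The toy data, the feature layout, and the variable names are assumptions made for illustration; this is not the project's training code.

```python
import pickle

from sklearn.ensemble import RandomForestClassifier

# Toy data (assumption): two numeric features per transaction, label 1 = fraud.
X = [[10.0, 1], [12.0, 1], [950.0, 0], [990.0, 0]]
y = [0, 0, 1, 1]

model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(X, y)

# Serialize the fitted model to bytes, then restore it.
# An application would typically write these bytes to a file instead.
blob = pickle.dumps(model)
restored = pickle.loads(blob)

# The restored model should give the same predictions as the original.
sample = [[11.0, 1], [970.0, 0]]
print(restored.predict(sample))
```

Pickle files can execute arbitrary code when loaded, so only unpickle model files that come from a trusted source.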

Installation Instructions

  • Clone the Repository:
    • git clone
    • cd fraud-warden
  • Create a Virtual Environment:
    • python -m venv venv
    • source venv/bin/activate # On Windows use venv\Scripts\activate
  • Install Dependencies:
    • pip install -r requirements.txt
  • Run the Application:
    • streamlit run

How It Works

  • Data Preprocessing:
    • The application preprocesses the uploaded CSV file by removing unnecessary columns and converting date columns to datetime objects.
      Additional features such as time_of_day and age are derived from existing columns.
  • Feature Engineering:
    • Categorical features are encoded into numerical values.
      The data is reindexed to ensure all required columns are present.
  • Oversampling:
    • The application uses Synthetic Minority Over-sampling Technique (SMOTE) to balance the dataset.
  • Model Prediction:
    • The preprocessed data is fed into a pre-trained Random Forest Classifier model.
      The model predicts whether a transaction is fraudulent based on the input features.
  • Visualization:
    • The application provides various visualizations such as histograms, bar charts, and correlation heatmaps to help users understand the data.
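
The preprocessing and feature engineering steps above could look roughly like the following pandas sketch. The column names (trans_date_trans_time, dob, amt, category), the dropped columns, the time_of_day buckets, and the REQUIRED_COLUMNS list are all hypothetical, since this README does not describe the dataset schema. Treat it as an illustration of the steps, not the application's actual code.

```python
import pandas as pd

# Hypothetical schema: none of these names are given in the README.
DROP_COLUMNS = ["first", "last", "street"]
REQUIRED_COLUMNS = ["amt", "age", "time_of_day", "category_grocery", "category_travel"]
TIME_OF_DAY_CODES = {"night": 0, "morning": 1, "afternoon": 2, "evening": 3}


def time_of_day(hour: int) -> str:
    """Bucket an hour (0-23) into a coarse part of the day."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 21:
        return "evening"
    return "night"


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Remove columns the model does not use (absent ones are ignored).
    df = df.drop(columns=DROP_COLUMNS, errors="ignore").copy()

    # Convert date columns to datetime objects.
    df["trans_date_trans_time"] = pd.to_datetime(df["trans_date_trans_time"])
    df["dob"] = pd.to_datetime(df["dob"])

    # Derive time_of_day from the transaction hour and an approximate age
    # (difference in calendar years) from the date of birth.
    hours = df["trans_date_trans_time"].dt.hour
    df["time_of_day"] = hours.map(time_of_day).map(TIME_OF_DAY_CODES)
    df["age"] = df["trans_date_trans_time"].dt.year - df["dob"].dt.year
    df = df.drop(columns=["trans_date_trans_time", "dob"])

    # Encode the categorical column as 0/1 indicator columns.
    df = pd.get_dummies(df, columns=["category"], dtype=int)

    # Reindex so every column the model expects is present, in a fixed order;
    # missing columns are filled with 0 and unexpected ones are dropped.
    return df.reindex(columns=REQUIRED_COLUMNS, fill_value=0)
```

Reindexing against a fixed column list matters because one-hot encoding an uploaded file only produces columns for the categories that happen to appear in it, while a pre-trained model expects the same columns it saw during training.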


Features

  • Upload CSV: Users can upload a CSV file containing transaction data.
  • Data Preview: Displays a preview of the uploaded data.
  • Basic Statistics: Shows basic statistics of the dataset.
  • Data Types: Displays the data types of each column.
  • Missing Values: Shows the count of missing values in each column.
  • Distribution of Numerical Columns: Visualizes the distribution of numerical columns.
  • Counts of Categorical Columns: Visualizes the counts of categorical columns.
  • Correlation Heatmap: Displays a heatmap of the correlation between numerical features.
  • SMOTE Sampling: Balances the dataset using SMOTE sampling.
  • Fraud Prediction: Predicts whether a transaction is fraudulent based on user input.
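
For readers unfamiliar with the SMOTE sampling mentioned above, the core idea is to create synthetic minority-class samples by interpolating between an existing minority sample and one of its nearest minority neighbours. The README does not say which SMOTE implementation the application uses; the function below (smote_like, a hypothetical name) is a simplified pure-Python sketch of that idea only, not the application's code.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points from a list of minority-class points.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest neighbours of base among the other minority samples.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Example: three fraud samples with two numeric features each.
fraud = [(1.0, 1.0), (2.0, 2.0), (3.0, 1.0)]
new_points = smote_like(fraud, n_new=5)
```

Because every synthetic point is a blend of two real minority samples, the new points stay inside the region the minority class already occupies rather than duplicating existing rows exactly.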

