
Overview

This project performs real-time credit risk assessment on the bank client profiles it is given. It uses big data processing and machine learning to simulate how a bank might evaluate and manage credit risk for its clients.

Simulated Output Example

How It Works

Data Ingestion: The system simulates the ingestion of financial data including credit scores, account balances, and transaction histories for thousands of clients.
Data Processing: PySpark's distributed computing capabilities are used to clean, transform, and prepare the raw data for analysis at scale.
Machine Learning Model:
- A Random Forest Regressor is trained on historical data to predict credit risk scores.
- The model considers multiple factors including credit score, account balance, and transaction frequency.
Risk Scoring:
- Each client's risk is calculated using an algorithm that normalizes and weights various financial factors.
- The calculated risk score is then used to categorize clients into risk categories ranging from "Very Low Risk" to "Very High Risk".
Dynamic Rate Calculation:
- Based on the risk assessment, the system dynamically calculates personalized savings APY and lending rates for each client.
- This is intended to keep rates competitive while managing the bank's overall risk exposure.
Real-time Assessment: As new financial data comes in, the system can rapidly reassess a client's credit risk, allowing for up-to-date risk management.
Risk Dashboard: A dashboard provides bank managers with key metrics, including average credit scores, risk distributions, and potentially high-risk clients.
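
The risk scoring, categorization, and rate calculation steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: the `ClientProfile` class, the weights, the normalization caps, the intermediate category names ("Low Risk", "Medium Risk", "High Risk"), the category boundaries, and the linear rate formulas are all hypothetical. Only the three input factors and the "Very Low Risk" and "Very High Risk" endpoints come from the description above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ClientProfile:
    """Hypothetical container for the three factors named in this README."""
    credit_score: int          # assumed FICO-style range, 300 to 850
    account_balance: float     # currency units
    monthly_transactions: int  # stand-in for transaction frequency


# Assumed weights and normalization caps; the project's real values are unknown.
WEIGHTS = {"credit": 0.6, "balance": 0.25, "activity": 0.15}
BALANCE_CAP = 100_000.0
TRANSACTIONS_CAP = 100

# Assumed boundaries on a 0-100 risk scale: (upper bound, label).
CATEGORIES = [
    (20, "Very Low Risk"),
    (40, "Low Risk"),
    (60, "Medium Risk"),
    (80, "High Risk"),
    (100, "Very High Risk"),
]


def clamp01(value: float) -> float:
    """Clip a value into the [0, 1] interval."""
    return max(0.0, min(1.0, value))


def risk_score(client: ClientProfile) -> float:
    """Normalize each factor to [0, 1], weight it, and return a 0-100 risk score."""
    credit = clamp01((client.credit_score - 300) / 550)
    balance = clamp01(client.account_balance / BALANCE_CAP)
    activity = clamp01(client.monthly_transactions / TRANSACTIONS_CAP)
    strength = (
        WEIGHTS["credit"] * credit
        + WEIGHTS["balance"] * balance
        + WEIGHTS["activity"] * activity
    )
    # Higher financial strength means lower risk.
    return round(100 * (1 - strength), 2)


def risk_category(score: float) -> str:
    """Map a 0-100 risk score to the first category whose bound covers it."""
    for upper, label in CATEGORIES:
        if score <= upper:
            return label
    return CATEGORIES[-1][1]


def personalized_rates(score: float) -> Tuple[float, float]:
    """Return (savings APY %, lending rate %), assumed linear in the risk score."""
    savings_apy = round(1.0 + 2.0 * (1 - score / 100), 2)  # 1% to 3%
    lending_rate = round(5.0 + 15.0 * (score / 100), 2)    # 5% to 20%
    return savings_apy, lending_rate


client = ClientProfile(credit_score=700, account_balance=20_000, monthly_transactions=30)
score = risk_score(client)
print(score, risk_category(score), personalized_rates(score))
```

In this sketch a riskier client gets a higher lending rate and a lower savings APY. Calling `risk_score` again when a client's data changes reflects the real-time reassessment step in the same simplified way.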

Installation

Clone the repository:

git clone https://github.com/dvelkow/credit_risk_data_lake_for_lending

Install the required packages:

pip install -r requirements.txt

Run the main script:

python main.py

By default, it runs with random mock data. To use real data, connect it to a database through the main.py file.

Related Projects

spark-py-notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython /...

06 May 2015 1,614

credit-risk-modelling

Credit Risk analysis by using Python and ML

17 Nov 2017 147

real-time-data-processing

Real-time financial data processing using Apache Kafka, Spark, MySQL, and Grafana, orchestrated w...

10 Mar 2024 0

Bank-Credit-Risk-Model-using-Machine-Learning

Robust credit risk model that goes beyond traditional credit scoring methods in banks

23 Jun 2024 0

fraud-detector

Project FraudCatch leverages AI to predict and prevent financial fraud in real-time. It uses Apac...

21 Jul 2024 1

pyspark-realtime-streaming-sentiment-analysis

⏱ Real-Time Sentiment Analysis using PySpark and simulation of Twitter/X API using FastAPI

13 Jun 2024 2

Credit-Card-Fraud-Detection-Spark

05 May 2024 0

APACHE-SPARK-PYSPARK-DATABRICKS-MACHINE-LEARNING-MLIB

Apache Spark Machine Learning project using MLlib and Linear Regression on Databricks!

07 Aug 2024 0

pyspark-maestro

This repo contains implementations of PySpark for real-world use cases for batch data processing,...

23 Jul 2024 1

credit_risk_assessment_system_simulation