This repo was created for an intermediate project in the MLOps course at Naya College. The goal of the project is to build a customer churn (abandonment) prediction mechanism that supports both API and batch prediction. The project includes a CI/CD pipeline, monitoring, a Grafana dashboard, and more.
This project implements a machine learning model for customer churn prediction, utilizing FastAPI and Apache Beam. It includes a comprehensive MLOps pipeline with monitoring, batch processing, and CI/CD integration.
The presentation of this project can be found here: Gamma Link
The batch pipeline:

- Reads input from the `data/raw` folder or from database queries (fetching only unpredicted data based on the timestamp of the last prediction)
- Preprocesses the data with `beam_preprocessing.py`
- Writes the predictions to the `data/batch_results` folder and/or the database

The overall system architecture:

```mermaid
graph TD
A[<b>API</b>] -->|Saves predictions| B[(<b>PostgreSQL DB</b>)]
A -->|Monitored by| C[<b>Prometheus</b>]
C -->|Visualized in| D[<b>Grafana</b>]
A -.->|Logs stored in| B
B -->|Daily API logs| E[<b>WhyLabs</b>]
F[<b>Batch Processing</b>] <-->|Reads/Writes| B
F <-->|Processes files| G[<b>Target Folder</b>]
F -->|Daily logs & predictions| E
H[<b>User</b>] -->|API requests| A
I[<b>Cron Job</b>] -->|Triggers daily| F
J[<b>MLOps Engineer</b>] -->|Views| D
D -->|Alerts| J
classDef primary fill:#e6f3ff,stroke:#333,stroke-width:2px;
classDef secondary fill:#d0e0e3,stroke:#333,stroke-width:2px;
classDef tertiary fill:#fff2cc,stroke:#333,stroke-width:2px;
classDef quaternary fill:#f2e6ff,stroke:#333,stroke-width:2px;
classDef default color:#000000;
class A,F primary;
class B,G secondary;
class C,D,E tertiary;
class H,I,J quaternary;
```

The Docker Compose services and their relationships:

```mermaid
graph TD
A[api] -->|depends on| B[db]
A -->|connects to| C[prometheus]
D[batch] -->|depends on| B
C -->|visualized by| E[grafana]
F[network]
A -->|part of| F
B -->|part of| F
C -->|part of| F
D -->|part of| F
E -->|part of| F
subgraph Volumes
G[postgres_data]
H[grafana_data]
end
B -->|uses| G
E -->|uses| H
classDef default fill:#f9f9f9,stroke:#333,stroke-width:2px;
classDef service fill:#AED6F1,stroke:#3498DB,stroke-width:2px;
classDef db fill:#F9E79F,stroke:#F4D03F,stroke-width:2px;
classDef monitoring fill:#D5F5E3,stroke:#2ECC71,stroke-width:2px;
classDef network fill:#FADBD8,stroke:#E74C3C,stroke-width:2px;
classDef volume fill:#8E44AD,stroke:#4A235A,stroke-width:2px,color:#FFFFFF;
class A,D service;
class B db;
class C,E monitoring;
class F network;
class G,H volume;
```

The CI/CD workflow:

```mermaid
graph TD
A[Push to main branch<br>or Pull Request] --> B[Check out code]
B --> C[Set up Python 3.10.12]
C --> D[Install dependencies]
D --> E[Run tests]
E --> F[Build Docker image]
F --> G{Is it a push<br>to main?}
G -->|Yes| H[Log in to Docker Hub]
H --> I[Push image to Docker Hub]
G -->|No| J[End]
I --> J
classDef default fill:#f9f9f9,stroke:#333,stroke-width:2px
classDef trigger fill:#ff9999,stroke:#333,stroke-width:2px
classDef step fill:#99ccff,stroke:#333,stroke-width:2px
classDef decision fill:#ffcc99,stroke:#333,stroke-width:2px
classDef endClass fill:#ccff99,stroke:#333,stroke-width:2px
class A trigger
class B,C,D,E,F,H,I step
class G decision
class J endClass
```

1. Create a `.env` file in the `docker` directory, based on the `example.env.txt` file.
2. Navigate to the docker directory: `cd docker`
3. Build the Docker images: `docker-compose build`
4. Start the services: `docker-compose up`
Access the API documentation at http://localhost:8005/docs.

Use the `/predict/` POST endpoint with the following example body:

```json
{
  "TotalCharges": "1889.5",
  "Contract": "One year",
  "PhoneService": "Yes",
  "tenure": 34
}
```

Expected response:

```json
{
  "prediction": 0
}
```

(A prediction of `0` indicates the client is not likely to churn soon.)
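For a quick scripted check, the same request can be sent with Python's `requests` package. A minimal sketch, assuming the default port mapping (8005) shown above:

```python
# Minimal client-side check of the /predict/ endpoint; assumes the API is
# running locally on port 8005 and that `requests` is installed.
import requests

payload = {
    "TotalCharges": "1889.5",
    "Contract": "One year",
    "PhoneService": "Yes",
    "tenure": 34,
}

response = requests.post("http://localhost:8005/predict/", json=payload)
response.raise_for_status()
print(response.json())  # expected: {"prediction": 0}
```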
The batch processing pipeline utilizes Apache Beam for efficient data processing. A cron job triggers it daily at 12 PM to read new data, preprocess it, and write predictions, as described above.
Configure the batch job settings in the `config` file.
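For illustration only, a self-contained Apache Beam sketch of this read-preprocess-predict-write flow might look like the following. The file paths, CSV column layout, and `model.pkl` artifact name are assumptions, not the project's actual configuration (the real preprocessing lives in `beam_preprocessing.py`):

```python
# Rough sketch of the daily batch flow; paths, CSV layout, and the model
# artifact name are assumptions made for this example.
import csv
import io
import pickle

import apache_beam as beam


def parse_line(line):
    # Assumed raw CSV layout: TotalCharges, Contract, PhoneService, tenure
    values = next(csv.reader(io.StringIO(line)))
    return {
        "TotalCharges": values[0],
        "Contract": values[1],
        "PhoneService": values[2],
        "tenure": int(values[3]),
    }


class PredictFn(beam.DoFn):
    def setup(self):
        # Load the trained model once per worker (hypothetical artifact path).
        with open("model.pkl", "rb") as f:
            self.model = pickle.load(f)

    def process(self, row):
        # Assumes the pickled pipeline handles feature encoding internally.
        features = [row["TotalCharges"], row["Contract"],
                    row["PhoneService"], row["tenure"]]
        yield {**row, "prediction": int(self.model.predict([features])[0])}


with beam.Pipeline() as pipeline:
    (
        pipeline
        | "ReadRaw" >> beam.io.ReadFromText("data/raw/customers.csv",
                                            skip_header_lines=1)
        | "Parse" >> beam.Map(parse_line)
        | "Predict" >> beam.ParDo(PredictFn())
        | "Format" >> beam.Map(str)
        | "Write" >> beam.io.WriteToText("data/batch_results/predictions")
    )
```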
The FastAPI application provides real-time predictions for the marketing server. It uses the same preprocessing steps and model as the batch process to ensure consistency.
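A minimal sketch of such an endpoint, assuming a pickled scikit-learn-style model whose pipeline encodes the raw features itself; the `model.pkl` path is hypothetical, and the field names mirror the example request body above:

```python
# Minimal sketch of the real-time prediction endpoint; the model artifact
# path is a placeholder, not the project's actual location.
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class CustomerFeatures(BaseModel):
    TotalCharges: str
    Contract: str
    PhoneService: str
    tenure: int


with open("model.pkl", "rb") as f:
    model = pickle.load(f)


@app.post("/predict/")
def predict(features: CustomerFeatures):
    # Assumes the model pipeline handles raw feature encoding internally.
    row = [features.TotalCharges, features.Contract,
           features.PhoneService, features.tenure]
    return {"prediction": int(model.predict([row])[0])}
```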
Prometheus is used to collect key metrics from both the API and the batch processing pipeline.
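As an illustration of how such metrics can be exposed, a FastAPI app can publish a `/metrics` endpoint with `prometheus_client`; the metric names below are examples, not the project's actual ones:

```python
# Illustrative instrumentation with prometheus_client; metric names are
# examples only.
import time

from fastapi import FastAPI, Request
from prometheus_client import Counter, Histogram, make_asgi_app

app = FastAPI()
app.mount("/metrics", make_asgi_app())  # endpoint Prometheus scrapes

REQUESTS = Counter("api_requests_total", "Total API requests", ["endpoint"])
LATENCY = Histogram("api_request_latency_seconds",
                    "Request latency in seconds", ["endpoint"])


@app.middleware("http")
async def record_metrics(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    REQUESTS.labels(endpoint=request.url.path).inc()
    LATENCY.labels(endpoint=request.url.path).observe(
        time.perf_counter() - start)
    return response
```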
Alerts are configured in Grafana to notify of any anomalies or issues in the system.
PostgreSQL is used for storing predictions and raw data.
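For illustration, the "only unpredicted data" fetch mentioned earlier could be expressed as follows; the table names, column names, and connection settings are assumptions, not the project's actual schema:

```python
# Hypothetical query for rows newer than the timestamp of the last
# prediction; schema and credentials are placeholders.
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="churn",
                        user="postgres", password="postgres")
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT *
        FROM raw_customers
        WHERE created_at > (
            SELECT COALESCE(MAX(predicted_at), TIMESTAMP 'epoch')
            FROM predictions
        )
        """
    )
    unpredicted_rows = cur.fetchall()
```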
Whylogs is implemented for data drift detection. Drift metrics can be monitored through the Grafana dashboard or through custom reports generated on the WhyLabs website.
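A sketch of what the daily whylogs profiling step can look like; the results path and the WhyLabs credentials (read from environment variables) are assumptions:

```python
# Sketch of daily data profiling with whylogs; the input path is a
# placeholder, and WhyLabs credentials must be configured separately.
import pandas as pd
import whylogs as why

df = pd.read_csv("data/batch_results/predictions.csv")  # hypothetical path

# Profile the day's data; WhyLabs compares profiles across days to flag drift.
results = why.log(df)
results.writer("whylabs").write()  # needs WHYLABS_API_KEY etc. in the env
```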
The project includes a full CI/CD pipeline configured with GitHub Actions. View the workflow files in the `.github/workflows/` directory.
To modify input parameters or other configurations, please refer to the configuration files in the `config/` directory.
This project is licensed under the MIT License - see the LICENSE.md file for details.
For more information or support, please open an issue in the GitHub repository.