This repository contains two implementations of the CPF clustering algorithm: the original version (Existing-CPF.py) and a FAISS-optimized version (Optimized-CPF.py).
The dataset used for this implementation is the "Turkey Student Evaluation Data Set", available at turkiye-student-evaluation_generic.csv.
Clone the repository:
git clone https://github.com/your-username/CPF-Clustering.git
cd CPF-Clustering
Install the required Python packages:
pip install pandas numpy scikit-learn matplotlib seaborn scipy faiss-cpu
Ensure the dataset turkiye-student-evaluation_generic.csv is in the same directory as the script.
Run the script:
python Existing-CPF.py
The script will:
Ensure the dataset turkiye-student-evaluation_generic.csv is in the same directory as the script.
Run the script:
python Optimized-CPF.py
The script will:
This script implements the original CPF clustering algorithm. It consists of three main steps:
This script implements an optimized version of the CPF clustering algorithm. It follows the same steps as the original implementation but uses FAISS for the nearest neighbor search, which speeds up that stage of the algorithm.
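As an illustration of the nearest neighbor search stage, here is a minimal sketch of an exact k-NN query with FAISS. The helper name `knn_search`, the choice of `faiss.IndexFlatL2`, and the toy points are assumptions made for this example; the repository's script may use a different index type or parameters. If FAISS cannot be imported, the sketch falls back to a brute-force NumPy search so it still runs.

```python
import numpy as np

try:
    import faiss  # provided by the faiss-cpu package
except ImportError:  # keep the sketch runnable without FAISS
    faiss = None


def knn_search(X, k):
    """Return (distances, indices) of the k nearest neighbours of every row of X.

    Hypothetical helper, not taken from the repository's scripts.
    Requires k to be smaller than the number of rows.
    """
    X = np.ascontiguousarray(X, dtype=np.float32)  # FAISS expects C-contiguous float32
    n_features = X.shape[1]

    if faiss is not None:
        index = faiss.IndexFlatL2(n_features)  # exact search, squared L2 metric
        index.add(X)
        sq_dist, idx = index.search(X, k + 1)  # k + 1 because each point finds itself
    else:
        # Brute-force fallback: full pairwise squared-distance matrix.
        sq_norms = np.sum(X * X, axis=1)
        pairwise = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
        idx = np.argsort(pairwise, axis=1, kind="stable")[:, : k + 1]
        sq_dist = np.take_along_axis(pairwise, idx, axis=1)

    # Drop the first column (the query point itself, assuming no duplicate rows)
    # and convert squared distances to Euclidean distances.
    distances = np.sqrt(np.maximum(sq_dist[:, 1:], 0.0))
    return distances, idx[:, 1:]


# Four points on a line; each point's single nearest neighbour is easy to check by hand.
points = np.array([[0, 0], [1, 0], [3, 0], [7, 0]], dtype=np.float32)
distances, neighbours = knn_search(points, k=1)
```

`IndexFlatL2` performs an exact search, so it should return the same neighbours as the brute-force branch. Approximate FAISS indexes trade some accuracy for speed and could change the resulting clusters.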
The Optimized-CPF.py script runs faster than the Existing-CPF.py script because it uses FAISS: 0.13 seconds versus 0.78 seconds in the run reported below, roughly a sixfold speedup. The table below compares the performance metrics of both versions:
| Method | Time Taken (seconds) | Number of Clusters | ARI | AMI | Silhouette Score | Davies-Bouldin Index |
|---|---|---|---|---|---|---|
| Original CPF | 0.78 | 357 | -0.000744 | -0.001803 | -0.209047 | 1.470275 |
| Optimized CPF | 0.13 | 4 | 0.000000 | 0.000000 | 0.197073 | 0.570039 |
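The quality metrics in the table can be computed with scikit-learn. The following is a minimal sketch under stated assumptions: the helper name `evaluate_clustering` and the toy data are hypothetical, and the reference labels the scripts use for ARI and AMI are not stated here, so `labels_true` stands in for them.

```python
import numpy as np
from sklearn.metrics import (
    adjusted_mutual_info_score,
    adjusted_rand_score,
    davies_bouldin_score,
    silhouette_score,
)


def evaluate_clustering(X, labels_true, labels_pred):
    """Compute the four quality metrics reported in the table.

    Hypothetical helper. ARI and AMI compare predicted labels with reference
    labels; the silhouette score and Davies-Bouldin index use only the data
    and the predicted labels.
    """
    return {
        "ARI": adjusted_rand_score(labels_true, labels_pred),
        "AMI": adjusted_mutual_info_score(labels_true, labels_pred),
        "Silhouette Score": silhouette_score(X, labels_pred),
        "Davies-Bouldin Index": davies_bouldin_score(X, labels_pred),
    }


# Toy data: two well-separated groups of three points each.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
reference = np.array([0, 0, 0, 1, 1, 1])

# Scoring the reference labels against themselves gives the best-case values.
scores = evaluate_clustering(X, reference, reference)
```

When reading the table, ARI and AMI values near zero indicate agreement with the reference labels that is no better than chance. The silhouette score ranges from -1 to 1 with higher values being better, and a lower Davies-Bouldin index indicates more compact, better-separated clusters.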
This project is licensed under the MIT License - see the LICENSE file for details.