IMPLEMENTATION-OF-THE-COMPONENT-WISE-PEAK-FINDING-ALGORITHM

Overview

This repository contains two implementations of the Component-wise Peak-Finding (CPF) clustering algorithm:

- Existing-CPF.py: the original implementation of the CPF clustering algorithm.
- Optimized-CPF.py: an optimized version that uses FAISS for nearest neighbor search.

Dataset

The dataset used for this implementation is the "Turkey Student Evaluation Data Set", stored in the file turkiye-student-evaluation_generic.csv.

Requirements

Python 3.x
Required Python packages:
- pandas
- numpy
- scikit-learn
- matplotlib
- seaborn
- scipy
- faiss (installed from PyPI as faiss-cpu)

Installation

Clone the repository:

git clone https://github.com/your-username/CPF-Clustering.git
cd CPF-Clustering

Install the required Python packages:

pip install pandas numpy scikit-learn matplotlib seaborn scipy faiss-cpu

Instructions to Run the Code

Running Existing-CPF.py

Ensure the dataset turkiye-student-evaluation_generic.csv is in the same directory as the script.
Run the script:
```
python Existing-CPF.py
```
The script will:
- Load and preprocess the dataset.
- Perform CPF clustering.
- Print clustering metrics.
- Plot PCA results.
- Print a comparison of the clustering methods.

Running Optimized-CPF.py

Ensure the dataset turkiye-student-evaluation_generic.csv is in the same directory as the script.
Run the script:
```
python Optimized-CPF.py
```
The script will:
- Load and preprocess the dataset.
- Perform CPF clustering using FAISS for optimized performance.
- Print clustering metrics.
- Plot PCA results.
- Print a comparison of the clustering methods.
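
The loading, preprocessing, and PCA steps listed for both scripts might look roughly like the sketch below. It is not taken from either script: the choice of StandardScaler, the function name preprocess_and_project, and the stand-in columns q0 to q4 are assumptions made for illustration, and the real CSV columns may need different handling.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def preprocess_and_project(df, n_components=2):
    """Standardize the numeric columns of df and project them with PCA.

    Hypothetical helper: the scripts may preprocess differently.
    """
    # Keep numeric columns only and drop rows with missing values.
    numeric = df.select_dtypes(include="number").dropna()
    # Zero mean, unit variance per column, so no feature dominates distances.
    X = StandardScaler().fit_transform(numeric)
    # 2-D projection used only for plotting the clusters.
    X_2d = PCA(n_components=n_components).fit_transform(X)
    return X, X_2d


# Stand-in data so the sketch runs without the CSV. With the real file:
# df = pd.read_csv("turkiye-student-evaluation_generic.csv")
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(1, 6, size=(200, 5)),
    columns=[f"q{i}" for i in range(5)],  # hypothetical column names
)

X, X_2d = preprocess_and_project(df)
print(X.shape, X_2d.shape)
```

The standardized matrix X would be the input to clustering, while X_2d is only meant for a scatter plot colored by cluster label.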

Code Explanation

Existing-CPF.py

This script implements the original CPF clustering algorithm. It consists of three main steps:

- Build CCGraph: Constructs the mutual k-nearest neighbor graph.
- Get Density Dists BB: Computes density distances and big brothers using the mutual k-nearest neighbor graph.
- Get Y: Performs clustering based on density distances and big brothers.
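
The first two steps can be sketched as follows. This is a minimal illustration, not the code in Existing-CPF.py: the function names are hypothetical, and the density estimate (inverse distance to the k-th nearest neighbor) is an assumption, since this README does not state how the script defines density. A "big brother" is taken here to be the nearest point of strictly higher density in the same connected component.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import NearestNeighbors, kneighbors_graph


def mutual_knn_components(X, k=3):
    """Build the mutual k-NN graph of X and label its connected components."""
    # Directed k-NN graph: A[i, j] = 1 if j is among the k nearest neighbors of i.
    A = kneighbors_graph(X, n_neighbors=k, mode="connectivity", include_self=False)
    # Keep an edge only if it exists in both directions (mutual neighbors).
    mutual = A.multiply(A.T).tocsr()
    n_components, labels = connected_components(mutual, directed=False)
    return mutual, n_components, labels


def density_and_big_brothers(X, labels, k=3):
    """Assumed density (1 / distance to k-th neighbor) and big brother of each point."""
    dist, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    # Column 0 is the point itself (assuming no duplicate rows); the last is the k-th neighbor.
    density = 1.0 / (dist[:, -1] + 1e-12)
    n = len(X)
    big_brother = np.full(n, -1)      # -1 marks a point with no denser neighbor (a peak)
    bb_dist = np.full(n, np.inf)
    for i in range(n):
        # Candidates: same component, strictly higher density.
        cand = np.flatnonzero((labels == labels[i]) & (density > density[i]))
        if cand.size:
            d = np.linalg.norm(X[cand] - X[i], axis=1)
            j = int(np.argmin(d))
            big_brother[i] = cand[j]
            bb_dist[i] = d[j]
    return density, big_brother, bb_dist


# Two well-separated synthetic blobs as stand-in data.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(10, 2)),
    rng.normal(loc=100.0, scale=1.0, size=(10, 2)),
])

mutual, n_components, labels = mutual_knn_components(X, k=3)
density, big_brother, bb_dist = density_and_big_brothers(X, labels, k=3)
print("components:", n_components)
print("peaks:", np.flatnonzero(big_brother == -1))
```

In this sketch, the final step would assign each non-peak point the label of its big brother, so cluster labels propagate outward from the density peaks of each component.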

Optimized-CPF.py

This script implements an optimized version of the CPF clustering algorithm. It follows the same three steps as the original implementation but performs the nearest neighbor search with FAISS, which is where the performance improvement comes from.
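
A nearest neighbor search with FAISS could look like the sketch below. It assumes an exact IndexFlatL2 index; which FAISS index Optimized-CPF.py actually uses is not stated here. The helper name knn_search is hypothetical, and the NumPy fallback exists only so the example runs when faiss is not installed.

```python
import numpy as np

try:
    import faiss  # provided by the faiss-cpu package
except ImportError:  # fallback so the sketch still runs without FAISS
    faiss = None


def knn_search(X, k):
    """Return (distances, indices) of the k nearest neighbors of each row of X.

    The query point itself is excluded. Assumes X has no duplicate rows,
    so that each point is its own closest match.
    """
    X = np.ascontiguousarray(X, dtype=np.float32)  # FAISS expects float32
    if faiss is not None:
        index = faiss.IndexFlatL2(X.shape[1])  # exact L2 search
        index.add(X)
        # Ask for k + 1 results because every point matches itself first.
        sq_dist, idx = index.search(X, k + 1)
    else:
        # Brute-force pairwise squared distances (O(n^2) memory).
        diff = X[:, None, :] - X[None, :, :]
        full = (diff ** 2).sum(axis=2)
        idx = np.argsort(full, axis=1)[:, : k + 1]
        sq_dist = np.take_along_axis(full, idx, axis=1)
    # IndexFlatL2 returns squared distances; clip tiny negatives before the root.
    dist = np.sqrt(np.maximum(sq_dist[:, 1:], 0.0))
    return dist, idx[:, 1:]


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
dist, idx = knn_search(X, k=5)
print(dist.shape, idx.shape)
```

The returned index array can be turned into the mutual k-nearest neighbor graph by keeping a pair (i, j) only when j appears in row i and i appears in row j.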

Performance Comparison

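In the run reported below, Optimized-CPF.py finished in 0.13 seconds against 0.78 seconds for Existing-CPF.py, roughly a 6x speedup attributed to the use of FAISS. The two versions did not produce the same clustering, however: the original found 357 clusters and the optimized version found 4, and ARI and AMI are close to zero for both. The table below compares the performance metrics of both versions: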

| Method | Time Taken (seconds) | Number of Clusters | ARI | AMI | Silhouette Score | Davies-Bouldin Index |
|---|---|---|---|---|---|---|
| Original CPF | 0.78 | 357 | -0.000744 | -0.001803 | -0.209047 | 1.470275 |
| Optimized CPF | 0.13 | 4 | 0.000000 | 0.000000 | 0.197073 | 0.570039 |
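
The metric columns above correspond to functions available in scikit-learn. The sketch below shows one way such a report could be computed; the helper clustering_report is hypothetical, and which reference labels the scripts pass for ARI and AMI is not stated in this README, so y_true is left optional.

```python
import numpy as np
from sklearn.metrics import (
    adjusted_mutual_info_score,
    adjusted_rand_score,
    davies_bouldin_score,
    silhouette_score,
)


def clustering_report(X, y_pred, y_true=None):
    """Collect the metrics from the comparison table for one clustering result."""
    report = {"n_clusters": int(len(np.unique(y_pred)))}
    if y_true is not None:
        # External metrics: agreement with reference labels, adjusted for chance.
        report["ari"] = adjusted_rand_score(y_true, y_pred)
        report["ami"] = adjusted_mutual_info_score(y_true, y_pred)
    if 1 < report["n_clusters"] < len(X):
        # Internal metrics need at least 2 clusters and fewer clusters than points.
        report["silhouette"] = silhouette_score(X, y_pred)          # higher is better
        report["davies_bouldin"] = davies_bouldin_score(X, y_pred)  # lower is better
    return report


# Two well-separated synthetic blobs with a perfect labeling as stand-in data.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(20, 2)),
    rng.normal(loc=100.0, scale=1.0, size=(20, 2)),
])
y = np.repeat([0, 1], 20)

report = clustering_report(X, y_pred=y, y_true=y)
for name, value in report.items():
    print(name, value)
```

Wrapping a clustering call between two time.perf_counter() readings would give a figure comparable to the "Time Taken" column.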

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

The Turkey Student Evaluation Data Set is used in this project.
The FAISS (Facebook AI Similarity Search) library is used for optimizing the CPF clustering algorithm.
