Galaxy Classification is a machine learning project focused on classifying galaxies into two subclasses: 'STARFORMING' and 'STARBURST'. This project demonstrates data preprocessing, model training, and evaluation using advanced machine learning techniques and Python libraries.
This project uses machine learning to classify galaxies from the SDSS dataset into 'STARFORMING' and 'STARBURST' categories, enabling real-time predictions through a web application.
The Sloan Digital Sky Survey (SDSS) has scanned about one-third of the sky and identified around 1 billion objects, including nearly 3 million galaxies. This project utilizes a dataset with 100,000 rows of photometric image data, focusing on classifying galaxies into two subclasses: 'STARFORMING' and 'STARBURST'. The dataset is available for download from Kaggle here.
The Galaxy Classification Project aims to predict galaxy subclasses using machine learning techniques. This project encompasses data collection, preparation, exploratory data analysis (EDA), model building, performance testing, and deployment. The final model is integrated with a web framework to provide real-time predictions based on user input.
Data Collection & Preparation
Exploratory Data Analysis (EDA)
Model Building
Performance Testing
Model Deployment
The Galaxy Classification Project follows a modular architecture:
Collect the Dataset
Data Preparation
Descriptive Statistics: Use the describe() function to summarize statistics such as mean, standard deviation, and percentiles.
Visual Analysis
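As a minimal sketch of the exploratory steps above, the snippet below runs pandas' `describe()` on a tiny hand-made table and adds a class-balance check. The column names (`u`, `g`, `redshift`, `subclass`) and all values are assumptions for illustration, not the project's actual schema or data.

```python
import pandas as pd

# Hypothetical stand-in for the SDSS table; column names and values are assumptions.
df = pd.DataFrame(
    {
        "u": [19.2, 18.7, 20.1, 19.5, 18.9, 19.8],
        "g": [18.1, 17.6, 19.0, 18.3, 17.9, 18.6],
        "redshift": [0.05, 0.08, 0.12, 0.07, 0.03, 0.10],
        "subclass": [
            "STARFORMING", "STARBURST", "STARFORMING",
            "STARBURST", "STARFORMING", "STARBURST",
        ],
    }
)

# Summary statistics (count, mean, std, min, quartiles, max) for the numeric columns.
summary = df.describe()
print(summary)

# Class balance and per-class feature means are a quick check before plotting.
class_counts = df["subclass"].value_counts()
class_means = df.groupby("subclass")[["u", "g", "redshift"]].mean()
print(class_counts)
print(class_means)
```

Checking class balance early matters here because accuracy alone can mislead when one subclass dominates.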
Training the Model
Train various machine learning algorithms to identify the best model:
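A rough sketch of such a comparison is shown below, assuming the three scikit-learn classifiers named in the results table. Synthetic data from `make_classification` stands in for the SDSS features, and the split ratio and model settings are illustrative assumptions, so the scores it prints will not match the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data stands in for the SDSS features (an assumption).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

models = {
    "Decision Tree Classifier": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest Classifier": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each candidate and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

best_name = max(scores, key=scores.get)
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
print("Best:", best_name)
```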
Testing the Model
Evaluation Metrics
Model Evaluation
After training and testing various models, the performance metrics for each model are compared. The table below summarizes the precision, recall, F1-score, and accuracy of each model:
Model | Precision Class 0 | Precision Class 1 | Recall Class 0 | Recall Class 1 | F1-score Class 0 | F1-score Class 1 | Accuracy |
---|---|---|---|---|---|---|---|
Decision Tree Classifier | 0.78 | 0.80 | 0.80 | 0.78 | 0.79 | 0.79 | 0.79168 |
Logistic Regression | 0.79 | 0.81 | 0.82 | 0.78 | 0.80 | 0.80 | 0.80165 |
Random Forest Classifier | 0.84 | 0.81 | 0.82 | 0.84 | 0.83 | 0.82 | 0.82575 |
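To make the columns in the table concrete, here is a small pure-Python sketch of how per-class precision, recall, and F1-score, plus overall accuracy, are computed from true and predicted labels. The function name `binary_report` and the label lists are hypothetical, and the source does not say which subclass is encoded as class 0 or class 1.

```python
def binary_report(y_true, y_pred):
    """Per-class precision, recall, and F1, plus overall accuracy, for labels 0/1."""
    report = {}
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        report[cls] = {"precision": precision, "recall": recall, "f1": f1}
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    report["accuracy"] = correct / len(y_true)
    return report


# Made-up labels, only to exercise the function.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
report = binary_report(y_true, y_pred)
print(report)
```

In practice, scikit-learn's `classification_report` produces the same quantities; the hand-rolled version is only meant to show the arithmetic.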
Performance Analysis
Decision Tree Classifier:
Logistic Regression:
Random Forest Classifier:
Hyperparameter Tuning
Adjust the model's hyperparameters to improve accuracy further.
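The source does not say which tuning method or parameter ranges were used. One common approach is a cross-validated grid search, sketched below with scikit-learn's `GridSearchCV` on synthetic data. The grid values, the 3-fold setting, and the choice of Random Forest as the tuned model are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data stands in for the SDSS features (an assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Illustrative grid; these values are assumptions, not the project's settings.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
}

# Try every combination in the grid with 3-fold cross-validation.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` holds a model refit on all the supplied data with the winning parameters.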
Save the Best Model
RF.pkl - Save the best-performing Random Forest model using Python's pickle module. This avoids retraining the model and allows for future use.
Integrate with Web Framework
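The web application presumably loads the saved RF.pkl instead of retraining. A minimal sketch of that save-and-reload round trip with `pickle` follows. It trains a small Random Forest on synthetic data and writes to a temporary directory, both of which are assumptions made so the example stays self-contained.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small model on synthetic data, standing in for the tuned project model.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# A temporary directory keeps the sketch from overwriting a real RF.pkl.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "RF.pkl"
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

# The restored model should reproduce the original predictions.
same = bool((restored.predict(X) == model.predict(X)).all())
print("Predictions match:", same)
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code. It is also safest to load the model with the same scikit-learn version that saved it.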
index.html - The main page where users input their data.
inner-page.html - The results page displaying predictions.
Clone the Repository:
git clone https://github.com/atul-maurya-30/galaxy.git
cd galaxy
Install Required Packages:
pip install -r requirements.txt
Run the Flask Application:
python app.py
Access the Web Application: Open a web browser and go to http://127.0.0.1:5000.
Enter Galaxy Data: Use the form on index.html to input galaxy data.
View Predictions: After submitting the input data, view predictions on inner-page.html.
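The request flow described above (a form page, then a results page) can be sketched as a small Flask app. This is a hypothetical illustration, not the repository's actual app.py: the `/predict` route, the feature names, the label encoding, and the `StubModel` placeholder are all assumptions, and inline templates stand in for index.html and inner-page.html.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Inline templates stand in for index.html and inner-page.html.
INDEX_HTML = """
<form method="post" action="/predict">
  <input name="u"> <input name="g"> <input name="redshift">
  <button type="submit">Predict</button>
</form>
"""
RESULT_HTML = "<p>Predicted subclass: {{ label }}</p>"

FEATURES = ["u", "g", "redshift"]  # hypothetical input fields
LABELS = {0: "STARFORMING", 1: "STARBURST"}  # assumed label encoding


class StubModel:
    """Placeholder for the model loaded from RF.pkl; always predicts class 0."""

    def predict(self, rows):
        return [0 for _ in rows]


model = StubModel()


@app.route("/")
def index():
    # Show the input form.
    return render_template_string(INDEX_HTML)


@app.route("/predict", methods=["POST"])
def predict():
    # Convert the submitted form fields to floats in a fixed feature order.
    try:
        row = [float(request.form[name]) for name in FEATURES]
    except (KeyError, ValueError):
        return "Invalid input", 400
    label = LABELS[model.predict([row])[0]]
    return render_template_string(RESULT_HTML, label=label)
```

Flask's built-in test client can exercise the routes without starting a server, for example `app.test_client().post("/predict", data={"u": "19.2", "g": "18.1", "redshift": "0.05"})`.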
This project is licensed under the MIT License. See the LICENSE file for details.
You can access the deployed application here: Galaxy Classifier
Feel free to try out the app and explore its features.
I have earned various Google Cloud badges showcasing my skills and expertise. You can view them here:
Thank you for exploring the Galaxy Classification Project! We hope this project provides valuable insights into galaxy classification using machine learning. For any questions or contributions, please feel free to reach out.