Predicting Student Performance: An Analytical Approach
This repository contains the code and analysis for predicting student performance based on various factors using multiple machine learning algorithms.
The objective of this project is to predict students' final grades (G3) based on their demographic information, family background, educational support, extracurricular activities, and personal attributes.
The dataset includes information about students such as:
- Demographics: school, sex, age, address
- Family Background: famsize, Pstatus, Medu, Fedu, Mjob, Fjob, guardian
- Educational Support: traveltime, studytime, failures, schoolsup, famsup, paid, activities, nursery, higher, internet, romantic
- Personal Information: famrel, freetime, goout, Dalc, Walc, health, absences
- Grades: G1, G2, G3
The exploratory data analysis (EDA) involves:
- Visualizing the distribution of numerical features
- Visualizing categorical features
- Analyzing feature relationships with the target variable (G3)
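The three EDA steps above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual notebook code: the function name `plot_eda` and the tiny `toy` frame are hypothetical, and the Pearson correlation is only one of several possible ways to relate numeric features to G3.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_eda(df: pd.DataFrame, target: str = "G3") -> pd.Series:
    """Draw basic EDA plots and return each numeric feature's correlation with the target."""
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    categorical_cols = [c for c in df.columns if c not in numeric_cols]

    # Distribution of numerical features: one histogram per column.
    df[numeric_cols].hist(bins=10, figsize=(8, 6))
    plt.tight_layout()

    # Categorical features: one bar chart of category counts per column.
    for col in categorical_cols:
        plt.figure()
        df[col].value_counts().plot(kind="bar", title=col)

    # Relationship with the target: Pearson correlation of numeric features with G3.
    corr = df[numeric_cols].corr()[target].drop(target)
    return corr.sort_values(ascending=False)


# Synthetic toy rows for illustration only (not taken from the dataset).
toy = pd.DataFrame({
    "sex": ["F", "M", "F", "M", "F", "M"],
    "studytime": [2, 1, 3, 1, 4, 2],
    "absences": [4, 10, 0, 12, 2, 6],
    "G3": [12, 8, 15, 6, 17, 11],
})
target_corr = plot_eda(toy)
print(target_corr)
```

With the real dataset, the same function would be called on the loaded DataFrame, followed by `plt.show()` or `plt.savefig(...)` to view the figures.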
We evaluated several machine learning algorithms to find the best model for predicting student performance:
- Logistic Regression
- Decision Tree Classifier
- Support Vector Machine
- Random Forest Classifier
- AdaBoost Classifier
- Gradient Boosting Classifier
- K Neighbors Classifier
- Gaussian Naive Bayes
The model training and evaluation class is designed to:
- Load the dataset and initialize models
- Preprocess data by encoding, splitting into train/test, and standardizing features
- Train various classifiers and evaluate their metrics
- Compare model performance
- Plot feature importances for selected models
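A class with these responsibilities might look like the sketch below. It is an assumption-laden outline rather than the repository's implementation: the class name `StudentPerformanceModels`, the one-hot encoding via `pd.get_dummies`, the 80/20 split, the fixed seed, and the macro averaging of precision, recall, and F1 are all choices made here for illustration.

```python
import pandas as pd
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


class StudentPerformanceModels:  # hypothetical name, not taken from the repository
    def __init__(self, df: pd.DataFrame, target: str = "G3", seed: int = 42):
        self.df, self.target, self.seed = df, target, seed
        self.models = {
            "Logistic Regression": LogisticRegression(max_iter=1000),
            "Decision Tree": DecisionTreeClassifier(random_state=seed),
            "Support Vector Machine": SVC(),
            "Random Forest": RandomForestClassifier(random_state=seed),
            "AdaBoost": AdaBoostClassifier(random_state=seed),
            "Gradient Boosting": GradientBoostingClassifier(random_state=seed),
            "K Neighbors": KNeighborsClassifier(),
            "Gaussian Naive Bayes": GaussianNB(),
        }

    def preprocess(self, test_size: float = 0.2) -> None:
        # One-hot encode categorical columns (an assumed encoding scheme).
        X = pd.get_dummies(self.df.drop(columns=[self.target]),
                           drop_first=True, dtype=float)
        y = self.df[self.target]
        X_train, X_test, self.y_train, self.y_test = train_test_split(
            X, y, test_size=test_size, random_state=self.seed)
        # Fit the scaler on the training split only, so no test statistics leak in.
        scaler = StandardScaler().fit(X_train)
        self.X_train = scaler.transform(X_train)
        self.X_test = scaler.transform(X_test)
        self.feature_names = list(X.columns)

    def compare(self) -> pd.DataFrame:
        """Train every model and return one row of test metrics per model."""
        rows = []
        for name, model in self.models.items():
            model.fit(self.X_train, self.y_train)
            pred = model.predict(self.X_test)
            precision, recall, f1, _ = precision_recall_fscore_support(
                self.y_test, pred, average="macro", zero_division=0)
            rows.append({"model": name,
                         "accuracy": accuracy_score(self.y_test, pred),
                         "precision": precision, "recall": recall, "f1": f1})
        return pd.DataFrame(rows).sort_values("accuracy", ascending=False)
```

Typical use would be to construct the class from the loaded dataset, call `preprocess()`, and then inspect the table returned by `compare()`. Keeping the models in a dictionary makes it easy to add or drop a classifier without touching the training loop.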
Each model was trained on the training data and evaluated on the testing data. Evaluation metrics include accuracy, precision, recall, and F1 score.
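For reference, the four metrics can be computed by hand as in the sketch below. Because G3 takes many distinct values, precision, recall, and F1 need an averaging scheme across classes; macro averaging is assumed here, and the project may have used a different one. The function name `classification_metrics` is hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # failed to predict the true class

    def safe_div(num, den):
        return num / den if den else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = safe_div(tp[label], tp[label] + fp[label])
        r = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(p)
        recalls.append(r)
        f1s.append(safe_div(2 * p * r, p + r))

    n = len(labels)
    return {
        "accuracy": safe_div(sum(tp.values()), len(y_true)),
        "precision": safe_div(sum(precisions), n),
        "recall": safe_div(sum(recalls), n),
        "f1": safe_div(sum(f1s), n),
    }


# Illustrative grades only: five students, one misclassified.
metrics = classification_metrics([10, 10, 12, 12, 14], [10, 12, 12, 12, 14])
print(metrics)
```

Macro averaging weights every grade equally, so rare grades influence the score as much as common ones; a weighted average would instead follow the class frequencies.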
Model Performance
- Gradient Boosting Classifier: Highest accuracy (0.515385)
- Random Forest Classifier: Moderate accuracy (0.446154)
- Decision Tree Classifier: Lower accuracy (0.384615)
- Other Models: Showed lower performance
Feature Importance
- Gradient Boosting Classifier: Top features are G2, G1, absences, age, and free time
- Random Forest and Decision Tree Classifiers: Similar top features, emphasizing G1, G2, absences, free time, and parental education
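Rankings like these can be read from a fitted tree-based model through scikit-learn's `feature_importances_` attribute, which holds impurity-based importances. The sketch below shows the idea on synthetic data; the helper name `top_features` and the toy columns are hypothetical, and the values it prints are unrelated to the project's results.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def top_features(model, feature_names, k=5):
    """Return the k largest impurity-based importances of a fitted tree model."""
    importances = pd.Series(model.feature_importances_, index=feature_names)
    return importances.sort_values(ascending=False).head(k)


# Synthetic toy data: "G2" fully determines the label, "noise" carries no signal.
X = pd.DataFrame({
    "G2": [5, 6, 14, 15, 5, 16, 7, 13],
    "noise": [1, 0, 1, 0, 0, 1, 1, 0],
})
y = (X["G2"] >= 10).astype(int)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
ranking = top_features(model, X.columns)
print(ranking)
```

Calling `ranking.plot(kind="barh")` would turn the result into a bar chart. Impurity-based importances can favor features with many distinct values, so they are best treated as a rough guide rather than a definitive ranking.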
- Importance of Continuous Assessment:
  - G1 and G2 are crucial for predicting G3.
  - Implement regular assessments and feedback.
- Attendance and Engagement:
  - Absences and free time impact performance.
  - Improve attendance and participation programs.
- Family Background:
  - Parental education affects performance.
  - Engage families and provide support for equal opportunities.
- Holistic Approach:
  - Address academic, social, and emotional needs for student success.