Using Machine Learning to Predict and Prevent Academic Failure Before It Happens.
This project presents an AI-powered Early Warning System designed to identify students at risk of academic failure using machine learning models. By analyzing a range of academic, behavioral, and social features, the system aims to predict whether a student will pass or fail, enabling timely interventions by educators.
To build a predictive system using supervised machine learning models that:
- Forecasts student academic success based on historical data
- Identifies key features contributing to student performance
- Enables early intervention to prevent failure
- Source: UCI Machine Learning Repository - Student Performance Dataset
- Focus: student-mat.csv (Math subject data only)
- Size: 395 records, 33 features
- G3: final grade, transformed into a binary pass label (1 if G3 >= 10 else 0)
- Demographic: sex, age, address, famsize, etc.
- Academic: studytime, failures, absences, etc.
- Social: romantic, goout, health, etc.
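The binary target described above can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: the helper name add_pass_label and the tiny sample frame are hypothetical, and the commented read_csv call assumes the file keeps the semicolon separator used by the UCI distribution.

```python
import pandas as pd


def add_pass_label(df: pd.DataFrame, threshold: int = 10) -> pd.DataFrame:
    """Return a copy of df with a binary `pass` column derived from G3."""
    out = df.copy()
    # 1 if the final grade reaches the threshold, otherwise 0.
    out["pass"] = (out["G3"] >= threshold).astype(int)
    return out


# With the real data (assumption: ';' separator, as in the UCI download):
# df = pd.read_csv("student-mat.csv", sep=";")

# Tiny illustrative frame standing in for the dataset.
sample = pd.DataFrame({"G3": [0, 9, 10, 15]})
labeled = add_pass_label(sample)
print(labeled)
```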
- No missing values found in the dataset.
- Used Label Encoding on categorical columns such as school, sex, address, etc.
- Applied StandardScaler to all numeric features except the target to standardize them (zero mean, unit variance) and improve model performance.
- Dropped intermediate grade columns G1 and G2, which were highly correlated with G3, to ensure realistic evaluation.
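The preprocessing steps above could look roughly like this. It is a sketch under stated assumptions, not the notebook's code: the function name preprocess and the sample frame are hypothetical, and G3 is dropped from the features alongside G1 and G2 (an assumption, since the pass label is derived directly from G3).

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler


def preprocess(df: pd.DataFrame, target: str = "pass"):
    """Encode categoricals, drop grade columns, and scale the features."""
    data = df.copy()

    # Label-encode every non-numeric column (e.g. school, sex, address).
    for col in data.columns:
        if not pd.api.types.is_numeric_dtype(data[col]):
            data[col] = LabelEncoder().fit_transform(data[col])

    # Drop G1 and G2 as described; G3 is dropped too (assumption),
    # because the target is computed from it.
    data = data.drop(columns=["G1", "G2", "G3"], errors="ignore")

    X = data.drop(columns=[target])
    y = data[target]

    # Standardize the features only; the target stays untouched.
    X_scaled = pd.DataFrame(
        StandardScaler().fit_transform(X), columns=X.columns, index=X.index
    )
    return X_scaled, y


# Hypothetical miniature of the dataset.
sample = pd.DataFrame(
    {
        "sex": ["F", "M", "F", "M"],
        "studytime": [1, 2, 3, 4],
        "G1": [5, 8, 11, 14],
        "G2": [6, 9, 10, 15],
        "G3": [6, 9, 10, 15],
        "pass": [0, 0, 1, 1],
    }
)
X, y = preprocess(sample)
print(X)
```

Fitting the scaler on the training split only, then applying it to the test split, is the stricter variant; the sketch scales everything at once to mirror the description.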
- Visualized distribution of the pass label.
- Used heatmaps to inspect feature correlations.
- Found high correlation between past grades and final result.
- Observed socio-academic behavior patterns in successful students.
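The two plots mentioned above (label distribution and correlation heatmap) might be produced along these lines. The function name plot_overview, the styling choices, and the sample frame are assumptions for illustration; the notebook's own figures may differ.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_overview(df: pd.DataFrame, target: str = "pass"):
    """Draw the label distribution and a correlation heatmap side by side."""
    fig, (ax_counts, ax_corr) = plt.subplots(1, 2, figsize=(12, 5))

    # Left panel: how many students fall into each class.
    sns.countplot(x=target, data=df, ax=ax_counts)
    ax_counts.set_title("Distribution of the pass label")

    # Right panel: pairwise Pearson correlations of the numeric columns.
    corr = df.corr(numeric_only=True)
    sns.heatmap(corr, cmap="coolwarm", center=0, ax=ax_corr)
    ax_corr.set_title("Feature correlations")

    fig.tight_layout()
    return fig, corr


# Hypothetical miniature of the dataset.
sample = pd.DataFrame(
    {
        "studytime": [1, 2, 3, 4, 2, 3],
        "failures": [3, 2, 0, 0, 1, 0],
        "absences": [10, 6, 2, 0, 4, 1],
        "pass": [0, 0, 1, 1, 0, 1],
    }
)
fig, corr = plot_overview(sample)
# fig.savefig("eda_overview.png")  # uncomment to write the figure to disk
```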
Used GridSearchCV for hyperparameter tuning and selected models based on accuracy and performance metrics:
- Logistic Regression
- Support Vector Classifier (SVC)
- Decision Tree
- Random Forest
- Gradient Boosting
- XGBoost (optional if system supports)
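A tuning loop over the scikit-learn models listed above could be sketched as below. The parameter grids are hypothetical and deliberately small, since the notebook's grids are not shown here, and synthetic data from make_classification stands in for the preprocessed student features. XGBoost is left out to keep the sketch free of optional dependencies; it would slot in as one more entry in the dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical grids: model name -> (estimator, parameter grid).
CANDIDATES = {
    "Logistic Regression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1, 10]}),
    "SVC": (SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}),
    "Decision Tree": (
        DecisionTreeClassifier(random_state=42),
        {"max_depth": [3, 5, None]},
    ),
    "Random Forest": (
        RandomForestClassifier(random_state=42),
        {"n_estimators": [50, 100], "max_depth": [5, None]},
    ),
    "Gradient Boosting": (
        GradientBoostingClassifier(random_state=42),
        {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]},
    ),
}


def tune_models(X_train, y_train, cv=5):
    """Run a cross-validated grid search for every candidate model."""
    results = {}
    for name, (model, grid) in CANDIDATES.items():
        search = GridSearchCV(model, grid, cv=cv, scoring="accuracy")
        search.fit(X_train, y_train)
        results[name] = search
    return results


# Synthetic stand-in for the preprocessed student data.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

searches = tune_models(X_train, y_train, cv=3)
for name, search in searches.items():
    print(f"{name}: CV accuracy={search.best_score_:.3f}, params={search.best_params_}")
```

Each fitted search exposes best_estimator_, which can then be scored on the held-out test split.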
- Accuracy Score
- Classification Report
- Confusion Matrix
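The three metrics above are all available in sklearn.metrics. The sketch below computes them for a small hand-made set of labels; the evaluate helper and the example labels are hypothetical and unrelated to the project's reported results.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


def evaluate(y_true, y_pred):
    """Collect accuracy, a per-class report, and the confusion matrix."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "report": classification_report(
            y_true,
            y_pred,
            labels=[0, 1],
            target_names=["fail", "pass"],
            zero_division=0,
        ),
        # Rows are true classes, columns are predicted classes.
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1]),
    }


# Made-up labels purely for illustration (0 = fail, 1 = pass).
y_true = [0, 0, 1, 1, 1, 0, 1, 1]
y_pred = [0, 1, 1, 1, 0, 0, 1, 1]

metrics = evaluate(y_true, y_pred)
print(f"Accuracy: {metrics['accuracy']:.2f}")
print(metrics["report"])
print(metrics["confusion_matrix"])
```

For an early warning system, the top-right cell of the matrix (students who failed but were predicted to pass) is worth watching closely, since those are the at-risk students the model misses.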
- Model: Random Forest / XGBoost (based on system performance)
- Accuracy Achieved: ~96%
- Key Insight: Study time, failures, absences, and social factors like going out and romantic relationship status significantly impact academic performance.
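One common way to surface insights like this is to rank the impurity-based feature importances of a fitted tree ensemble. The sketch below shows that approach on synthetic data; whether the notebook derived its insight this way is an assumption, and the rank_features helper and generated columns are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def rank_features(model, feature_names) -> pd.Series:
    """Return the model's feature importances, highest first."""
    importances = pd.Series(model.feature_importances_, index=feature_names)
    return importances.sort_values(ascending=False)


# Synthetic stand-in: the label depends only on `failures` here,
# so that column should dominate the ranking.
rng = np.random.default_rng(0)
X = pd.DataFrame(
    {
        "studytime": rng.integers(1, 5, size=200),
        "failures": rng.integers(0, 4, size=200),
        "absences": rng.integers(0, 30, size=200),
        "goout": rng.integers(1, 6, size=200),
    }
)
y = (X["failures"] == 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)
ranking = rank_features(forest, X.columns)
print(ranking)
```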
- Schools/Colleges: Alert system for teachers to flag at-risk students
- EdTech Startups: Adaptive learning platforms to provide remedial content
- Policymakers: Tailored support strategies based on data-driven insights
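As an illustration of the teacher alert use case, a fitted classifier's predicted probabilities can be turned into a list of flagged students. Everything here is hypothetical: the flag_at_risk helper, the 0.5 threshold, the student identifiers, and the one-feature toy model are assumptions chosen to keep the sketch short.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def flag_at_risk(model, X, student_ids, threshold=0.5):
    """Return (student_id, fail_probability) pairs at or above the threshold."""
    # Locate the probability column that belongs to class 0 (fail).
    fail_col = list(model.classes_).index(0)
    fail_prob = model.predict_proba(X)[:, fail_col]
    return [
        (sid, float(p)) for sid, p in zip(student_ids, fail_prob) if p >= threshold
    ]


# Toy model: a single feature (number of past failures), 1 = pass, 0 = fail.
X_train = np.array([[0], [0], [1], [1], [2], [2], [3], [3]])
y_train = np.array([1, 1, 1, 1, 0, 0, 0, 0])
model = LogisticRegression().fit(X_train, y_train)

# Two hypothetical students: one with no past failures, one with three.
flagged = flag_at_risk(model, np.array([[0], [3]]), ["student_a", "student_b"])
print(flagged)
```

Lowering the threshold flags more students at the cost of more false alarms, which may be an acceptable trade-off when the goal is early intervention.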
- Clone the repository
- Install dependencies using requirements.txt
- Run the Jupyter Notebook student_performance.ipynb
- Review final model performance and visual insights
AI_Early_Warning_Student_Risk/
├── student-mat.csv
├── student_performance.ipynb
├── README.md
└── requirements.txt
- Python
- Pandas, NumPy, Matplotlib, Seaborn
- scikit-learn
- XGBoost (optional)
Author: Suyog Manke, Data Science & AI enthusiast, passionate about solving real-world problems using ML and AI.
If you're using a Mac and facing issues with XGBoost, ensure OpenMP is installed:
brew install libomp
“The goal is not just prediction — it’s timely intervention that changes outcomes.”