Undergraduate Student Dropout Prediction with Class Balancing Techniques
1. Department of Applied Mathematics and Statistics, Institute of Technology of Cambodia, Russian Federation Blvd., P.O. Box 86, Phnom Penh, Cambodia
Received: July 12, 2024 / Revised: August 01, 2024 / Accepted: September 11, 2024 / Available online: August 30, 2025
This study investigates how machine learning (ML) and deep learning (DL) techniques can be used to predict student dropout, a major issue for higher education institutions. Using the Kaggle dataset “Predict students’ dropout and academic success,” we analyzed records of 4,424 students across 17 undergraduate programs, each described by 35 attributes. To handle the class imbalance in the dataset, we applied three resampling methods: oversampling, undersampling, and the Synthetic Minority Oversampling Technique (SMOTE). We tested several ML and DL models: Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbors, Gaussian Naive Bayes, AdaBoost, XGBoost, a 1D Convolutional Neural Network (1D CNN), a Multilayer Perceptron (MLP), and a Deep Belief Network (DBN). We evaluated these models on accuracy, precision, recall, and F1-score. The MLP achieved the highest scores on the oversampled dataset: 98.6% accuracy, 98% precision, 98% recall, and a 98% F1-score. This suggests a strong capacity for modeling complex data. The 1D CNN also performed well, particularly in recall and F1-score, reaching 91.5% and 88.5%, respectively, on the original dataset. It maintained a recall of 91.4% and an F1-score of 87.7% on the undersampled dataset, and a recall of 89.2% and an F1-score of 88.1% on the SMOTE dataset, indicating that it identifies dropouts effectively under various conditions. These results underscore the effectiveness of resampling techniques in enhancing model accuracy and the critical role of precise academic indicators in predicting student outcomes. Our study contributes practical evidence of the effectiveness of ML and DL models combined with resampling methods, which can inform educational strategies.
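To make the class balancing step concrete, the following is a minimal sketch of random oversampling and random undersampling in plain Python. It is not the authors' code: the function name `random_resample`, the toy data, and the class labels are all hypothetical, and the abstract does not state which implementation or sampling ratios were used. SMOTE is not shown; it differs in that it synthesizes new minority-class points by interpolating between neighboring minority samples rather than duplicating existing rows.

```python
import random
from collections import Counter, defaultdict


def random_resample(rows, labels, mode="over", seed=0):
    """Balance classes by random over- or undersampling (illustrative only).

    mode="over":  top up each smaller class with random duplicates until
                  every class matches the largest class.
    mode="under": keep a random subset of each class so that every class
                  matches the smallest class.
    """
    if mode not in ("over", "under"):
        raise ValueError("mode must be 'over' or 'under'")
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    sizes = [len(idx) for idx in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)

    chosen = []
    for idx in by_class.values():
        if mode == "over":
            # Keep all original rows, then add random duplicates.
            extra = [rng.choice(idx) for _ in range(target - len(idx))]
            chosen.extend(idx + extra)
        else:
            # Sample without replacement down to the target size.
            chosen.extend(rng.sample(idx, target))

    rng.shuffle(chosen)
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


# Toy data (hypothetical, not from the study): 6 "graduate" vs 2 "dropout".
X = [[i, i * 2] for i in range(8)]
y = ["graduate"] * 6 + ["dropout"] * 2

X_over, y_over = random_resample(X, y, mode="over")
X_under, y_under = random_resample(X, y, mode="under")
print(Counter(y_over))
print(Counter(y_under))
```

In practice, resampling is usually applied only to the training split, so that duplicated or synthetic rows do not also appear in the data used for evaluation.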
By documenting the strong performance of both the MLP and 1D CNN models, the research highlights the potential of advanced analytical techniques to foster student retention and academic success. The insights from this work could support actionable, data-informed interventions tailored to students at risk of dropping out, thereby improving retention rates and informing future work in educational analytics.
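The four evaluation metrics named in the abstract (accuracy, precision, recall, and F1-score) can be computed from the confusion-matrix counts. The sketch below assumes a binary setting with "dropout" as the positive class; the abstract does not say how the metrics were averaged across classes, so this framing, the function name `binary_metrics`, and the toy labels are assumptions for illustration only.

```python
def binary_metrics(y_true, y_pred, positive="dropout"):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # predicted dropout, actually dropout
        elif pred == positive:
            fp += 1  # predicted dropout, actually not
        elif truth == positive:
            fn += 1  # missed an actual dropout
        else:
            tn += 1  # correctly predicted non-dropout

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (hypothetical, not from the study).
y_true = ["dropout", "dropout", "dropout",
          "graduate", "graduate", "graduate", "graduate", "graduate"]
y_pred = ["dropout", "dropout", "graduate",
          "dropout", "graduate", "graduate", "graduate", "graduate"]

scores = binary_metrics(y_true, y_pred)
print(scores)
```

Recall is the metric most directly tied to the goal of finding at-risk students, since it measures the share of actual dropouts that the model flags.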
