Undergraduate Student Dropout Prediction with Class Balancing Techniques
1. Department of Applied Mathematics and Statistics, Institute of Technology of Cambodia, Russian Federation Blvd., P.O. Box 86, Phnom Penh, Cambodia
Received: July 12, 2024 / Revised: August 01, 2024 / Accepted: September 11, 2024 / Available online: August 30, 2025
This study investigates how machine learning (ML) and deep learning (DL) techniques can be used to predict student dropout, a major issue for higher education institutions. Using the Kaggle dataset “Predict students’ dropout and academic success,” we analyzed data from 4424 students across 17 undergraduate programs. Each student profile comprised 35 attributes, which gave a solid basis for predictive modeling. To handle the class imbalance in the dataset, we used three methods: oversampling, undersampling, and the Synthetic Minority Oversampling Technique (SMOTE). We tested several ML and DL models: Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbors, Gaussian Naive Bayes, AdaBoost, XGBoost, 1D Convolutional Neural Network (1D CNN), Multilayer Perceptron (MLP), and Deep Belief Network (DBN). We evaluated these models on accuracy, precision, recall, and F1-score. The MLP stood out, achieving the highest scores on the oversampled dataset: 98.6% accuracy, 98% precision, 98% recall, and a 98% F1-score. This indicates that it handles complex data well. The 1D CNN also performed well, particularly in recall and F1-score, with 91.5% and 88.5%, respectively, on the original dataset. It maintained a recall of 91.4% and an F1-score of 87.7% on the undersampled dataset, and a recall of 89.2% and an F1-score of 88.1% on the SMOTE dataset, showing that it identifies dropouts effectively under various conditions. These results underscore the effectiveness of resampling techniques in enhancing model accuracy and the critical role of precise academic indicators in predicting student outcomes. The study contributes practical evidence of how well ML and DL models perform when supported by resampling methods, which can inform educational strategies.
By documenting the strong performance of both the MLP and 1D CNN models, the research highlights the potential of advanced analytical techniques to foster student retention and academic success. The insights from this work could lead to actionable, data-informed interventions tailored to students at risk of dropping out, thereby improving retention rates and informing future work in educational analytics.
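The three class-balancing strategies and the per-class metrics named in the abstract can be illustrated with a short, dependency-free Python sketch. It is not the authors' pipeline: the function names, the toy feature rows, and the labels `"dropout"` and `"graduate"` are hypothetical, and `smote_like` is a simplified stand-in that only captures the core idea of SMOTE (interpolating between a minority sample and one of its nearest minority neighbours).

```python
import random
from collections import Counter


def _rows_of(X, y, label):
    """Return the feature rows that carry the given class label."""
    return [x for x, lab in zip(X, y) if lab == label]


def random_oversample(X, y, seed=0):
    """Duplicate random rows of smaller classes until all match the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = _rows_of(X, y, label)
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of every class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        for x in rng.sample(_rows_of(X, y, label), target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, minority, k=2, seed=0):
    """Simplified SMOTE: add points interpolated between a minority row and
    one of its k nearest minority neighbours (needs >= 2 minority rows)."""
    rng = random.Random(seed)
    counts = Counter(y)
    rows = _rows_of(X, y, minority)
    X_out, y_out = list(X), list(y)
    for _ in range(max(counts.values()) - counts[minority]):
        base = rng.choice(rows)
        others = [r for r in rows if r is not base]
        # Rank the remaining minority rows by squared Euclidean distance.
        nearest = sorted(
            others, key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r))
        )[:k]
        neighbour = rng.choice(nearest)
        gap = rng.random()  # position along the segment base -> neighbour
        X_out.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
        y_out.append(minority)
    return X_out, y_out


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1-score for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: six majority rows, two minority rows.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.2, 0.3],
     [2.0, 2.0], [2.2, 2.1]]
y = ["graduate"] * 6 + ["dropout"] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
X_smote, y_smote = smote_like(X, y, minority="dropout")
```

In practice, resampling is usually applied only to the training split so that duplicated or synthetic rows do not leak into the evaluation data; whether the study followed that convention is not stated in the abstract.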