/TSRJ-ITC

With the growth of artificial intelligence, large language models, which are models trained on vast quantities of textual data, have become widely used. Such a model can be tailored to particular tasks, such as chatbots, text generation, and question answering. However, most existing pre-trained models were trained mainly on English datasets, which leads to limited support and low performance in other languages, especially low-resource languages such as Khmer. To address this imbalance, we aim to build a Khmer language model by fine-tuning pre-trained state-of-the-art (SOTA) models, with a focus on question-answering tasks. We propose a supervised fine-tuning process that trains on a labeled dataset and uses quantization together with low-rank adaptation (LoRA) for memory-efficient optimization. Moreover, we use flash attention to speed up training. Before starting the experiment, we observed that some SOTA models were not able to recognize the Khmer language; to deal with this problem, we perform vocabulary expansion. For the experiment, we collect datasets of question-and-answer pairs in the general knowledge domain from online sources. Three decoding strategies, namely greedy search, beam search, and contrastive search, are used to select the output tokens during text generation. We use bilingual evaluation understudy (BLEU) as the evaluation metric because it measures the similarity between generated responses and reference sentences. In the experiment, the BLEU score of the fine-tuned Gemma 7B model increased from 0.0539 to 0.2863 with greedy search, from 0.0227 to 0.2765 with beam search, and from 0.0009 to 0.2201 with contrastive search. This increase shows that the fine-tuning process enhances the performance of the model. The scores also indicate that the model can generate clear responses, although these still contain grammatical errors.
The findings of this study contribute to the growing body of research on applying deep learning techniques to Khmer question answering. In conclusion, fine-tuned Khmer language models can offer benefits across various domains: their ability to understand natural language makes them valuable tools for businesses, educators, and researchers.
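To make the evaluation metric concrete, the following is a minimal sketch of a sentence-level BLEU score, assuming a single reference, n-grams up to length 4, no smoothing, and input that is already split into tokens. It is an illustration of the general metric, not the evaluation code used in the study; the function names `ngrams` and `sentence_bleu` are hypothetical, and the example sentences are invented English tokens. Khmer is written without spaces between words, so a word segmentation step would be needed before a function like this could be applied.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU without smoothing (illustrative only)."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            # A zero precision at any order makes the geometric mean zero.
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "the capital of cambodia is phnom penh".split()
exact = "the capital of cambodia is phnom penh".split()
shorter = "capital of cambodia is phnom penh".split()

print(sentence_bleu(exact, reference))    # identical sentences score 1.0
print(sentence_bleu(shorter, reference))  # lower, due to the brevity penalty
```

Because this sketch applies no smoothing, a response with no matching 4-gram scores exactly zero, which is one reason sentence-level scores on short answers can be very low before fine-tuning.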

Latest Issue

Empowering Education with Online Khmer Handwritten Text Recognition for Teaching and Learning Assistance

Undergraduate Student Dropout Prediction with Class Balancing Techniques

Status of Seawater Quality at Koh Rong Island, Sihanoukville, Cambodia

Low-Complexity Detection of Primary Synchronization Signal for 5G New Radio Terrestrial Cellular System

Word Spotting on Khmer Printed Documents

Tuning Hyperparameters Learning Rate and Gamma in Gym Environment Inverted Pendulum

Examining Passenger Loyalty in Phnom Penh Public Bus System: A Structural Equation Modelling Approach

Prediction on Load model for future load profile of Electric Vehicle charging demand in Phnom Penh

Economic Study on Integrating PV-DG with Grid-Tie: Case Study in Cambodia

Techno-Economic Comparison of VRF and Water-cooled Chiller System at the Ministry of Tourism in Sihanouk Province, Cambodia

Khmer Question-Answering Model by Fine-tuning Pre-trained Model
