/TSRJ-ITC

1. Department of Information and Communication Engineering, Institute of Technology of Cambodia, Russian Federation Blvd., P.O. Box 86, Phnom Penh, Cambodia

Received: August 08, 2024 / Revised: September 10, 2024 / Accepted: September 17, 2024 / Available online: August 30, 2025

Word spotting in printed Khmer documents is challenging because of the complexity of the Khmer script and the wide variety of fonts in use. The scarcity of large, publicly available datasets complicates the task further. This work proposes a two-module approach to accurate and efficient word spotting in Khmer documents, with separate datasets for text detection and text recognition. The first module applies the YOLOv8 model to a dataset of 10,050 text samples; its performance is evaluated with the F1 score, a metric that balances precision and recall in locating text. The second module fine-tunes the Transformer-based TrOCR model for recognition on 22,567 labeled words, with recognition accuracy measured by the Character Error Rate (CER). The detection module achieves an F1 score of 0.987 in locating Khmer words within documents, and the TrOCR recognition module reaches a CER of 8.41%. By addressing script and font challenges with focused datasets and modern models, the approach shows potential for improving document processing and information retrieval for the Khmer language.
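The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names `f1_score`, `edit_distance`, and `cer` are hypothetical, F1 is computed from assumed detection counts (true positives, false positives, false negatives), and CER is assumed to be the Levenshtein distance over Unicode code points divided by the reference length. The paper may segment Khmer text differently, for example by character clusters.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted per code point."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance divided by reference length.

    Assumption: characters are Unicode code points, so a Khmer consonant
    with a subscript or vowel sign counts as several characters.
    """
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

For instance, `cer("kitten", "sitting")` gives 3 edits over 6 reference characters, which is 0.5, and `f1_score(8, 2, 2)` gives 0.8 because precision and recall are both 0.8. Under this definition CER can exceed 1.0 when the hypothesis is much longer than the reference.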

Word Spotting on Khmer Printed Documents
