Improving Recognition Result Using Character Trigram for Khmer OCR
1. Department of Computer Science,
Institute of Technology of Cambodia, Russian Federation Blvd., P.O. Box 86, Phnom Penh, Cambodia.
Academic Editor:
Received: January 20, 2024 / Revised: / Accepted: January 20, 2024 / Available online: June 01, 2013
The recognition phase of an Optical Character Recognition (OCR) system produces a ranked list of candidate characters, of which the top one is usually taken as the recognition result without taking context into account. A recognition error occurs if the correct character is not at the top of the list, which is mostly due to shape similarity between characters. In this paper, we propose using character trigrams for Khmer OCR: the two preceding characters are taken into account when choosing the recognition result from the candidate list. A text corpus of about 300 MB is used to compute the character trigrams. Using these trigrams, we test our approach on about 3,000 characters. The results show that this approach can correct about 30% of recognition errors.
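The idea of choosing a candidate with the help of the two preceding characters can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the function names (`train_trigrams`, `trigram_logprob`, `rerank`), the add-one smoothing, the greedy left-to-right decoding, the `weight` parameter, and the way the recognizer score is combined with the trigram probability are all assumptions made for illustration. The sketch also treats each Unicode code point as one character, which may differ from the character units used in an actual Khmer OCR system.

```python
import math
from collections import Counter


def train_trigrams(text):
    """Count character trigrams and their two-character contexts in a corpus."""
    tri = Counter()
    bi = Counter()
    for i in range(len(text) - 2):
        tri[text[i:i + 3]] += 1
        bi[text[i:i + 2]] += 1  # context counts match the trigram counts
    vocab_size = len(set(text))
    return tri, bi, vocab_size


def trigram_logprob(tri, bi, vocab_size, context, char):
    """Add-one smoothed log P(char | context); context is the two previous characters."""
    return math.log((tri[context + char] + 1) / (bi[context] + vocab_size))


def rerank(candidates_per_position, tri, bi, vocab_size, weight=1.0):
    """Pick one character per position, greedily from left to right.

    candidates_per_position: list of candidate lists, each candidate being a
    (character, recognizer_score) pair with a strictly positive score
    (hypothetical format, assumed for this sketch).
    """
    output = ""
    for candidates in candidates_per_position:
        if len(output) < 2:
            # Not enough context yet: keep the recognizer's top candidate.
            best = max(candidates, key=lambda c: c[1])[0]
        else:
            context = output[-2:]
            best = max(
                candidates,
                key=lambda c: math.log(c[1])
                + weight * trigram_logprob(tri, bi, vocab_size, context, c[0]),
            )[0]
        output += best
    return output


# Toy usage with Latin text; a real system would train on a large Khmer corpus.
tri, bi, vocab_size = train_trigrams("the cat sat on the mat. the hat.")
candidates = [[("t", 0.9)], [("h", 0.9)], [("c", 0.5), ("e", 0.45)]]
print(rerank(candidates, tri, bi, vocab_size))
```

In the toy usage, the recognizer ranks "c" slightly above "e" at the third position, but the context "th" makes "e" far more probable under the trigram counts, so the lower-ranked candidate is selected. This is the kind of shape-confusion error that context can correct.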
