Improving Recognition Result Using Character Trigram for Khmer OCR
1. Department of Computer Science,
Institute of Technology of Cambodia, Russian Federation Blvd., P.O. Box 86, Phnom Penh, Cambodia.
Received: January 20, 2024 / Accepted: January 20, 2024 / Published: June 01, 2013
The recognition phase of an Optical Character Recognition (OCR) system produces a ranked list of candidate characters, and the top-ranked candidate is usually taken as the recognition result without considering context. A recognition error occurs when the correct character is not ranked first, which is mostly due to shape similarity between characters. In this paper, we propose using character trigrams for Khmer OCR: the two previous characters are taken into account when choosing a character from the candidate list as the recognition result. A text corpus of about 300 MB is used to compute the character trigrams. Using these trigrams, we test our approach on about 3,000 characters. The results show that this approach can correct about 30% of recognition errors.
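The candidate-selection idea in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it treats each Unicode code point as one "character" (Khmer dependent vowels and COENG-based subscript consonants are separate code points, so the paper may segment text differently), scores each candidate by the raw corpus frequency of the trigram formed with the two previously recognized characters, and falls back to the recognizer's own ranking when there is not yet enough context or when trigram counts tie. The function names `count_trigrams`, `pick_candidate` and `decode` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def count_trigrams(corpus: str) -> Counter:
    """Count every run of three consecutive code points in the corpus.

    Assumption: a "character" is one Unicode code point.
    """
    return Counter(corpus[i:i + 3] for i in range(len(corpus) - 2))


def pick_candidate(history: str, candidates: List[str], trigrams: Counter) -> str:
    """Choose one character from a ranked (best-first) candidate list.

    The candidate c maximizing the count of history[-2:] + c wins.
    Python's max() returns the first maximal item, so ties (including
    "no trigram seen at all") go to the recognizer's higher-ranked candidate.
    """
    if len(history) < 2:
        # Not enough context yet: keep the recognizer's top choice.
        return candidates[0]
    context = history[-2:]
    return max(candidates, key=lambda c: trigrams[context + c])


def decode(candidate_lists: Iterable[List[str]], trigrams: Counter) -> str:
    """Recognize a line left to right, feeding each choice back as context."""
    text = ""
    for candidates in candidate_lists:
        text += pick_candidate(text, candidates, trigrams)
    return text
```

As a toy usage example with Latin letters standing in for Khmer, suppose the recognizer ranks "c" above "e" for the third character: `decode([["t"], ["h"], ["c", "e"]], count_trigrams("the cat sat on the mat"))` returns `"the"`, because the trigram "the" occurs twice in the corpus and "thc" never does. For a corpus of about 300 MB, one would more likely stream the text in chunks and update the `Counter` incrementally than hold it in a single string.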