OALib Journal
ISSN: 2333-9721

Design and Development of a Script Recognition Tool for Indian Document Images

Abstract:

Identifying the scripts in a multi-script document is an important step in the design of an optical character recognition (OCR) system for successful analysis and recognition. Most OCR systems can recognize at most a few scripts, so for large archives of document images that contain different scripts, the documents must be categorized automatically before the appropriate OCR is applied to them. Much work has already been reported in this area. In the Indian context, although some results have been reported, the task is still in its infancy. This paper presents research on identifying Tamil, English, Hindi, Malayalam, Kannada and Telugu scripts at the word level, irrespective of font face and size. The proposed technique segments the characters into nine zones and extracts features based on their shape, density and transitions. The script is then determined by a rule-based classifier whose set of classification rules is derived from these zones. Results from experiments, simulations and human visual inspection show that the proposed technique identifies scripts with minimal pre-processing and high accuracy. It can also be extended to other scripts. Because the system can act as a plug-in, it can be embedded in an OCR system ahead of the recognition stage.
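The abstract only outlines the pipeline (nine zones, density and transition features, rule-based classification). The following is a minimal sketch, assuming a binarized word image stored as a list of rows (1 = ink, 0 = background), of how such zone features might be computed and passed to an ordered list of rules. The 3×3 grid layout, the function names `split_into_zones`, `density`, `transitions`, `zone_features` and `classify`, the 0.5 threshold and the single headline rule are hypothetical illustrations, not the authors' actual rules; shape features are omitted.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed input: a binary image as a list of rows, 1 = ink (foreground), 0 = background.
Image = List[List[int]]
Features = List[Tuple[float, int]]  # one (density, transitions) pair per zone


def split_into_zones(img: Image, rows: int = 3, cols: int = 3) -> List[Image]:
    """Split an image into a rows x cols grid of sub-images, in row-major order."""
    h, w = len(img), len(img[0])
    zones = []
    for r in range(rows):
        top, bottom = r * h // rows, (r + 1) * h // rows
        for c in range(cols):
            left, right = c * w // cols, (c + 1) * w // cols
            zones.append([row[left:right] for row in img[top:bottom]])
    return zones


def density(zone: Image) -> float:
    """Fraction of foreground pixels in a zone (0.0 for an empty zone)."""
    total = sum(len(row) for row in zone)
    return sum(sum(row) for row in zone) / total if total else 0.0


def transitions(zone: Image) -> int:
    """Number of background/foreground changes along the rows of a zone."""
    return sum(1 for row in zone for a, b in zip(row, row[1:]) if a != b)


def zone_features(img: Image) -> Features:
    """(density, transitions) for each of the nine zones of a 3x3 grid."""
    return [(density(z), transitions(z)) for z in split_into_zones(img)]


# Hypothetical rules, checked in order; the first rule that fires decides the label.
Rule = Tuple[Callable[[Features], bool], str]
RULES: List[Rule] = [
    # Devanagari (Hindi) words carry a horizontal headline, so the top zone row
    # is assumed to be dense; the 0.5 threshold is an illustrative guess.
    (lambda f: sum(d for d, _ in f[:3]) / 3 > 0.5, "Hindi"),
    # Placeholder: a real tool would add rules for the other scripts here.
]


def classify(img: Image, rules: Sequence[Rule] = RULES) -> str:
    """Return the label of the first matching rule, or 'unknown' if none fires."""
    features = zone_features(img)
    for condition, script in rules:
        if condition(features):
            return script
    return "unknown"


if __name__ == "__main__":
    # A 9x9 toy "word" whose top third is solid ink, mimicking a headline.
    headline_word = [[1] * 9 for _ in range(3)] + [[0, 1, 0] * 3 for _ in range(6)]
    print(classify(headline_word))  # → Hindi
```

Keeping the rules as plain data (a list of condition/label pairs) mirrors the idea of a rule set derived from the zones: adding support for another script means appending a rule rather than changing the classifier.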