|
Arabic Content Classification System Using Statistical Bayes Classifier with Words Detection and Correction

Keywords: text mining, classification, Arabic text classification, Arabic language processing.

Abstract: Automatic Arabic content classification is an important text mining task, especially given the rapid growth in the number of online Arabic documents. The system presented here enhances an implemented machine learning classification algorithm by applying a detection and correction algorithm for non-words in Arabic text. This detection and correction algorithm is built on morphological knowledge in the form of consistent root-pattern relationships, together with some morpho-syntactical knowledge based on affixation and morpho-graphic rules that specify the word recognition and non-word correction process. Many researchers have focused on Arabic content classification from a purely morphological view, such as word roots and stemming techniques (prefixes and suffixes), with varying results. In this work, we consider classification from a very different angle, namely the syntactical approach. This paper presents the results of document classification experiments on ten different Arabic domains (Economy, History, Family Studies, Islamic, Sport, Health, Law, Stories, Astronomy and Food articles) using a statistical methodology. The classification system showed encouraging results compared with other existing systems.
|
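To make the idea of a statistical Bayes classifier over domain labels concrete, the following is a minimal sketch only. The abstract does not say which Bayes variant or smoothing the system uses, so a multinomial Naive Bayes with add-one (Laplace) smoothing is assumed here. The function names `train_naive_bayes` and `classify`, and the tiny four-document training set, are hypothetical and are not taken from the paper. The non-word detection and correction step is not modelled; tokens are assumed to be already corrected.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Collect per-class document and word counts from (tokens, label) pairs."""
    class_doc_counts = Counter()        # number of training documents per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocabulary = set()                  # all words seen in training
    for tokens, label in labeled_docs:
        class_doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_doc_counts, word_counts, vocabulary


def classify(tokens, model):
    """Return the class with the highest log posterior (assumed multinomial NB)."""
    class_doc_counts, word_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    vocab_size = len(vocabulary)
    best_label, best_score = None, float("-inf")
    for label, doc_count in class_doc_counts.items():
        # Log prior: share of training documents belonging to this class.
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocabulary:
                continue  # words never seen in training are ignored
            # Add-one smoothing keeps unseen (word, class) pairs above zero.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + vocab_size)
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data for two of the ten domains.
TRAINING_DOCS = [
    (["مباراة", "فريق", "هدف"], "Sport"),
    (["لاعب", "فريق", "بطولة"], "Sport"),
    (["طبيب", "علاج", "مرض"], "Health"),
    (["مستشفى", "علاج", "دواء"], "Health"),
]
MODEL = train_naive_bayes(TRAINING_DOCS)

print(classify(["فريق", "هدف"], MODEL))
print(classify(["علاج", "دواء"], MODEL))
```

Scores are summed in log space so that long documents do not underflow to zero. In a full pipeline of the kind the abstract describes, the token lists would come from the text after non-word correction.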