
A Webpage Classification Algorithm Concerning Webpage Design Characteristics

Keywords: Tag-region, Webpage Classification, Webpage Design, Keyword Extraction, Knowledge Management


Abstract:

Owing to the rapid growth of Internet technology, the number of web documents on the Internet has increased significantly. If webpages can be managed effectively, knowledge demanders (i.e., Internet users) can absorb and use knowledge documents efficiently; this has become a core topic in the era of information explosion. Highly accurate webpage classification technology can help Internet users find the knowledge they need more efficiently and save considerable search time. Unlike previous research, this paper explores webpage design characteristics for webpage classification. Specifically, considering the complexity of webpage structure, this paper analyzes webpage design characteristics, including tag attributes and tag-region layout, to develop a webpage classification algorithm. Based on this analysis, the text contained in specific tag-regions can be identified. The keywords extracted from each tag-region are weighted according to tag attributes and tag-region locations, and the categories of the target webpage are then determined. Furthermore, based on hyperlink tags, similar webpages with higher correlations can be collected to re-determine the categories of the target webpage. In addition to the webpage classification algorithm, a web-based webpage classification system is developed to demonstrate the feasibility of the proposed model. This research aims to analyze and use webpage design characteristics to improve the effectiveness of webpage classification.
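
The tag-based keyword weighting described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the tag weights in `TAG_WEIGHTS`, the category vocabularies in `CATEGORY_TERMS`, and the names `TagRegionParser` and `classify` are all hypothetical, chosen only for illustration. The sketch covers weighting by enclosing tag alone; tag-region location and the hyperlink-based re-determination step are left out.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical per-tag weights; the abstract does not state actual values.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.5, "h2": 2.0, "strong": 1.5, "b": 1.5, "a": 1.2}
DEFAULT_WEIGHT = 1.0
SKIPPED_TAGS = {"script", "style"}          # regions that carry no readable text
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # tags with no closing tag

# Hypothetical category vocabularies, for illustration only.
CATEGORY_TERMS = {
    "sports": {"football", "match", "league", "score"},
    "finance": {"stock", "market", "bank", "investment"},
}


class TagRegionParser(HTMLParser):
    """Accumulates a weight for each term based on the tags that enclose it."""

    def __init__(self):
        super().__init__()
        self.stack = []                 # currently open tags, outermost first
        self.term_weights = Counter()   # term -> accumulated weight

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if any(t in SKIPPED_TAGS for t in self.stack):
            return
        # Use the largest weight among all enclosing tags.
        weight = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.stack),
            default=DEFAULT_WEIGHT,
        )
        for term in re.findall(r"[a-z]+", data.lower()):
            self.term_weights[term] += weight


def classify(html):
    """Return a score per category: the summed weights of its matching terms."""
    parser = TagRegionParser()
    parser.feed(html)
    parser.close()
    return {
        category: sum(parser.term_weights[term] for term in terms)
        for category, terms in CATEGORY_TERMS.items()
    }


page = (
    "<html><head><title>Football league news</title></head>"
    "<body><h1>Match report</h1>"
    "<p>The stock market was mentioned once.</p></body></html>"
)
scores = classify(page)
best_category = max(scores, key=scores.get)
```

In this example the terms in the title and heading dominate the score, so the page leans toward the hypothetical "sports" category even though the body text mentions finance terms. A fuller version would presumably also weight by region position and revisit the result using linked pages, as the abstract outlines.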
