Elimination of Noisy Information from Web Pages

Keywords: Noise elimination, DOM tree, Web page cleaning

Abstract:

A Web page typically contains many information blocks. Besides the main content blocks, it usually has blocks such as navigation panels, copyright and privacy notices, and advertisements. We call these blocks, which are not part of the page's main content, noisy blocks. We show that the information contained in these noisy blocks can seriously harm Web data mining, so eliminating this noise is of great importance. In our work we focus on identifying and removing local noise in Web pages to improve mining performance. We propose a simple approach to detecting and removing noise, based on a new DOM tree structure. The results show a remarkable increase in F score and accuracy.
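
The abstract does not spell out the proposed DOM tree structure, so the following is only a minimal sketch of the general idea of DOM-based page cleaning, not the authors' method. It parses a page into a simple tree, marks a block as noisy when its tag or its id/class suggests navigation, advertising, or legal notices, or when most of its text sits inside links, and then returns the remaining text. The names `NOISE_TAGS`, `NOISE_HINTS`, `is_noisy`, and `clean_text`, as well as the link-density threshold of 0.6, are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Assumptions for illustration only; the paper does not list these.
NOISE_TAGS = {"nav", "footer", "aside", "script", "style"}
NOISE_HINTS = ("nav", "footer", "copyright", "advert", "sidebar")
BLOCK_TAGS = {"div", "ul", "table", "section"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """A minimal DOM node holding a tag, attributes, and children."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []  # each child is a Node or a text string


class TreeBuilder(HTMLParser):
    """Builds a simple tree of Node objects from HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never have children
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data.strip())


def text_of(node):
    """Concatenate all text under a node."""
    parts = [c if isinstance(c, str) else text_of(c) for c in node.children]
    return " ".join(p for p in parts if p)


def link_text_length(node):
    """Number of text characters that sit inside <a> elements."""
    if node.tag == "a":
        return len(text_of(node))
    return sum(link_text_length(c) for c in node.children if isinstance(c, Node))


def is_noisy(node, max_link_density=0.6):
    """Heuristic noise test: tag name, id/class hints, or high link density."""
    if node.tag in NOISE_TAGS:
        return True
    label = f"{node.attrs.get('id') or ''} {node.attrs.get('class') or ''}".lower()
    if any(hint in label for hint in NOISE_HINTS):
        return True
    total = len(text_of(node))
    if node.tag in BLOCK_TAGS and total > 0:
        return link_text_length(node) / total > max_link_density
    return False


def prune(node):
    """Remove noisy subtrees in place, top-down."""
    node.children = [c for c in node.children
                     if isinstance(c, str) or not is_noisy(c)]
    for child in node.children:
        if isinstance(child, Node):
            prune(child)


def clean_text(html):
    """Parse HTML, drop blocks judged noisy, and return the remaining text."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    prune(builder.root)
    return text_of(builder.root)
```

Calling `clean_text(page_html)` on a page string returns the text of the blocks that survive pruning. A real system would need to tune or learn the hints and the threshold per site, since fixed keyword lists are brittle.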
