Approach to Webpage Segmentation and Information Extraction for Vertical Websites

Keywords: page segmentation, information extraction, vertical Websites, content crowding level, segment tag, prefix matching

Abstract:

After analyzing existing Webpage segmentation algorithms and the conditions under which each applies, this paper investigates a method for segmenting vertical Webpages and extracting information from them. Working on the DOM tree, the paper proposes the notion of content crowding level, segments the Webpage using segment tags obtained through statistical analysis and the mapping of cascading style sheets (CSS), and then extracts information from each segment using text recognition and prefix matching. Based on actual project requirements, a page segmenter and information extractor for vertical Webpages was designed and implemented. Experimental results show that the proposed method performs well and meets the project's requirements.
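The abstract does not define the content crowding level or the statistics used to choose segment tags, so the following is only a minimal Python sketch of the overall pipeline (DOM tree, container, segments, prefix matching) under stated assumptions, not the authors' method. It assumes, hypothetically, that a node's crowding level is its share of the page's visible text, that the content container is found by descending while some child still holds at least half of that text, and that the segment tag is the most frequent child tag of the container. The CSS mapping and text-recognition steps are not modeled, and all function names, the `threshold` parameter and the field prefixes `Name:` / `Price:` are illustrative.

```python
from collections import Counter
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

class Node:
    """Minimal DOM node: tag, parent, children and the text it directly holds."""
    def __init__(self, tag, parent=None):
        self.tag, self.parent = tag, parent
        self.children, self.texts = [], []

class TreeBuilder(HTMLParser):
    """Builds a simplified DOM tree with the standard-library HTML parser."""
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never receive children
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the matching open element; stray end tags are ignored.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip layout whitespace between tags
            self.current.texts.append(text)

def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root

def text_len(node):
    return sum(map(len, node.texts)) + sum(text_len(c) for c in node.children)

def all_texts(node):
    out = list(node.texts)
    for child in node.children:
        out.extend(all_texts(child))
    return out

def find_container(root, threshold=0.5):
    """HYPOTHETICAL crowding level: a subtree's share of the page text.
    Descend while some child still holds at least `threshold` of it."""
    total = text_len(root) or 1
    node = root
    while True:
        nxt = next((c for c in node.children
                    if text_len(c) / total >= threshold), None)
        if nxt is None:
            return node
        node = nxt

def segment(container):
    """HYPOTHETICAL segment-tag statistic: the most frequent child tag."""
    counts = Counter(c.tag for c in container.children)
    if not counts:
        return []
    seg_tag = counts.most_common(1)[0][0]
    return [c for c in container.children if c.tag == seg_tag]

def extract_fields(seg, prefixes):
    """Prefix matching: a text line that starts with a known label fills that field."""
    record = {}
    for line in all_texts(seg):
        for field, prefix in prefixes.items():
            if field not in record and line.startswith(prefix):
                record[field] = line[len(prefix):].strip()
    return record

page = """<html><body>
<div id="nav"><a>Home</a><a>About</a><a>Contact</a></div>
<ul id="list">
<li><span>Name: Alpha Hotel</span><span>Price: 320</span></li>
<li><span>Name: Beta Inn</span><span>Price: 180</span></li>
</ul>
</body></html>"""

fields = {"name": "Name:", "price": "Price:"}
records = [extract_fields(s, fields) for s in segment(find_container(parse(page)))]
print(records)
# [{'name': 'Alpha Hotel', 'price': '320'}, {'name': 'Beta Inn', 'price': '180'}]
```

Descending by share of page text, rather than picking the single node with the densest text, keeps the sketch from stopping at one record (an individual `<li>`) instead of the list that holds every record; the navigation bar is skipped because it carries only a small share of the text.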
