Application Research of Computers (计算机应用研究), 2013

Approach to Webpage segmentation and information extraction for vertical Websites
一种垂直页面分割与信息提取方法的研究

LI Jun, CHEN Jun, WANG Ling-fang, NI Hong
李　军,陈　君,王玲芳,倪　宏

Keywords: page segmentation, information extraction, vertical Websites, content crowding level, segment tag, prefix matching
页面分割,信息获取,垂直网站,内容聚集度,分割标签,前缀匹配

Abstract:

After analyzing existing Webpage segmentation algorithms and the conditions under which they apply, this paper investigates a segmentation and information extraction method for vertical Webpages. Working on the DOM tree, it introduces the notion of content crowding level and segments a page using segment tags, which are obtained by a statistical method and by the mapping of cascading style sheets. It then extracts information from each segment using text recognition and prefix matching. Based on actual project requirements, a page segmenter and information extractor for vertical Webpages was designed and implemented. The experimental results show that the proposed method performs well and meets the project's requirements.
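
The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a fixed set of segment tags (`SEGMENT_TAGS`, hypothetical; the paper derives its tags statistically and from CSS mappings), an assumed definition of content crowding level as text characters per descendant tag, and a hypothetical prefix table mapping label prefixes to field names.

```python
from html.parser import HTMLParser

# Hypothetical segment tags; the paper obtains these statistically and via CSS mappings.
SEGMENT_TAGS = {"div", "table", "ul"}


class SegmentParser(HTMLParser):
    """Split a page into segments and collect the text lines inside each one."""

    def __init__(self):
        super().__init__()
        self.stack = []     # open segments, each as [text_lines, descendant_tag_count]
        self.segments = []  # closed segments as (text_lines, descendant_tag_count)

    def handle_starttag(self, tag, attrs):
        # Every tag opened inside an open segment counts as one of its descendants.
        for seg in self.stack:
            seg[1] += 1
        if tag in SEGMENT_TAGS:
            self.stack.append([[], 0])

    def handle_endtag(self, tag):
        if tag in SEGMENT_TAGS and self.stack:
            lines, tag_count = self.stack.pop()
            self.segments.append((lines, tag_count))

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            # Text is attributed to the innermost open segment.
            self.stack[-1][0].append(text)


def crowding_level(segment):
    """Assumed measure: text characters per tag (not the paper's definition)."""
    lines, tag_count = segment
    return sum(len(line) for line in lines) / (tag_count + 1)


def extract_fields(lines, prefixes):
    """Prefix matching: a line starting with a known label yields that field."""
    fields = {}
    for line in lines:
        for prefix, name in prefixes.items():
            if line.startswith(prefix):
                fields[name] = line[len(prefix):].strip()
    return fields


def segment_and_extract(html, prefixes):
    """Pick the segment with the highest crowding level and extract its fields."""
    parser = SegmentParser()
    parser.feed(html)
    parser.close()
    if not parser.segments:
        return {}
    best = max(parser.segments, key=crowding_level)
    return extract_fields(best[0], prefixes)


page = (
    '<div><a href="/">Home</a><a href="/x">More</a></div>'
    "<div><p>Title: Blue Widget</p><p>Price: 19.99</p></div>"
)
print(segment_and_extract(page, {"Title:": "title", "Price:": "price"}))
```

On this toy page the navigation block carries little text per tag, so the content block is selected and its labelled lines are turned into fields. A real vertical site would need its own tag set and prefix table.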
