%0 Journal Article
%T Approach to Webpage segmentation and information extraction for vertical Websites
%A LI Jun (李军)
%A CHEN Jun (陈君)
%A WANG Ling-fang (王玲芳)
%A NI Hong (倪宏)
%J 计算机应用研究 (Application Research of Computers)
%D 2013
%X After analyzing existing Webpage segmentation algorithms and the conditions under which they apply, this paper investigates a segmentation and information extraction method for vertical Webpages. Working on the DOM tree, it proposes the notion of content crowding level, segments the Webpage using segment tags obtained by a statistical method and the mapping of cascading style sheets, and then extracts information from each segment using text recognition and prefix matching. Based on the requirements of an actual project, a page segmenter and information extractor for vertical Webpages was designed and implemented. The experimental results show that the proposed method performs well and meets the project's requirements.
%K page segmentation
%K information extraction
%K vertical Websites
%K content crowding level
%K segment tag
%K prefix matching
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=A9D9BE08CDC44144BE8B5685705D3AED&aid=57319E4289F4438F9FA1AAFAE1C1B105&yid=FF7AA908D58E97FA&vid=340AC2BF8E7AB4FD&iid=38B194292C032A66&sid=693E1FFD7BD946DF&eid=BCCCE1B88B87184D&journal_id=1001-3695&journal_name=计算机应用研究&referenced_num=0&reference_num=12