Informing Science: The International Journal of an Emerging Transdiscipline, 2003

HTML Tags as Extraction Cues for Web Page Description Construction

Timothy C. Craven

Keywords: HTML, extracting, metadata, summarization, computer software, World Wide Web

Abstract:

Four previously identified samples of Web pages containing meta-tagged descriptions were used to estimate how helpful various extracts are for writing summaries. The extracts considered were meta-tagged keywords, the first 200 characters of the body, and text marked with common HTML tags. Two measures were applied: density of description words and density of two-word description phrases. Generally, titles and keywords showed the highest densities. Parts of the body showed densities not much different from those of the body as a whole: somewhat higher for the first 200 characters and for text tagged with "center" and "font"; somewhat lower for text tagged with "a"; and not significantly different for text tagged with "table" and "div". Evidence of non-random clumping of description words in the body of some pages nevertheless suggests that further pursuit of methods for automatically extracting passages from the body may be worthwhile. Implications of the findings for aids to summarization, and specifically for the TexNet32 package, are discussed.
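
The two measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give their exact definitions, so the code assumes one plausible reading: word density is the share of words in an extract that also occur in the meta-tagged description, and phrase density is the share of adjacent word pairs in the extract that also occur as adjacent pairs in the description. The function names, the tokenization rule, and the sample strings are all hypothetical and are not taken from the paper or from TexNet32.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bigrams(tokens):
    """Return the adjacent two-word pairs in a token list."""
    return list(zip(tokens, tokens[1:]))


def word_density(extract, description):
    """Share of extract words that also appear in the description (assumed definition)."""
    extract_words = tokenize(extract)
    if not extract_words:
        return 0.0
    description_words = set(tokenize(description))
    hits = sum(1 for word in extract_words if word in description_words)
    return hits / len(extract_words)


def phrase_density(extract, description):
    """Share of extract two-word phrases that also appear in the description (assumed definition)."""
    extract_pairs = bigrams(tokenize(extract))
    if not extract_pairs:
        return 0.0
    description_pairs = set(bigrams(tokenize(description)))
    hits = sum(1 for pair in extract_pairs if pair in description_pairs)
    return hits / len(extract_pairs)


# Hypothetical page: a meta-tagged description and a title used as the extract.
description = "Free recipes for vegetarian cooking"
title = "Vegetarian cooking recipes"

print(word_density(title, description))    # 1.0: all three title words occur in the description
print(phrase_density(title, description))  # 0.5: only "vegetarian cooking" occurs as a pair
```

Under this reading, an extract such as a title scores high when it reuses the description's vocabulary, and the phrase measure is stricter because word order and adjacency must also match.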
