OALib Journal
ISSN: 2333-9721

A Framework for Hierarchical Clustering Based Indexing in Search Engines

Keywords: Inverted files, Index compression, Document Identifiers Assignment, Hierarchical Clustering

Abstract:

Efficient and fast access to the index is a key factor in the performance of Web Search Engines (WSEs). To improve memory utilization and speed up query resolution, WSEs use Inverted File (IF) indexes, which consist of an array of posting lists; each posting list is associated with a term and contains the term together with the identifiers of the documents that contain it. Because the document identifiers in a posting list are kept in sorted order, they can be stored as the differences (gaps) between successive identifiers, which shrinks the index when the gaps are written with a variable-length code. This paper describes a clustering algorithm that partitions the set of documents into ordered clusters so that documents within the same cluster are similar and are assigned nearby document identifiers. This reduces the average difference between successive identifiers and hence saves storage space. The paper further extends this clustering algorithm to hierarchical clustering, in which similar clusters are combined into mega clusters and similar mega clusters are in turn combined into super clusters. These levels of clustering optimize the search process by directing each search along a specific path from the higher levels to the lower ones, i.e., from super clusters to mega clusters, then to clusters, and finally to individual documents, so that the user gets the best possible matching results in the minimum possible time.
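
Storing gaps only saves space if each gap is written with a variable-length code, so that small numbers occupy fewer bytes. The sketch below assumes variable-byte (VByte) coding, one common choice; the abstract does not name the code the authors use, and the posting list is made up for illustration. It compares the same five documents under arbitrary identifiers and under adjacent identifiers.

```python
def to_gaps(doc_ids):
    """Turn a sorted list of doc IDs into d-gaps: the first ID, then successive differences."""
    gaps, prev = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def vbyte_encode(n):
    """Variable-byte code for a non-negative int: 7 payload bits per byte,
    least-significant group first, high bit set only on the final byte."""
    out = bytearray()
    while n >= 128:
        out.append(n & 0x7F)
        n >>= 7
    out.append(n | 0x80)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode, applied to a concatenation of encoded ints."""
    numbers, n, shift = [], 0, 0
    for b in data:
        n |= (b & 0x7F) << shift
        if b & 0x80:              # final byte of this number
            numbers.append(n)
            n, shift = 0, 0
        else:
            shift += 7
    return numbers


def encoded_size(doc_ids):
    """Bytes needed to store one posting list as VByte-coded d-gaps."""
    return sum(len(vbyte_encode(g)) for g in to_gaps(sorted(doc_ids)))


# Hypothetical posting list for one term: the same five documents, first with
# arbitrary IDs, then after similar documents were given adjacent IDs.
scattered = [3, 9000, 25000, 61000, 120000]
clustered = [3, 4, 5, 6, 7]
print(encoded_size(scattered), encoded_size(clustered))  # 11 5
```

Decoding the gaps and taking running sums restores the original identifiers; the closer the identifiers, the more gaps fit in a single byte.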

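The abstract does not spell out the clustering algorithm itself. As a stand-in, the following sketch uses simple single-pass ("leader") clustering on the Jaccard similarity of document term sets, numbers documents consecutively cluster by cluster, and compares the average difference between successive identifiers in the posting lists before and after reassignment. The threshold, the toy collection, and all function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leader_clustering(docs, threshold=0.3):
    """Single-pass leader clustering (a stand-in, not the paper's algorithm).
    docs maps original doc ID -> set of terms. A document joins the first cluster
    whose leader (first member) is at least `threshold`-similar; otherwise it
    starts a new cluster. Clusters are returned in creation order."""
    clusters = []  # each entry: (leader_terms, [member doc IDs])
    for doc_id in sorted(docs):
        terms = docs[doc_id]
        for leader_terms, members in clusters:
            if jaccard(leader_terms, terms) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append((terms, [doc_id]))
    return [members for _, members in clusters]


def reassign_ids(clusters):
    """Number documents consecutively, cluster by cluster, starting at 1."""
    order = [doc_id for members in clusters for doc_id in members]
    return {old: new for new, old in enumerate(order, start=1)}


def posting_lists(docs, id_map):
    """Build term -> sorted list of (remapped) doc IDs."""
    index = {}
    for doc_id, terms in docs.items():
        for term in terms:
            index.setdefault(term, []).append(id_map[doc_id])
    return {term: sorted(ids) for term, ids in index.items()}


def average_gap(index):
    """Mean difference between successive doc IDs across all posting lists."""
    gaps = [b - a for ids in index.values() for a, b in zip(ids, ids[1:])]
    return sum(gaps) / len(gaps) if gaps else 0.0


# Hypothetical collection: documents on two topics arrive interleaved.
docs = {
    1: {"index", "posting", "compression"},
    2: {"football", "league", "score"},
    3: {"index", "posting", "query"},
    4: {"football", "league", "match"},
    5: {"index", "compression", "query"},
    6: {"football", "score", "match"},
}

identity = {d: d for d in docs}
clusters = leader_clustering(docs)
new_ids = reassign_ids(clusters)
before = average_gap(posting_lists(docs, identity))
after = average_gap(posting_lists(docs, new_ids))
print(clusters)       # [[1, 3, 5], [2, 4, 6]]
print(before, after)  # 2.4 1.2
```

Any clustering that places similar documents next to each other in the numbering has the same effect; leader clustering is used here only because it fits in a few lines.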

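The search path from super clusters down to individual documents can be sketched as a greedy descent: at each level, follow the child whose representative vector is most similar to the query, then rank the documents in the chosen cluster. The abstract does not say how similarity is measured or how cluster representatives are built; cosine similarity over sparse term-weight dictionaries, the hypothetical `Node` structure, and the toy hierarchy below are assumptions for illustration.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Node:
    """Hypothetical node: root, super cluster, mega cluster, cluster, or document."""
    name: str
    vector: dict                                  # centroid or document term weights
    children: list = field(default_factory=list)


def descend(root, query):
    """Follow the most query-similar child at each level (assumes uniform depth),
    then return the visited path and the documents of the final cluster, ranked."""
    path, node = [root.name], root
    while node.children and node.children[0].children:  # children are clusters
        node = max(node.children, key=lambda c: cosine(c.vector, query))
        path.append(node.name)
    ranked = sorted(node.children, key=lambda d: cosine(d.vector, query), reverse=True)
    return path, [d.name for d in ranked]


# Hypothetical four-level hierarchy over four documents.
doc1 = Node("doc1", {"index": 1.0, "compression": 1.0})
doc2 = Node("doc2", {"index": 1.0, "posting": 1.0})
doc3 = Node("doc3", {"football": 1.0, "score": 1.0})
doc4 = Node("doc4", {"football": 1.0, "league": 1.0})
c_ir = Node("cluster-ir", {"index": 1.0, "compression": 0.5, "posting": 0.5}, [doc1, doc2])
c_sport = Node("cluster-sport", {"football": 1.0, "score": 0.5, "league": 0.5}, [doc3, doc4])
mega_ir = Node("mega-ir", {"index": 1.0}, [c_ir])
mega_sport = Node("mega-sport", {"football": 1.0}, [c_sport])
super_cs = Node("super-cs", {"index": 1.0}, [mega_ir])
super_sport = Node("super-sport", {"football": 1.0}, [mega_sport])
root = Node("root", {}, [super_cs, super_sport])

path, docs = descend(root, {"index": 1.0, "compression": 1.0})
print(path, docs)  # ['root', 'super-cs', 'mega-ir', 'cluster-ir'] ['doc1', 'doc2']
```

Following only the single best child at each level is what keeps the search cheap; the trade-off is that a relevant document sitting in a sibling cluster is never examined.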