%0 Journal Article
%T A Framework for Hierarchical Clustering Based Indexing in Search Engines
%A Parul Gupta
%A A.K. Sharma
%J BVICAM's International Journal of Information Technology
%D 2011
%I Bharati Vidyapeeth's Institute of Computer Applications and Management
%X Granting efficient and fast access to the index is a key issue for the performance of Web Search Engines (WSEs). In order to enhance memory utilization and favor fast query resolution, WSEs use Inverted File (IF) indexes that consist of an array of posting lists, where each posting list is associated with a term and contains the term as well as the identifiers of the documents containing the term. Since the document identifiers are stored in sorted order, they can be stored as the differences between successive document identifiers so as to reduce the size of the index. This paper describes a clustering algorithm that aims at partitioning the set of documents into ordered clusters so that the documents within the same cluster are similar and are assigned closer document identifiers. Thus the average value of the differences between successive document identifiers will be minimized and hence storage space will be saved. The paper further presents an extension of this clustering algorithm to hierarchical clustering, in which similar clusters are combined to form a mega cluster and similar mega clusters are then combined to form a super cluster. Thus the paper describes the different levels of clustering, which optimize the search process by directing the search along a specific path from the higher levels of clustering to the lower levels, i.e. from super clusters to mega clusters, then to clusters and finally to the individual documents, so that the user gets the best possible matching results in minimum possible time.
%K Inverted files
%K Index compression
%K Document Identifiers Assignment
%K Hierarchical Clustering
%U http://www.bvicam.ac.in/bijit/Downloads/pdf/issue6/01.pdf