International Journal of Internet Science 2008

Objectivity, Reliability, and Validity of Search Engine Count Estimates

Dietmar Janetzko

Keywords: Data quality, goodness criteria, Web mining, search engines, search engine counts


Abstract:

Count estimates ("hits") provided by Web search engines have received much attention as a yardstick for measuring phenomena as diverse as language statistics, the popularity of authors, and the similarity between words. Common to these activities is the intention to use Web search engines not only for search but also for ad hoc measurement. Using search engine count estimates (SECEs) in this way means that a phenomenon of interest, e.g., the popularity of an author, is conceived of as a measurand, and SECEs are taken as its quantitative measures. However, the data quality of SECEs has not yet been studied systematically, and concerns have been raised about the use of this kind of data. This article examines the data quality of SECEs with respect to the classical goodness criteria of objectivity, reliability, and validity. The results of a series of studies indicate that, with the exception of Boolean queries that use disjunction or negation, the objectivity, test-retest reliability, and parallel-test reliability of SECEs are good for most of the browsers and search engines examined. Estimating validity required model development (all-subsets regression), which yielded satisfactory results when an exploratory approach to feature selection was used. The findings are discussed in light of previous objections, and perspectives for using search engine count estimates are outlined.
