Abstract: The previous discussion emphasized statistical significance testing. But there are various reasons to expect departures from the uniform distribution in the terminal digits of p-values, so simply rejecting the null hypothesis is not terribly informative. Much more importantly, Jeng found that the original p-value of 0.043 should have been 0.086, and suggested this represented an important difference because it fell on the other side of 0.05. Among the most widely reiterated (though often ignored) tenets of modern quantitative research methods is that we should not treat statistical significance as a bright-line test of whether we have observed a phenomenon. Moreover, it sends the wrong message about the role of statistics to suggest that a result should be dismissed because of limited statistical precision when it is so easy to gather more data.

In response to these limitations, we gathered more data to improve the statistical precision, and analyzed the actual pattern of the departure from uniformity, not just its test statistics. We found variation in digit frequencies in the additional data and describe the distinctive pattern of these results. Furthermore, we found that the combined data diverge unambiguously from a uniform distribution. The explanation for this divergence seems unlikely to be the one suggested by the previous authors: errors in calculations and transcription.

Background: In 2004, García-Berthou and Alcaraz [GBA] published "Incongruence between test statistics and P values in medical papers" [1]. This article reported that the last digits of published test statistics and p-values in a sample of consecutive articles from Nature deviated from a uniform distribution more than would be expected by chance. The article, which also examined incongruence between reported statistics and p-values, attracted a great deal of attention among journal editors, the popular press, and a large number of readers [2]. In 2006, however, Jeng pointed out that the GBA analysis of last digits contained an error: the correct p-value for the departure from uniformity was 0.086, not the reported 0.043.
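The test at issue is a chi-square goodness-of-fit test of last-digit counts against a uniform distribution over the digits 0 through 9. As a minimal sketch (not the authors' code, and with invented digit counts), it might look like this in Python:

```python
from scipy import stats

# Hypothetical counts of the terminal digit (0-9) across a sample of
# published statistics; real data would be extracted from journal articles.
observed = [38, 29, 34, 31, 42, 27, 33, 36, 30, 35]

n = sum(observed)
expected = [n / 10] * 10  # uniform null: each digit equally likely

# Chi-square goodness-of-fit test with df = 10 - 1 = 9
chi2, p = stats.chisquare(observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, df = 9, p = {p:.3f}")
```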
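GBA's other analysis, the congruence check between reported statistics and p-values mentioned above, can be illustrated with a similarly hedged sketch: recompute the p-value implied by a reported test statistic and its degrees of freedom, and flag reports that disagree beyond rounding. The function name, the example values, and the tolerance below are all hypothetical, not taken from the original study:

```python
from scipy import stats

def congruent(t_stat: float, df: int, reported_p: float, tol: float = 0.005) -> bool:
    """Return True if the two-sided p implied by t_stat matches reported_p
    within a rounding tolerance (tol is an illustrative choice)."""
    implied_p = 2 * stats.t.sf(abs(t_stat), df)
    return abs(implied_p - reported_p) <= tol

# e.g., an article reporting t(24) = 2.10 with p = 0.046
print(congruent(2.10, 24, 0.046))  # True: the implied p is about 0.046
```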