Statistics Commentary Series: Commentary No. 23
The answer comes down to how much accuracy the data can support, and this in turn depends on 2 factors: the way the data were gathered in the first place, and the number of observations contributing to the estimate. Let’s begin with the correlations, which are reported to 4 decimal places. The publication manual of the American Psychological Association, which is the “bible” for many journals in the social sciences, states that “As a general rule, fewer decimal digits are easier to comprehend than more digits; therefore, in general, it is better to round to two decimal places,”1, p. 113 a sentiment echoed by other guidelines.2 Notice, though, that the argument is phrased in terms of comprehension, not precision. I would argue that even reporting correlations to 2 decimal places is an example of “pseudoprecision”3 that cannot be justified in the vast majority of studies.
We ran a Monte Carlo simulation, using correlations of 0.15, 0.30, 0.50, and 0.70 and, for each, generated random samples of 60, 100, 200, 500, 1,000, 10,000, and 100,000 values. We were interested in the reproducibility of the first, second, third, and fourth decimal places. We concluded that “even when n is less than 500, the habit of reporting a result to two decimal places seems unwarranted, and it never makes sense to report the third digit after the decimal place unless one has a sample size larger than 100,000.”4, p. 687
This should not be surprising. As Feinberg and Wainer5 pointed out in their delightful paper, for the third decimal place to be reproducible, the standard error (SE) needs to be less than 0.0005. Since the SE is roughly 1/√n, √n must be at least 1/0.0005, or 2,000, and therefore n would have to be 4,000,000. By the same reasoning, the first decimal place requires an SE below 0.05, so √n must exceed 20: a sample size greater than 400 is necessary for even the first decimal place to be reproducible.
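The arithmetic behind these thresholds can be checked in a few lines. The sketch below is only an illustration of the rule described above (SE of about 1/√n, which must fall below half a unit in the decimal place of interest); the function name `min_n_for_decimal` is hypothetical and does not come from any of the cited papers.

```python
def min_n_for_decimal(k):
    """Sample size at which SE ~= 1/sqrt(n) equals 0.5 * 10**-k.

    Assumption: the rule of thumb from the text, i.e. the k-th decimal
    place is treated as reproducible once SE drops below half a unit
    in that place (0.05 for k=1, 0.0005 for k=3).
    """
    # 1 / (0.5 * 10**-k) simplifies to 2 * 10**k; kept as an exact integer
    sqrt_n = 2 * 10**k
    return sqrt_n ** 2


for k in (1, 2, 3):
    print(f"decimal place {k}: n must exceed {min_n_for_decimal(k):,}")
```

Under the same rule, the second decimal place would call for a sample of about 40,000, which sits between the two figures quoted above.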
That is, for the vast majority of studies in psychopharmacology, where sample sizes over 100 are rare, even the second digit of a correlation is basically an irreproducible value, and the third and fourth digits do not represent increased precision, but rather more sampling error.
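To make the point concrete, here is a minimal simulation sketch. It is not the simulation reported in the paper quoted above; it assumes bivariate normal data, an arbitrary 2,000 replicates, and a fixed seed, and the names `simulate_correlations` and `share_matching` are invented for this illustration. It draws repeated samples at a known population correlation and asks how often the sample correlation, once rounded, lands on the same digits as the population value.

```python
import numpy as np


def simulate_correlations(rho, n, reps=2000, seed=0):
    """Return `reps` sample correlations from samples of size n.

    Assumption: x and y are standard bivariate normal with population
    correlation rho.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((reps, n))
    noise = rng.standard_normal((reps, n))
    # Build y so that corr(x, y) equals rho in the population
    y = rho * x + np.sqrt(1.0 - rho**2) * noise
    # Pearson r for each replicate (row), computed from centered values
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    num = (xc * yc).sum(axis=1)
    den = np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))
    return num / den


def share_matching(rs, rho, decimals):
    """Fraction of replicates that round to the same value as rho."""
    scale = 10**decimals
    # Compare as integers to avoid floating-point equality problems
    rounded = np.rint(rs * scale).astype(int)
    target = int(round(rho * scale))
    return float(np.mean(rounded == target))


rs = simulate_correlations(rho=0.30, n=100)
print("SD of sample r:", round(float(rs.std(ddof=1)), 3))
for d in (1, 2, 3):
    print(f"share matching rho at {d} decimal(s):", share_matching(rs, 0.30, d))
```

With n = 100, the spread of the sample correlations should be close to the 1/√n approximation, and only a small minority of replicates should reproduce the second decimal place. Exact figures depend on the seed and the number of replicates.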