### Excerpt

The P‐value, although often criticized recently (see the ASA statement on P‐values2 and the plethora of reactions), allows researchers to calibrate their raw findings against the variability inherent in all data and against the amount of information collected, and then attaches a probability to those findings as a basis for judgment. Before the P‐value was widely used, scientists were often misled into drawing conclusions from too little data. Further, the P‐value, together with the concept of power, lets a scientist estimate how much data should be collected to have a reasonable chance of clearly showing an effect when that effect truly exists. Without this ability, much of the scientific progress since 1900 could not have taken place, or at least would have been greatly slowed.

The P‐value has rightly been criticized, but much of the criticism is directed not at the tool itself but at how it has been used. The near-universal, ritualistic agreement that a P‐value < 0.05 marks the line between a real effect and no effect, without any consideration of the conditions behind its generation, is one such problem. Even when those conditions are well understood and seem favorable, a 1-in-20 threshold still might not be the right amount of evidence for the situation (it could be more or less). To avoid these mistakes, however, scientists must understand the P‐value, and the difficulty is that many of us who use it to draw conclusions do not understand it well enough to appreciate its pitfalls. Some have suggested that wider adoption of Bayesian inference would solve this problem. Although Bayesian inference is in some ways more intuitive, it still involves probabilistic reasoning that can suffer the same pitfalls as the frequentist alternative.
The argument, then, is that without a probabilistic basis for drawing conclusions we cannot determine the appropriate amount of data to collect. Thus, without the P‐value or some sort of Bayesian posterior probability, science would have a difficult time progressing.
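The link between power and the amount of data to collect can be made concrete with a standard normal-approximation formula for comparing two group means. This is only an illustrative sketch; the α level, power, and effect size below are conventional example values, not figures from the excerpt.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison of means. effect_size is the difference in
    means divided by the common standard deviation (Cohen's d)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
    return math.ceil(n)

# A "medium" effect (d = 0.5) at the conventional 5% level with 80% power:
n_per_group = sample_size_per_group(0.5)
print(n_per_group)  # 63 per group under the normal approximation
```

Note how the required sample size grows rapidly as the effect shrinks: halving the effect size quadruples the number of subjects needed, which is exactly the kind of planning the excerpt says power calculations made possible.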

Randomization, first of experiments and later of clinical trials, was the other huge advance that has been highlighted. The issue here is more basic: if one intervenes with a treatment for a patient and improvement is seen, how does one know whether the treatment in fact caused the improvement? It could be due to some other factor (that factor is then said to be confounded with the treatment), or it could simply be good luck.
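The effect of confounding, and how randomization removes it, can be seen in a minimal simulation. The scenario below is entirely invented for illustration: a treatment with zero true effect, a baseline-health confounder that drives both who gets treated and the outcome, and arbitrary numeric constants.

```python
import random

random.seed(0)

def outcome(health):
    # The hypothetical treatment has zero true effect: the outcome
    # depends only on baseline health plus random noise.
    return 2.0 * health + random.gauss(0.0, 1.0)

def trial(randomized, n=10_000):
    treated_outcomes, control_outcomes = [], []
    for _ in range(n):
        health = random.gauss(0.0, 1.0)
        if randomized:
            treated = random.random() < 0.5  # coin-flip assignment
        else:
            # Confounding: sicker patients are more likely to seek treatment.
            treated = random.random() < (0.2 if health > 0 else 0.8)
        (treated_outcomes if treated else control_outcomes).append(outcome(health))
    mean = lambda xs: sum(xs) / len(xs)
    return mean(treated_outcomes) - mean(control_outcomes)

confounded_estimate = trial(randomized=False)  # markedly negative: treatment looks harmful
randomized_estimate = trial(randomized=True)   # near zero: the true (null) effect
print(round(confounded_estimate, 2), round(randomized_estimate, 2))
```

In the non-randomized arm the treated group starts out sicker, so a naive comparison makes a useless treatment look harmful; coin-flip assignment balances the confounder on average and recovers the true (null) effect.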