The Road Between Statistics and Clinical Decision Making Is Longer Than We Think*
Using similar methodology and including two additional trials, Tasker et al (1) in this issue of Pediatric Critical Care Medicine reached essentially the same conclusion, highlighted the limitations of combining trials, and added that the duration of hypothermia does not appear to influence outcome. They concluded that this information offers little guidance to clinicians deciding whether to use hypothermia. The authors then propose that Bayesian meta-analysis provides a better framework for understanding such uncertainty.
There are essentially three distributional approaches to meta-analysis. The fixed effects approach assumes that the underlying quantity of interest is fixed by nature and that each included study draws from a sampling distribution around this fixed point, so that the square root of the variance of the estimates across studies serves as the SE of the pooled estimate of the true fixed value. The random effects approach (2, 3) assumes that there is sufficient heterogeneity and uncertainty in the quantity of interest that it defines a distribution rather than a fixed point. The Bayesian approach considers parameterizations of these distributions of random effects rather than simply assuming a basic (usually normal) form.
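To make the fixed effects assumption concrete, a minimal sketch follows; the per-study effect sizes and SEs below are invented purely for illustration. Under this approach, every study is treated as an IID draw around one true value, so the pooled estimate is simply the inverse-variance weighted mean:

```python
import math

# Hypothetical per-study effect estimates (e.g., log odds ratios) and
# their SEs; these numbers are invented purely for illustration.
effects = [0.10, -0.05, 0.20, 0.08]
ses = [0.15, 0.20, 0.25, 0.18]

# Fixed effects pooling: weight each study by the inverse of its
# sampling variance, w_i = 1 / se_i^2.
weights = [1.0 / se**2 for se in ses]
pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

# The SE of the pooled estimate shrinks as studies accumulate,
# reflecting the assumption of a single fixed true value.
pooled_se = math.sqrt(1.0 / sum(weights))

print(f"pooled estimate = {pooled:.3f}, SE = {pooled_se:.3f}")
```

Note that the pooled SE is smaller than any individual study's SE; this precision gain is exactly what the fixed effects assumption buys, and exactly what is forfeited when the assumption fails.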
The fixed effects approach is rarely justified because the included studies almost always involve different mixes of explanatory variables. Minor differences in regression-style specifications can yield immense differences in inference and prediction; it is therefore a very strict assumption to say that the studies produce independent and identically distributed (IID) draws from a single fixed sampling distribution.
The random effects approach is more principled because it admits a level of heterogeneity between the included studies such that the quantity of interest must be modeled as a distribution. That is, there may be a modal region for strong effects, but differing specifications and sampling strategies cause deviations from this point of centrality. The Bayesian approach improves on the random effects approach because it attempts to “model” the underlying causes of study heterogeneity by explicitly stating distributions. It is therefore the only reasonable alternative for comparing biomedical studies.
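The random effects idea can be sketched with one common (non-Bayesian) estimator, the DerSimonian–Laird moment estimator, which quantifies the between-study spread as a variance component tau^2; again, the study effects and SEs are invented for illustration:

```python
# Hypothetical study effects (e.g., log odds ratios) and SEs; the
# numbers are invented purely for illustration, with deliberate
# heterogeneity so that tau^2 comes out positive.
effects = [0.40, -0.20, 0.55, 0.05]
ses = [0.15, 0.20, 0.25, 0.18]

# DerSimonian-Laird moment estimator of the between-study variance
# tau^2: the random effects view adds this spread on top of each
# study's own sampling error.
w = [1.0 / s**2 for s in ses]
ybar = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
df = len(effects) - 1
c = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (q - df) / c)

# Random effects pooling folds tau^2 into each study's variance, so
# no single large study can dominate the way it does under fixed effects.
w_re = [1.0 / (s**2 + tau2) for s in ses]
pooled_re = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
print(f"tau^2 = {tau2:.4f}, random effects pooled estimate = {pooled_re:.3f}")
```

A fully Bayesian analysis would go further, placing explicit prior distributions on the study-level means and on tau^2 itself rather than plugging in a single moment estimate.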
We commend the authors for taking the Bayesian approach, but some terms deserve clarification. For example, frequentism should not be confused with likelihoodism. Developed by Neyman and Pearson (4–7), the frequentist approach assumes an unending stream of IID data, sets the alpha level in advance (and therefore does not admit p values), and mechanically accepts one complementary hypothesis over the other. There is no null hypothesis. In contrast, Fisher (8–10) posited a hypothesis that was to be nullified by the weight of evidence for the alternative hypothesis, where this weight is measured by the density in the tail of the null distribution, the p value. Not only are these approaches very different, but these scholars vehemently rejected each other’s views. Unfortunately, textbook writers in the middle of the 20th century blended the vocabulary, leading to confusion in biomedical applications. This confusion includes the incorrect belief that decreasing p values correspond to an increasing probability that the null hypothesis is false. The development and use of likelihood functions by Fisher (8–10) (literally the joint probability of observing the dataset at hand) is one of the most important developments in statistics. It turns out that such analysis is a special case of the Bayesian approach in which appropriately bounded uniform (flat) prior distributions are specified.
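The final point, that likelihood analysis is a special case of the Bayesian approach under flat priors, can be verified in a toy binomial example (the data are invented for illustration): with a Uniform(0,1) prior, the posterior is proportional to the likelihood, so the posterior mode coincides with the maximum likelihood estimate.

```python
import math

# Hypothetical data, invented for illustration: k successes in n trials.
k, n = 7, 10

# Maximum likelihood estimate for a binomial proportion is k/n.
mle = k / n

# Under a Uniform(0,1) prior, the posterior is Beta(k+1, n-k+1).
# The mode of a Beta(a, b) distribution (for a, b > 1) is
# (a - 1) / (a + b - 2), which here reduces to k/n.
a, b = k + 1, n - k + 1
posterior_mode = (a - 1) / (a + b - 2)

assert math.isclose(mle, posterior_mode)
print(f"MLE = {mle}, flat-prior posterior mode = {posterior_mode}")
```

The two estimates agree exactly; what the Bayesian framework adds is the ability to replace the flat prior with an informative one when genuine prior knowledge exists.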