Misapplication of Statistical Methods May Lead to Misinformation
We read with great interest the cohort analysis conducted by Gronnier et al 1 addressing the impact of neoadjuvant chemoradiotherapy on anastomotic leakage after esophageal cancer resection. The authors are to be commended for their work incorporating multimodality therapy to treat a difficult disease. Anastomotic leak is one of the most feared complications of esophageal surgery, and the observation that a therapy shown to confer a survival benefit does not seem to influence this complication is an important one.
Studies of rare events, such as anastomotic leak, often require a large patient population to capture enough patients with the complication for analysis, and authors frequently turn to large databases such as ACS-NSQIP, SEER, the National Inpatient Sample, and others for these data. Well-described criticisms of these studies include the bias introduced by the high variability of patient characteristics, given the administrative nature of these databases. This bias limits the conclusions that may be drawn with regard to the outcome of interest. Propensity score matching, as described by Rubin and Rosenbaum, 2–4 can, however, remove the effects of confounding attributable to observed variables.
Propensity score matching has been increasingly used in the medical literature, with variability in the quality of its application. 5 Rigorous adherence to reporting the methods used to construct the propensity scoring model is necessary to evaluate the validity of a study. Gronnier et al do not specify their inclusion criteria for the variables within the model, nor the type of matching algorithm used. By using all variables shown to potentially influence rates of anastomotic leakage, the authors may have inadvertently introduced bias, given the relatively small sample size of their cohort. Moreover, after matching, malnutrition, a major confounder that has been noted to be associated with anastomotic leaks, remains significantly different between the groups. It would be helpful to know what specific method was used to perform the one-to-one match. For example, were nearest neighbors matched, and were any caliper restrictions applied? Finally, bootstrap resampling methods are essential when using propensity score-matching methods because they are the only known methods to estimate the uncertainty in both parts of the analysis and allow construction of accurate confidence intervals and tests of statistical significance. 6 Because the authors do not report bootstrapping, there is concern about the accuracy of the reported inferences.
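To make the requested reporting details concrete, the following is a minimal Python sketch, not the method used by Gronnier et al, of a greedy one-to-one nearest-neighbor match with a caliper, a standardized mean difference for checking residual imbalance in a covariate such as malnutrition, and a percentile bootstrap interval. All function names, the caliper width, and the data are hypothetical. Propensity scores are assumed to be already estimated; a fuller analysis would refit the propensity model within each bootstrap replicate so that uncertainty from both stages is captured.

```python
import random
from math import sqrt
from statistics import mean, variance


def match_nearest_neighbor(treated, control, caliper):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, control: lists of (propensity_score, outcome) tuples.
    Returns (treated_index, control_index) pairs whose scores differ
    by no more than `caliper`; unmatched treated subjects are dropped.
    """
    available = set(range(len(control)))
    pairs = []
    for i, (score_t, _) in enumerate(treated):
        if not available:
            break
        # Closest remaining control; sorted() keeps tie-breaking deterministic.
        j = min(sorted(available), key=lambda k: abs(control[k][0] - score_t))
        if abs(control[j][0] - score_t) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


def standardized_mean_difference(x_treated, x_control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(x_treated) + variance(x_control)) / 2)
    if pooled_sd == 0:
        return 0.0
    return (mean(x_treated) - mean(x_control)) / pooled_sd


def bootstrap_risk_difference(treated, control, caliper, n_boot=500, seed=0):
    """Percentile 95% interval for the matched risk difference.

    Each replicate resamples subjects with replacement and repeats the
    matching step. Assumption: scores are fixed inputs in this sketch.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        t = rng.choices(treated, k=len(treated))
        c = rng.choices(control, k=len(control))
        pairs = match_nearest_neighbor(t, c, caliper)
        if pairs:  # skip replicates in which nothing falls inside the caliper
            estimates.append(mean(t[i][1] - c[j][1] for i, j in pairs))
    if not estimates:
        raise ValueError("no bootstrap replicate produced a matched pair")
    estimates.sort()
    lower = estimates[int(0.025 * len(estimates))]
    upper = estimates[min(len(estimates) - 1, int(0.975 * len(estimates)))]
    return lower, upper


# Hypothetical data: (propensity score, leak yes=1 / no=0).
treated = [(0.30, 1), (0.50, 0), (0.90, 1)]
control = [(0.32, 0), (0.48, 0), (0.10, 1), (0.55, 1)]

pairs = match_nearest_neighbor(treated, control, caliper=0.05)
print(pairs)  # [(0, 0), (1, 1)]; the treated subject at 0.90 has no match
print(bootstrap_risk_difference(treated, control, caliper=0.05))
```

In this toy example the caliper discards the treated patient with a score of 0.90, which illustrates why the caliper width and the number of unmatched patients should be reported alongside the matching algorithm.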
We recognize that the use of advanced statistical modeling techniques maximizes the utility of large databases to draw conclusions regarding the incidence of disease or the effectiveness of a treatment modality. Given the power of these techniques and the difficulty in constructing the models properly, readers of the medical literature must not be intimidated by their use and should carefully consider the appropriateness of the methods used to draw conclusions based upon the data. We encourage rigorous statistical review of studies making use of propensity score matching to ensure the validity of the reported results.