Comparison of Binary Predictive Scoring Systems of Posthepatectomy Liver Failure: A Response
We thank Harrison et al for their relevant comments on our article [1], which compared definitions of posthepatectomy liver failure (PHLF), namely the International Study Group of Liver Surgery (ISGLS) definition [2], the 50-50 criteria [3], and the PeakBili >7 criterion [4], and their relevance in predicting clinical outcomes. Because the ISGLS definition of PHLF is only weakly restrictive (an increased International Normalized Ratio with concomitant hyperbilirubinemia, according to the normal limits of the local laboratory, on or after postoperative day 5), we initially assumed that it would be fulfilled by too many patients to be relevant in clinical practice. Indeed, within a study population of 680 patients undergoing hepatic resection, 79 patients fulfilled the ISGLS definition compared with 24 for the 50-50 criteria and 44 for the PeakBili >7 criterion. As a consequence, the ISGLS definition showed high sensitivity, detecting most patients with “PHLF,” though at the cost of a low value in identifying patients with clinically significant PHLF (ie, those at risk of severe complications or death).
As stated by Harrison et al, several procedures can be used to compare the predictive accuracy of diagnostic tests. For that purpose, sensitivity and specificity are usually reported and compared by means of an appropriate statistical test. When the data are paired, one must use a test that accounts for the dependence between the observations, and the McNemar χ² test is commonly used for this purpose [5]. Harrison et al suggested considering modifications of this approach, with reference to Nofuentes and del Castillo [6]. However, the cited article [6] states clearly: “In paired designs, the comparison of the sensitivity and specificity of the two diagnostic tests is carried out through the classic McNemar test.” This is why we used the McNemar test.
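To illustrate how a paired comparison of two sensitivities works, the following is a minimal sketch of the McNemar test, not the analysis code used in our study. It assumes that only the discordant pairs among patients with the outcome are needed; the function name `mcnemar` and the counts in the example are hypothetical.

```python
from math import comb, erfc, sqrt


def mcnemar(b: int, c: int) -> dict:
    """McNemar test on the discordant pairs of a paired 2x2 table.

    b: patients positive on score A and negative on score B
    c: patients negative on score A and positive on score B
    Concordant pairs do not enter the test.
    """
    n = b + c
    if n == 0:
        # No discordant pairs: the two scores never disagree.
        return {"statistic": 0.0, "p_chi2": 1.0, "p_exact": 1.0}

    # Classic chi-square statistic (1 degree of freedom, no continuity correction).
    statistic = (b - c) ** 2 / n
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(x / 2)).
    p_chi2 = erfc(sqrt(statistic / 2))

    # Exact two-sided p value from the binomial distribution with p = 0.5.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    p_exact = min(1.0, 2 * tail)

    return {"statistic": statistic, "p_chi2": p_chi2, "p_exact": p_exact}


# Hypothetical discordant counts, for illustration only.
result = mcnemar(b=12, c=3)
print(result)
```

With few discordant pairs, the exact p value is generally preferable to the chi-square approximation.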
Thus, the ISGLS definition showed the highest sensitivity among the 3 scores at both postoperative time points (on postoperative day 5 and within 10 days), but at the price of a much lower positive predictive value (PPV) than the 2 other scores. Whereas sensitivity, specificity, and odds ratios are statistical parameters useful for describing series, positive and negative predictive values are the most useful in clinical practice for predicting whether a particular patient will die or develop major complications. Predictive values depend on the prevalence of the event of interest. Our results, and especially the morbidity and mortality rates of the study population, are consistent with those reported in the literature, which justified our use of PPV and negative predictive value. In patients with a positive score, the ISGLS definition was the least relevant for predicting major complications and mortality (PPV of 49.4% and 21.8% vs 79.2% and 47.8% for the 50-50 criteria and 61.4% and 40.5% for the PeakBili >7 criterion). Moreover, the ISGLS definition did not perform better in subgroups of patients at higher risk of PHLF (major hepatectomy, portal vein occlusion, high blood loss, vascular clamping, or cirrhosis). The multifactorial and complex nature of PHLF could partly explain the low relevance of purely biological definitions, which reflect only the consequences of PHLF.
Harrison et al also compared our reported odds ratios and confidence intervals between the 3 scoring systems for predicting morbidity or mortality, using a “Z test” with a Bonferroni correction, and found no statistically significant difference. We studied the diagnostic odds ratio because statistical guidance recommends using a measure that summarizes diagnostic accuracy as a single value.
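The dependence of predictive values on prevalence, and the way the diagnostic odds ratio condenses a 2x2 table into a single value, can be sketched as follows. This is an illustrative sketch under stated assumptions, not our analysis code: the function names are hypothetical, all numeric inputs are invented for the example, and the confidence interval is the usual Wald interval on the log scale.

```python
from math import exp, log, sqrt


def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


def diagnostic_odds_ratio(tp: int, fp: int, fn: int, tn: int):
    """Return (DOR, lower, upper) with a Wald 95% CI computed on the log scale."""
    if min(tp, fp, fn, tn) == 0:
        # A zero cell needs a continuity correction, which this sketch omits.
        raise ValueError("all four cells must be nonzero")
    dor = (tp * tn) / (fp * fn)
    se_log = sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    lower = exp(log(dor) - 1.96 * se_log)
    upper = exp(log(dor) + 1.96 * se_log)
    return dor, lower, upper


# Hypothetical score with fixed accuracy, evaluated at two prevalences.
ppv_low, npv_low = predictive_values(0.80, 0.90, prevalence=0.05)
ppv_high, npv_high = predictive_values(0.80, 0.90, prevalence=0.20)
print(f"PPV at 5% prevalence: {ppv_low:.3f}; at 20% prevalence: {ppv_high:.3f}")

# Hypothetical 2x2 counts, for illustration only.
dor, lower, upper = diagnostic_odds_ratio(tp=40, fp=10, fn=10, tn=140)
print(f"DOR = {dor:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

In this sketch the same sensitivity and specificity give a higher PPV when the event is more common, which is why predictive values are interpretable only when the event rates of the study population resemble those of the target population.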