Limits of Agreement With Confidence Intervals Are Necessary to Assess Comparability of Measurement Devices

Bergese et al1 present a validation study of photoplethysmography (PPG) for measuring respiratory rate, an important step in introducing a monitoring device into clinical use. The authors adapted a well-known method of analysis described by Bland and Altman.2 Bland and Altman emphasized that, when comparing measurements made with 2 devices, the best measure is the limits of agreement, and that these limits should be qualified by confidence intervals. This is particularly important when the data structure is complex. For example, a recent study by Parker et al3 of different devices for measuring respiratory rate showed substantial differences not only in the limits of agreement but also in the confidence intervals for these limits. Olofsen et al4 describe how features of the data structure in Bland and Altman analysis produce these effects and provide freely available software to calculate confidence intervals (
Unfortunately, reporting of Bland and Altman’s analysis is frequently incomplete in anesthesia journals, and, in particular, the precision of limits of agreement is rarely provided in reports of comparison studies.5
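The quantities this letter asks authors to report can be sketched in Python. This is a minimal illustration, not the method of Bergese et al or of Olofsen et al: it assumes one independent pair of readings per subject and uses Bland and Altman's approximate standard error for each limit, sqrt(3s²/n), where s is the SD of the differences and n the number of pairs. The function, class, and field names are hypothetical. Repeated measurements per subject are not handled; that is precisely the situation in which the data structure matters and the approach of Olofsen et al4 is needed.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class LimitsOfAgreement:
    bias: float                      # mean difference (test - reference)
    sd: float                        # SD of the differences
    lower: float                     # bias - 1.96 * sd
    upper: float                     # bias + 1.96 * sd
    lower_ci: tuple[float, float]    # approximate 95% CI for the lower limit
    upper_ci: tuple[float, float]    # approximate 95% CI for the upper limit


def limits_of_agreement(test, reference, t_crit=None):
    """95% limits of agreement with approximate confidence intervals.

    Assumes ONE independent pair per subject (hypothetical helper);
    repeated measurements per subject need a method that models them.
    """
    if len(test) != len(reference):
        raise ValueError("test and reference must be paired")
    n = len(test)
    if n < 3:
        raise ValueError("need at least 3 pairs")
    diffs = [t - r for t, r in zip(test, reference)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)          # sample SD (n - 1 denominator)
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    # Approximate standard error of each limit: sqrt(3 * s^2 / n)
    se_limit = math.sqrt(3 * sd ** 2 / n)
    if t_crit is None:
        # Large-sample fallback; for small n pass the t quantile (n - 1 df).
        t_crit = statistics.NormalDist().inv_cdf(0.975)
    half = t_crit * se_limit
    return LimitsOfAgreement(
        bias, sd, lower, upper,
        (lower - half, lower + half),
        (upper - half, upper + half),
    )


# Hypothetical paired readings (bpm), one pair per subject.
ppg = [14, 16, 12, 20, 18, 15]
capno = [13, 16, 11, 21, 17, 14]
# 2.571 is the two-sided 95% t quantile for n - 1 = 5 degrees of freedom.
result = limits_of_agreement(ppg, capno, t_crit=2.571)
print(result)
```

The data are invented for illustration. Because the interval around each limit widens as n shrinks, two studies reporting similar limits of agreement can differ greatly in how precisely those limits are known, which is why the limits alone are not enough.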
The graphical data presented by Bergese et al1 are the most clinically relevant, but are difficult to interpret for several reasons. First, they include both healthy subjects and patients, and the data from these different groups cannot be distinguished. Second, the plot includes an extremely large number of (probably largely redundant) data pairs, many of which overlap. Consequently, the apparent outliers (described below) may represent only a tiny proportion of the entire sample. Third, proper interpretation of the results requires showing the limits of agreement and their 95% confidence intervals, preferably for the volunteers and patients separately.
Examination of the modified Bland-Altman plot shown in Figure 2 of Bergese et al1 reveals a clear trend in the bias (difference) between the measurements. When the respiratory rate measured by capnography is <15 breaths per minute (bpm), the PPG reads higher. A further intriguing feature is a group of outlying results clustered between capnograph rates of 13 and 17 bpm, some of which have PPG rates exceeding 25 bpm. Finally, a smaller outlier group is seen where capnography gives rates of 22–26 bpm, and here the plethysmograph reads low. The authors fail to discuss any of these interesting anomalies.
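A trend in the bias and the share of pairs lying beyond the limits can both be quantified rather than judged by eye. The Python sketch below is illustrative only and uses hypothetical function names. It regresses the difference on the mean of the two methods, the usual Bland-Altman convention; the modified plot in Bergese et al1 uses the capnography rate on the x-axis instead, and regressing the difference on one method alone can itself produce an apparent trend. With many correlated pairs per subject, the ordinary least-squares slope is descriptive only and its usual standard errors do not apply.

```python
import statistics


def proportional_bias(test, reference):
    """OLS slope and intercept of (test - reference) on the pair mean.

    A slope clearly different from 0 suggests the bias changes with the
    magnitude of the measurement. Descriptive only (hypothetical helper).
    """
    means = [(t + r) / 2 for t, r in zip(test, reference)]
    diffs = [t - r for t, r in zip(test, reference)]
    mx, my = statistics.fmean(means), statistics.fmean(diffs)
    sxx = sum((x - mx) ** 2 for x in means)
    sxy = sum((x - mx) * (y - my) for x, y in zip(means, diffs))
    slope = sxy / sxx
    return slope, my - slope * mx


def pairs_outside_limits(test, reference):
    """Indices and differences lying outside bias +/- 1.96 SD."""
    diffs = [t - r for t, r in zip(test, reference)]
    bias, sd = statistics.fmean(diffs), statistics.stdev(diffs)
    lo, hi = bias - 1.96 * sd, bias + 1.96 * sd
    return [(i, d) for i, d in enumerate(diffs) if d < lo or d > hi]


# Hypothetical data in which PPG reads high at low rates and low at high rates.
capno = [8, 10, 12, 14, 16, 18, 20]
ppg = [11, 12, 13, 14, 15, 16, 17]
slope, intercept = proportional_bias(ppg, capno)
flagged = pairs_outside_limits(ppg, capno)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, outside limits: {flagged}")
```

Reporting len(flagged) alongside the total number of pairs would show directly whether apparent outliers are a negligible or a substantial fraction of the sample.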
Taken as a whole, these devices may appear to be equivalent, but I suggest that, in practice, clinicians could be faced with patients in whom the respiratory rates obtained by these methods disagree substantially. The limits of agreement should answer the question “are these measurement systems equivalent, that is, can I use them interchangeably?” At present, in an individual patient, we have insufficient information to tell whether this is so.