Point of View
IRT is complex. It allows for multiple models: the one-parameter (Rasch) model, which focuses on item difficulty; the two-parameter model, which considers both difficulty and discrimination; the three-parameter model, which adds a guessing parameter; and the four-parameter model, which adds an upper-asymptote parameter designed to better control the influence of outlier responses. Items can be scored dichotomously (0/1, wrong or right) or polytomously, and the models can be unidimensional or multidimensional.
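How the dichotomous models nest inside one another can be sketched with their item response functions. This is a minimal illustration, assuming the common logistic form without the 1.7 scaling constant; the function names are hypothetical, and polytomous and multidimensional models are not shown. Here a is discrimination, b is difficulty, c is the lower ("guessing") asymptote, and d is the upper asymptote added by the four-parameter model.

```python
import math


def irf_4pl(theta: float, a: float = 1.0, b: float = 0.0,
            c: float = 0.0, d: float = 1.0) -> float:
    """Probability of a correct (or endorsed) response under a 4PL model.

    theta: the person's latent trait level
    a: item discrimination    b: item difficulty
    c: lower asymptote ("guessing")    d: upper asymptote
    """
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (d - c) * logistic


# The simpler models are special cases obtained by fixing parameters.
def irf_3pl(theta: float, a: float, b: float, c: float) -> float:
    return irf_4pl(theta, a, b, c, d=1.0)


def irf_2pl(theta: float, a: float, b: float) -> float:
    return irf_4pl(theta, a, b, c=0.0, d=1.0)


def irf_1pl(theta: float, b: float) -> float:
    # Discrimination fixed at 1 for every item (Rasch-type parameterization)
    return irf_4pl(theta, a=1.0, b=b, c=0.0, d=1.0)


if __name__ == "__main__":
    # At theta == b each curve sits halfway between its two asymptotes.
    print(irf_1pl(0.0, b=0.0))                          # 0.5
    print(irf_3pl(0.0, a=1.5, b=0.0, c=0.2))            # halfway between 0.2 and 1.0
    print(irf_4pl(0.0, a=1.5, b=0.0, c=0.2, d=0.95))    # halfway between 0.2 and 0.95
```

Fixing d below 1 means even a person far above an item's difficulty is not assumed to pass it with certainty, which is one way the four-parameter model limits the pull of an unexpected "miss."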
CAT is also complex. It requires writing reliable and accurate questions relevant to the outcome of interest and storing them in an item bank. Once the bank is established, the items must be calibrated, and starting and stopping rules must be specified. Because a CAT can update a score (or scores) under the chosen IRT model after every response, the selection of each subsequent item can be based on the responses to the previous items.
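The CAT loop described above (calibrated bank, starting rule, item selection driven by earlier responses, stopping rule) can be sketched as follows. Every concrete choice here is an assumption for illustration, not a description of any particular CAT: a 2PL-calibrated bank, a start at theta = 0, selection of the unused item with maximum Fisher information at the current estimate, expected a posteriori (EAP) scoring with a standard normal prior, and stopping when the posterior SD drops below 0.4 or 10 items have been given. The bank and the deterministic respondent are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Item = Tuple[float, float]  # (a, b): 2PL discrimination and difficulty


def p_2pl(theta: float, a: float, b: float) -> float:
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def information(theta: float, a: float, b: float) -> float:
    # Fisher information of a 2PL item at theta
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def eap(responses: List[Tuple[int, int]], bank: List[Item],
        grid: List[float]) -> Tuple[float, float]:
    """EAP estimate and posterior SD on a grid, with a N(0, 1) prior."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized standard normal prior
        for idx, u in responses:
            p = p_2pl(t, *bank[idx])
            w *= p if u == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def run_cat(bank: List[Item], answer: Callable[[int], int],
            se_target: float = 0.4, max_items: int = 10):
    grid = [-4.0 + 0.05 * k for k in range(161)]  # theta from -4 to 4
    theta, se = 0.0, float("inf")  # starting rule: begin at the prior mean
    responses: List[Tuple[int, int]] = []
    unused = set(range(len(bank)))
    while unused and len(responses) < max_items:
        # Selection rule: unused item most informative at the current estimate
        idx = max(sorted(unused), key=lambda i: information(theta, *bank[i]))
        unused.remove(idx)
        responses.append((idx, answer(idx)))
        theta, se = eap(responses, bank, grid)
        if se < se_target:  # stopping rule: estimate is precise enough
            break
    return theta, se, responses


if __name__ == "__main__":
    # Hypothetical calibrated bank: equal discrimination, difficulties -2.0 to 2.5
    bank = [(1.5, -2.0 + 0.5 * k) for k in range(10)]
    true_theta = 1.0
    # Illustrative respondent: passes exactly the items no harder than true_theta
    theta, se, responses = run_cat(bank, lambda i: int(bank[i][1] <= true_theta))
    print(round(theta, 2), round(se, 2), responses)
```

Operational CATs typically layer further rules on top of this, such as item exposure control and content balancing; the sketch keeps only the pieces named in the paragraph.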
In my view, the perceived value of IRT and CAT for clinicians lies in the hope that these tools will make it easier to clear out the clutter that can pile up when providing patient care. Comparing various health status measures could reduce the number of surveys needed to define the patient's status at the initial visit and then to evaluate treatment success at follow-up based on patient-reported outcomes.
Unfortunately, that attempt to reduce the number of outcome measures does not apply here because of the patient sampling or, more precisely, the lack of a sampling strategy designed to establish a homogeneous patient cohort based on the patients' condition(s). The relatively large sample of patients in the present study is too diverse to be treated as a cohort. Comparing subsets of the sample with similar symptoms and signs might be useful.
Patient-reported outcomes are something of a misnomer. At an initial visit, the health-related surveys are not about outcomes but about the patient's current health state. If the system were clever, it would evaluate the pattern of responses not so much to determine an overall score but to flag outlier responses as part of the clinical assessment, which can form part of the basis for a precise diagnosis.
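Inspecting outlier responses, rather than only the total score, can be sketched with a simple non-parametric person-fit check: counting Guttman errors, assuming the items can be ordered from easiest to hardest. This technique is chosen here for illustration and is not a method proposed in this commentary; model-based person-fit statistics (for example, l_z) are common alternatives. The function names are hypothetical.

```python
from typing import List, Tuple


def guttman_errors(responses: List[int]) -> List[Tuple[int, int]]:
    """Return (easier, harder) index pairs that break the expected ordering.

    responses[i] is 1 if the patient reports being able to do item i, else 0.
    Items are assumed to be ordered from easiest (index 0) to hardest.
    A Guttman error is a pair where a harder item is passed but an easier
    item is failed.
    """
    errors = []
    for i in range(len(responses)):
        for j in range(i + 1, len(responses)):
            if responses[i] == 0 and responses[j] == 1:
                errors.append((i, j))
    return errors


def flag_items(responses: List[int]) -> List[int]:
    """Indices of items involved in at least one Guttman error."""
    flagged = set()
    for i, j in guttman_errors(responses):
        flagged.update((i, j))
    return sorted(flagged)


if __name__ == "__main__":
    consistent = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    unusual = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # fails item 4 but passes item 5 (1-based)
    print(len(guttman_errors(consistent)))  # 0
    print(guttman_errors(unusual))          # [(3, 4)]
    print(flag_items(unusual))              # [3, 4]
```

The flagged items, not the summed score, are what would be brought to the clinician's attention in this sketch.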
Once the diagnosis is established, it guides the choice of the treatment most likely to produce a good outcome. But what makes a good outcome? It is hard to establish improvement without pretreatment information about the patient's health state. Thus, the choice of the best outcome for evaluating treatment success may depend, in part, on (1) the effects the treatment will have on the symptoms and signs evaluated at baseline, (2) the baseline assessments that provide part of the basis for determining the diagnosis and the requisite treatment, and (3) the information gathered at baseline that helped clarify the diagnosis by revealing response patterns that do not fit the mold.
Suppose you have a 10-item activities of daily living scale in which question 1 is the easiest (e.g., can you breathe without needing supplemental oxygen?) and question 10 is the hardest (e.g., can you run a mile in less than 6 minutes?). Now suppose that patient X arrives for an initial visit with spine-related symptoms.