One reason that researchers may wish to demonstrate that an external software quality attribute can be measured consistently is so that they can validate a prediction system for the attribute. However, attempts at validating prediction systems for external subjective quality attributes have tended to rely on experts indicating that the values provided by the prediction systems informally agree with the experts' intuition about the attribute. These attempts are undertaken without a pre-defined scale on which it is known that the attribute can be measured consistently. Consequently, a valid unbiased estimate of the predictive capability of the prediction system cannot be given because the experts' measurement process is not independent of the prediction system's values. Usually, no justification is given for not checking to see if the experts can measure the attribute consistently. It seems to be assumed that: subjective measurement isn't proper measurement or subjective measurement cannot be quantified or no one knows the true values of the attributes anyway and they cannot be estimated. However, even though the classification of software systems' or software artefacts' quality attributes is subjective, it is possible to quantify experts' measurements in terms of conditional probabilities. It is then possible, using a statistical approach, to assess formally whether the experts' measurements can be considered consistent. If the measurements are consistent, it is also possible to identify estimates of the true values, which are independent of the prediction system. These values can then be used to assess the predictive capability of the prediction system. In this paper we use Bayesian inference, Markov chain Monte Carlo simulation and missing data imputation to develop statistical tests for consistent measurement of subjective ordinal scale attributes.