Massively parallel gene expression profiling has provided a more objective, molecular-level characterization of breast cancer subtypes. Several bioinformatics tools are available to infer patient subtype from a gene expression profile, including the well-studied PAM50. The algorithms used in these tools require access to a broad patient dataset. The subtype assigned to an individual is determined relative to all other patients across the panel, which makes subtypes heavily dependent on the composition of the dataset. Our aim was to develop a bioinformatics approach that assigns absolute breast cancer subtypes, independent of dataset composition.
Methods:
Using a dataset of 4924 breast cancer patients, we defined a new bioinformatics approach, Absolute Intrinsic Molecular Subtyping (AIMS), that assigns a subtype from the gene expression profile of an individual sample without the need for a large, diverse, and normalized dataset. We evaluated the agreement of AIMS with PAM50 and compared the subtype assignments and the prognostic value of the subtypes. We assessed AIMS’ robustness using a benchmark set of tests, including subtype reproducibility between technologies, gene removal, and normal gene expression contamination, and compared it with that of PAM50. All statistical tests, except where noted, were two-sided.
Results:
AIMS largely agreed with PAM50, with 76% and 77% agreement for cross validation and the test set, respectively, and the prognostic capacity of the intrinsic subtypes was preserved. AIMS is fully stable, and its absolute nature enables its use on a wide range of datasets and technologies, including RNA-seq.
Conclusions:
The instability of a breast cancer subtyping scheme like PAM50 could have important consequences in the clinical management of patients. AIMS is a fully stable and robust subtyping scheme that recapitulates PAM50.
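The dataset dependence described above can be illustrated with a small sketch. It is not the AIMS or PAM50 implementation: the gene names, expression values, and cohorts are hypothetical, median centering stands in for cohort-relative normalization in general, and the within-sample comparison is only one assumed example of a measure that needs no reference dataset.

```python
from statistics import median

# Hypothetical log-expression values for one patient (placeholder gene names).
patient = {"GENE_A": 8.0, "GENE_B": 6.5}

# Two hypothetical cohorts with different composition for the same genes.
cohort_1 = [{"GENE_A": 6.0, "GENE_B": 6.0}, {"GENE_A": 7.0, "GENE_B": 7.0}]
cohort_2 = [{"GENE_A": 9.0, "GENE_B": 5.0}, {"GENE_A": 10.0, "GENE_B": 6.0}]


def median_centered(sample, cohort):
    """Relative measure: subtract each gene's median over the cohort plus the sample."""
    centered = {}
    for gene, value in sample.items():
        values = [other[gene] for other in cohort] + [value]
        centered[gene] = value - median(values)
    return centered


def within_sample_rule(sample, gene_x, gene_y):
    """Absolute measure: compare two genes inside the same sample only."""
    return sample[gene_x] > sample[gene_y]


rel_1 = median_centered(patient, cohort_1)
rel_2 = median_centered(patient, cohort_2)
rule = within_sample_rule(patient, "GENE_A", "GENE_B")

print(rel_1["GENE_A"], rel_2["GENE_A"])  # sign depends on the cohort
print(rule)                              # no cohort is involved
```

In this toy case the same patient's centered GENE_A value is positive in one cohort and negative in the other, while the within-sample comparison gives the same answer because no cohort enters the calculation.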