Motivation: RIP-chip is a high-throughput method to identify mRNAs that are targeted by RNA-binding proteins. The protein of interest is immunoprecipitated, and the identity and relative amount of the mRNA associated with it are measured on microarrays. Although a variety of methods are available to analyse microarray data, e.g. to detect differentially regulated genes, the additional experimental steps in RIP-chip require specialized methods. Here, we focus on two aspects of RIP-chip data. First, the efficiency of the immunoprecipitation step in the RIP-chip protocol varies between experiments, introducing a bias that does not exist in standard microarray experiments. This requires an additional normalization step to compare different samples and even technical replicates. Second, in contrast to standard differential gene expression experiments, the distribution of measurements is not normal. We exploit this fact to define a set of biologically relevant genes in a statistically meaningful way.
Results: Here, we propose two methods to analyse RIP-chip data. We model the measurement distribution as a Gaussian mixture distribution, which allows us to compute false discovery rates (FDRs) for any cut-off; thus, cut-offs can be chosen for any desired FDR. Furthermore, we use principal component analysis to determine the normalization factors necessary to remove immunoprecipitation bias. Both methods are evaluated on a large RIP-chip dataset measuring targets of Ago2, the major component of the microRNA-guided RNA-induced silencing complex (RISC). Using published HITS-CLIP experiments performed with the same cell line as used for RIP-chip, we show that the mixture modelling approach is a necessary step to remove background, that the computed FDRs are valid, and that the additional normalization is a necessary step to make experiments comparable.
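To illustrate how a mixture model turns a cut-off into an FDR, here is a minimal Python sketch under stated assumptions: enrichment values are one-dimensional, background genes follow one Gaussian, true targets follow a second Gaussian with a higher mean, and exactly two components are fitted by expectation-maximization (EM). The FDR for calling every value above a cut-off a target is then the expected background mass above the cut-off divided by the total mixture mass above it. The function names (`fit_two_gaussians`, `mixture_fdr`, `cutoff_for_fdr`) and the simulated data are hypothetical and chosen for illustration; this is not the REA implementation, which is written in R.

```python
import math
import random
from statistics import mean, pstdev

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def normal_sf(x, mu, sd):
    # Upper tail P(X > x) for X ~ N(mu, sd^2).
    return 0.5 * math.erfc((x - mu) / (sd * math.sqrt(2)))

def fit_two_gaussians(xs, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture (assumption: 2 components).
    Returns (weights, means, sds); index 0 is the lower-mean (background) component."""
    n = len(xs)
    xs_sorted = sorted(xs)
    lo, hi = xs_sorted[: n // 2], xs_sorted[n // 2:]
    # Initialise from the lower and upper halves of the data.
    w = [0.5, 0.5]
    mu = [mean(lo), mean(hi)]
    sd = [max(pstdev(lo), 1e-3), max(pstdev(hi), 1e-3)]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            p = [w[k] * normal_pdf(x, mu[k], sd[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and standard deviations.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sd[k] = max(math.sqrt(var), 1e-3)
    order = sorted(range(2), key=lambda k: mu[k])
    return [w[k] for k in order], [mu[k] for k in order], [sd[k] for k in order]

def mixture_fdr(cutoff, w, mu, sd):
    """Model-based FDR when every value > cutoff is called a target."""
    bg = w[0] * normal_sf(cutoff, mu[0], sd[0])
    total = bg + w[1] * normal_sf(cutoff, mu[1], sd[1])
    return bg / total if total > 0 else 0.0

def cutoff_for_fdr(xs, alpha, w, mu, sd):
    """Smallest observed value whose model-based FDR is <= alpha, else None."""
    for c in sorted(xs):
        if mixture_fdr(c, w, mu, sd) <= alpha:
            return c
    return None

# Usage on simulated toy data (not RIP-chip measurements):
# 800 background values around 0 and 200 "target" values around 4.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(800)] + \
       [random.gauss(4.0, 1.0) for _ in range(200)]
w, mu, sd = fit_two_gaussians(data)
c = cutoff_for_fdr(data, 0.05, w, mu, sd)
print("background mean", round(mu[0], 2), "target mean", round(mu[1], 2), "cut-off", c)
```

Scanning observed values and taking the smallest one that meets the target FDR keeps the sketch simple; note that if the background component is wider than the target component, the model-based FDR can rise again in the far upper tail, so a real analysis would inspect the whole FDR curve.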
Availability: An R implementation of REA is available on the project website (http://www.bio.ifi.lmu.de/REA) and as a supplementary data file.
Supplementary information: Supplementary data are available at Bioinformatics online.