Most biological sequences contain compositionally biased segments in which one or more residue types are significantly overrepresented. The function and evolution of these segments are poorly understood. Usually, all types of compositionally biased segments are masked and ignored during sequence analysis. However, it has been shown for a number of proteins that biased segments that contain amino acids with similar chemical properties are involved in a variety of molecular functions and human diseases. A detailed large-scale analysis of the functional implications and evolutionary conservation of different compositionally biased segments requires a sensitive method capable of detecting user-specified types of compositional bias.
Results
We present BIAS, a novel, sensitive method for the detection of compositionally biased segments composed of a user-specified set of residue types. BIAS uses discrete scan statistics, which provide a highly accurate correction for multiple tests, to compute analytical estimates of the significance of each compositionally biased segment. The method can take into account global compositional bias when computing analytical estimates of the significance of local clusters. BIAS is benchmarked against the SEG, SAPS and CAST programs. We also use BIAS to show that groups of proteins with the same biological function are significantly associated with particular types of compositionally biased segments.
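
The general idea of scanning for a user-specified residue set can be illustrated with a minimal sketch. This is not the BIAS implementation: the function names (`binom_tail`, `scan_biased_windows`), the fixed window length, and the thresholds are hypothetical, and a conservative Bonferroni correction over all windows stands in for the analytical scan-statistic approximation described above. When no background frequency is supplied, the sketch falls back to the sequence's own global composition, loosely mirroring the option to account for global compositional bias.

```python
from math import comb


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def scan_biased_windows(seq, residues, window=20, p=None, alpha=0.01):
    """Return (start, count, corrected_p) for windows enriched in `residues`.

    Assumption: significance is a binomial tail probability multiplied by the
    number of windows (Bonferroni), NOT the scan-statistic estimate used by BIAS.
    """
    residues = set(residues)
    hits = [1 if c in residues else 0 for c in seq]
    n = len(seq)
    if n < window:
        return []
    if p is None:
        # Use the global composition of the sequence as the background.
        p = sum(hits) / n
    n_windows = n - window + 1
    results = []
    count = sum(hits[:window])
    for start in range(n_windows):
        if start > 0:
            # Slide the window by one position: add the new residue, drop the old.
            count += hits[start + window - 1] - hits[start - 1]
        p_corr = min(1.0, n_windows * binom_tail(count, window, p))
        if p_corr <= alpha:
            results.append((start, count, p_corr))
    return results


if __name__ == "__main__":
    # Toy sequence with a 20-residue glutamine run; 0.05 is an assumed background.
    toy = "A" * 40 + "Q" * 20 + "A" * 40
    for start, count, p_corr in scan_biased_windows(toy, "Q", window=20, p=0.05):
        print(start, count, f"{p_corr:.3g}")
```

Because overlapping windows are strongly dependent, the Bonferroni factor overcorrects; the appeal of a scan-statistic approach is that it models the maximum over overlapping windows directly.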