Many heuristic algorithms have been designed to approximate the P-values of DNA motifs described by position weight matrices, in order to evaluate their statistical significance. Their estimates often deviate from the true P-value by orders of magnitude, so exact P-value computation is needed to rank motifs reliably. Surprisingly, the computational complexity of the problem has remained unknown.

Results
We show that the problem is NP-hard and present MotifRank, software based on dynamic programming that calculates exact P-values of motifs. We define the exact P-value under a general and more precise model. MotifRank is asymptotically faster than the best existing algorithm for exact P-value computation and is fast enough to be practical. Our experiments demonstrate that MotifRank is substantially more accurate than existing approximation algorithms.
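
To make the idea of an exact, dynamic-programming P-value concrete, the sketch below computes the score distribution of a position weight matrix column by column. It is not MotifRank's algorithm or its more general model. It assumes a simplified textbook setting: integer scores, an i.i.d. background, and a P-value defined as the probability that a random sequence of the motif's length scores at least a given threshold. The names `score_distribution`, `p_value`, `pwm_example`, and `uniform` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def score_distribution(pwm, background):
    """Return {score: probability} for a random sequence of len(pwm) bases.

    pwm        -- list of columns, each a dict base -> integer score (assumed)
    background -- dict base -> Fraction, i.i.d. base probabilities (assumed)
    """
    dist = {0: Fraction(1)}  # before any column, score 0 with probability 1
    for column in pwm:
        nxt = defaultdict(Fraction)
        for score, prob in dist.items():
            for base, p in background.items():
                # Extend every partial score by the score of this base.
                nxt[score + column[base]] += prob * p
        dist = dict(nxt)
    return dist


def p_value(pwm, background, threshold):
    """Exact probability that a random sequence scores >= threshold."""
    dist = score_distribution(pwm, background)
    return sum((p for s, p in dist.items() if s >= threshold), Fraction(0))


# Toy two-column matrix with made-up integer scores, for illustration only.
pwm_example = [
    {"A": 2, "C": 0, "G": 0, "T": -1},
    {"A": 0, "C": 1, "G": 0, "T": 0},
]
uniform = {base: Fraction(1, 4) for base in "ACGT"}

print(p_value(pwm_example, uniform, 2))
```

Using `Fraction` keeps the arithmetic exact rather than subject to floating-point rounding. The table in this sketch grows with the number of distinct reachable scores, so its running time depends on the magnitude of the scores and not only on the motif length.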