Dynameomics: a multi-dimensional analysis-optimized database for dynamic protein data

    loading  Checking for direct PDF access through Ovid

Abstract

The Dynameomics project is our effort to characterize the native-state dynamics and folding/unfolding pathways of representatives of all known protein folds by way of molecular dynamics simulations, as described by Beck et al. (in Protein Eng. Des. Select., the first paper in this series). The data produced by these simulations are highly multidimensional in structure and multi-terabytes in size. Both of these features present significant challenges for storage, retrieval and analysis. For optimal data modeling and flexibility, we needed a platform that supported both multidimensional indices and hierarchical relationships between related types of data and that could be integrated within our data warehouse, as described in the accompanying paper directly preceding this one. For these reasons, we have chosen On-line Analytical Processing (OLAP), a multi-dimensional analysis optimized database, as an analytical platform for these data. OLAP is a mature technology in the financial sector, but it has not been used extensively for scientific analysis. Our project is further more unusual for its focus on the multidimensional and analytical capabilities of OLAP rather than its aggregation capacities. The dimensional data model and hierarchies are very flexible. The query language is concise for complex analysis and rapid data retrieval. OLAP shows great promise for the dynamic protein analysis for bioengineering and biomedical applications. In addition, OLAP may have similar potential for other scientific and engineering applications involving large and complex datasets.

Related Topics

    loading  Loading Related Articles