%0 Journal Article %T Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring %A Kirk K Durston %A David KY Chiu %A Andrew KC Wong %A Gary CL Li %J EURASIP Journal on Bioinformatics and Systems Biology %D 2012 %I BioMed Central %R 10.1186/1687-4153-2012-8 %X The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appears to play an important role in the molecular structure and/or function. Our results demonstrate that the method we present here using a k-modes site clustering algorithm based on interdependency evaluation among sites obtained from a sequence alignment of homologous proteins can provide significant insights into the complex, hierarchical inter-residue structural relationships within the 3D structure of a protein family. The determination of protein 3D structure using methods such as NMR and X-ray crystallography has made tremendous progress. Although the 3D structure of many proteins has been solved, there still remains the problem of understanding the internal relationships within the structure. Certain residues may require specific associations with other residues within the structure that are not necessarily spatially proximal.
Certain pairwise, third-order, fourth-order, and higher-order associations may be essential for obtaining a stable structure, while other parts of the structure have a less important role. The challenge is to be able to identify key structural associations within the larger structure, with the objective of understanding what role t %K k-modes algorithm %K Site cluster %K Associations %K Ubiquitin %K Transthyretin %K Pattern discovery %K Cluster tree %K Attribute clustering %K Protein structural sub-domains %U http://bsb.eurasipjournals.com/content/2012/1/8