




Identifying the Binding Residues between Disease-Associated Proteins and Metal-Ion Ligands Based on Machine Learning Algorithm

DOI: 10.12677/HJCB.2022.123004, PP. 23-31

Keywords: Metal-Ion Ligand, 5-Fold Cross Validation, Position-Specific Scoring Matrix (PSSM), Random Forest (RF)



Protein-ligand interactions play an important role in the pathogenesis of diseases. Many proteins perform their functions by binding to specific ligands, and the binding of proteins to metal-ion ligands plays an important role in the realization of protein function. Identifying which residues in a protein interact with metal-ion ligands helps researchers understand the molecular mechanism of protein-metal-ion interaction, which is important for human health and precision medicine. In this paper, we use machine learning algorithms to study the binding of disease-associated proteins to three metal-ion ligands: Zn2+, Mg2+, and Ca2+. We extract three sequence features: the position-specific scoring matrix (PSSM), amino acid composition, and dipeptide composition. The random forest algorithm and the support vector machine algorithm are then used to build classification models for the binding residues of the three metal-ion ligands. The highest accuracy (Acc), obtained with feature fusion, was 87% for Zn2+-binding residues; the highest Acc was 70% for Mg2+-binding residues and 70% for Ca2+-binding residues. These results show that our model is able to identify the binding residues of the three metal-ion ligands.
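To make the two composition features concrete, here is a minimal sketch of how amino acid composition and dipeptide composition might be computed for a sequence window centred on a candidate residue. The abstract does not state the window length, the padding scheme, or the normalization, so `half_width`, the `X` padding character, and all function names here are assumptions for illustration only. The PSSM feature is left out because it requires an alignment profile (for example from PSI-BLAST) rather than the raw sequence.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def window(sequence, center, half_width=4, pad="X"):
    """Return the segment of length 2*half_width+1 centred on `center`.

    Positions beyond the termini are filled with `pad` (an assumed convention).
    """
    padded = pad * half_width + sequence + pad * half_width
    return padded[center:center + 2 * half_width + 1]


def amino_acid_composition(segment):
    """20 values: frequency of each standard amino acid, ignoring padding."""
    n = sum(1 for aa in segment if aa in AMINO_ACIDS)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [segment.count(aa) / n for aa in AMINO_ACIDS]


def dipeptide_composition(segment):
    """400 values: frequency of each adjacent residue pair, ignoring padded pairs."""
    pairs = [segment[i:i + 2] for i in range(len(segment) - 1)]
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not valid:
        return [0.0] * len(DIPEPTIDES)
    return [valid.count(dp) / len(valid) for dp in DIPEPTIDES]


def residue_features(sequence, center, half_width=4):
    """Concatenate both compositions into one 420-dimensional vector (simple feature fusion)."""
    segment = window(sequence, center, half_width)
    return amino_acid_composition(segment) + dipeptide_composition(segment)


if __name__ == "__main__":
    # Hypothetical toy sequence; position 3 is a histidine.
    features = residue_features("MKCHHCEDLA", center=3, half_width=2)
    print(len(features))
```

Each residue in a protein chain would yield one such vector, labelled as binding or non-binding; those vectors could then be passed to a random forest or support vector machine classifier and evaluated with 5-fold cross validation, as the abstract describes.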



