SRF-LDA: A Method for Predicting LncRNA-Disease Associations Based on Stacked Ensemble Learning
Abstract:
Long non-coding RNAs (lncRNAs) are non-coding RNAs longer than 200 nt and an important component of the non-coding genome. Many experiments have confirmed that lncRNAs are closely linked to the occurrence and development of human diseases. However, only a small fraction of lncRNA-disease relationships are known, and most remain to be studied. Accurately identifying disease-related lncRNAs therefore helps to elucidate the mechanisms by which lncRNAs act in disease and to explore new treatments. In this study, to improve the prediction of lncRNA-disease associations (LDAs), we implemented an LDA prediction model based on stacked ensemble learning, called SRF-LDA. SRF-LDA has three parts. The first part fuses three types of features as model input: lncRNA k-mer features, the Gaussian interaction profile kernel similarity of diseases, and known LDAs. The second part applies a stacked ensemble learning strategy: several random forest classifiers with different parameters serve as base models that classify the fused features, and a support vector machine serves as the meta-model that combines and optimizes the random forest outputs, yielding more accurate and robust LDA predictions. The third part trains and evaluates the model with tenfold cross-validation. The results show that the method performs well in predicting LDAs, with an average AUC of 0.9246 and an average AUPR of 0.9166, outperforming several existing LDA prediction models.
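As a rough illustration of the feature-fusion step described in the abstract, the sketch below builds a fused feature vector for one lncRNA-disease pair using only the Python standard library. The function names, the value of k, the bandwidth normalization (the commonly used form with gamma' = 1), and the way the three feature groups are concatenated are all assumptions made for illustration; the abstract does not specify them. The classifier stage (random forests stacked under a support vector machine) is not shown.

```python
import math
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGT"):
    """Return normalized k-mer frequencies of seq, in lexicographic k-mer order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA alphabets alike
    windows = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows with ambiguous bases such as N
            counts[kmer] += 1
            windows += 1
    return [counts[m] / windows if windows else 0.0 for m in kmers]


def gip_disease_similarity(assoc, gamma_prime=1.0):
    """Gaussian interaction profile (GIP) kernel similarity between diseases.

    assoc[i][j] is 1 if lncRNA i is known to be associated with disease j, else 0.
    The profile of disease j is column j of assoc.
    """
    n_dis = len(assoc[0])
    profiles = [[row[j] for row in assoc] for j in range(n_dis)]
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n_dis
    if mean_sq_norm == 0:
        raise ValueError("association matrix contains no known associations")
    gamma = gamma_prime / mean_sq_norm  # bandwidth normalized by mean profile norm
    sim = [[0.0] * n_dis for _ in range(n_dis)]
    for a in range(n_dis):
        for b in range(n_dis):
            dist_sq = sum((x - y) ** 2 for x, y in zip(profiles[a], profiles[b]))
            sim[a][b] = math.exp(-gamma * dist_sq)
    return sim


def pair_features(seq, lnc_idx, dis_idx, assoc, sim, k=3):
    """Concatenate k-mer, disease-similarity, and known-association features.

    Assumption: the association features are the lncRNA's row of assoc with the
    target entry zeroed, so the label of the pair itself is not used as input.
    """
    profile = list(assoc[lnc_idx])
    profile[dis_idx] = 0
    return kmer_frequencies(seq, k) + list(sim[dis_idx]) + profile


# Toy data (hypothetical): 2 lncRNAs x 3 diseases.
assoc = [[1, 0, 1],
         [0, 1, 0]]
sim = gip_disease_similarity(assoc)
features = pair_features("ACGU", lnc_idx=0, dis_idx=1, assoc=assoc, sim=sim, k=2)
print(len(features))  # 16 k-mer + 3 similarity + 3 association = 22
```

Vectors built this way for known and unknown pairs could then be passed to any classifier; zeroing the target entry is one simple way to avoid label leakage and may differ from what the authors did.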