Construction of Machine Learning Disease Prediction Model Based on Macro-Genomic Analysis

DOI: 10.12677/AAM.2024.131023, PP. 199-207

Keywords: Operational Taxonomic Units, Non-Negative Matrix Factorization, Variational Autoencoder


Abstract:

Advances in high-throughput sequencing have greatly enriched metagenomic databases, making it possible to use them to analyze human health and disease. Disease prediction based on analysis of the human gut microbiome has become one of the representative research directions. In this study, we use taxonomic gut microbiota data at the phylum level, known as Operational Taxonomic Unit (OTU) data, and combine non-negative matrix factorization with variational autoencoder methods to propose two new classes of machine learning classification algorithms. These algorithms are designed to extract key information from the gut microbiota in order to identify patients with disease. Through dimensionality reduction, data generation, and the introduction of penalty constraint terms, we improve prediction performance and reduce model overfitting. On simulated data, liver cirrhosis data, and diabetes data, our predictive models perform well, achieving AUC values of 0.926, 0.959 (given as 0.956 in the Chinese-language abstract), and 0.745, respectively.
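
To make the general idea concrete, the following is a minimal sketch of NMF-based dimensionality reduction followed by AUC evaluation. It is not the authors' algorithm: it uses plain Lee-Seung multiplicative updates without the penalty terms or the variational autoencoder mentioned in the abstract, and the synthetic data, the nearest-centroid scoring rule, and all function names (`nmf`, `auc`, `make_toy_data`, `demo`) are assumptions introduced purely for illustration.

```python
import numpy as np


def nmf(X, rank, n_iter=300, seed=0, eps=1e-9):
    """Plain NMF, X ~ W @ H, via multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W = rng.random((n_samples, rank)) + eps   # per-sample latent loadings
    H = rng.random((rank, n_features)) + eps  # latent taxon profiles
    for _ in range(n_iter):
        # Each update keeps the factors non-negative and does not
        # increase the reconstruction error.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))


def make_toy_data(n_samples=200, n_taxa=12, rank=3, seed=42):
    """Hypothetical relative-abundance table; cases load more on one latent profile."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_samples)        # 1 = case, 0 = control
    profiles = rng.random((rank, n_taxa))
    loadings = rng.random((n_samples, rank))
    loadings[:, 0] += 1.5 * y
    X = loadings @ profiles + 0.05 * rng.random((n_samples, n_taxa))
    return X / X.sum(axis=1, keepdims=True), y    # rows sum to 1


def demo():
    X, y = make_toy_data()
    W, _ = nmf(X, rank=3)                         # unsupervised reduction to 3 features
    train, test = np.arange(0, 150), np.arange(150, 200)
    mu_case = W[train][y[train] == 1].mean(axis=0)
    mu_ctrl = W[train][y[train] == 0].mean(axis=0)
    # Higher score = closer to the case centroid than to the control centroid.
    score = (np.linalg.norm(W[test] - mu_ctrl, axis=1)
             - np.linalg.norm(W[test] - mu_case, axis=1))
    return auc(score, y[test])


if __name__ == "__main__":
    print(f"held-out AUC on toy data: {demo():.3f}")
```

Note that the factorization here is fitted on all samples before the train/test split, which is acceptable for a toy illustration but would need to be handled more carefully (for example, by projecting held-out samples onto a fixed `H`) in a real evaluation.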
