%0 Journal Article
%T 基于宏基因组分析的机器学习疾病预测模型构建
Construction of Machine Learning Disease Prediction Model Based on Macro-Genomic Analysis
%A 张钰东
%J Advances in Applied Mathematics
%P 199-207
%@ 2324-8009
%D 2024
%I Hans Publishing
%R 10.12677/AAM.2024.131023
%X 随着高通量测序技术的发展,宏基因组数据库得到了极大的丰富,为利用其分析人类疾病与健康状况提供了可能,其中基于人类肠道微生物组分析的疾病预测成为了代表性研究方向之一。本文利用以门为单位的分类学肠道微生物数据,即操作分类单元数据,结合非负矩阵分解和变分自动编码器方法,提出了两类新的机器学习分类算法,这些算法旨在提取肠道微生物中的关键信息,以实现对疾病患者的预测。通过降维、数据生成以及引入惩罚约束项等技术手段,我们改善了预测效果、优化了模型的过拟合。在模拟数据、肝硬化数据和糖尿病数据上,我们的预测模型均表现出了较好的性能,AUC值分别达到了0.926、0.956和0.745。
With advances in high-throughput sequencing technologies, metagenomic databases have expanded significantly, making it possible to use them to analyze human health and disease. Disease prediction based on analysis of the human gut microbiome has become a prominent research direction. In this study, we use gut microbiota data classified at the phylum level, namely Operational Taxonomic Unit (OTU) data, and propose two novel machine learning classification algorithms that combine non-negative matrix factorization and variational autoencoder methods. These algorithms are designed to extract key information from the gut microbiota in order to predict which individuals have a disease. Through dimensionality reduction, data generation, and the introduction of penalty constraint terms, we improve prediction performance and reduce model overfitting. On simulated data, liver cirrhosis data, and diabetes data, our prediction models performed well, achieving AUC values of 0.926, 0.959, and 0.745, respectively.
%K 操作分类单元,非负矩阵分解,变分自动编码器
Operational Taxonomic Units
%K Non-Negative Matrix Factorization
%K Variational Auto Encoder
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=79316