%0 Journal Article
%T Prediction Model of Anti-Breast Cancer Drug Activity Based on Machine Learning
%A 贺冰
%A 甘俊毅
%J Computer Science and Application
%P 298-307
%@ 2161-881X
%D 2024
%I Hans Publishing
%R 10.12677/CSA.2024.142030
%X Breast cancer is a disease in which epithelial cells of the breast proliferate uncontrollably and eventually undergo malignant transformation. It is one of the most common malignancies in women, and its incidence is closely associated with estrogen receptors. The estrogen receptor alpha subtype (ERα) is considered a key target for treating breast cancer, so compounds capable of antagonizing ERα activity are regarded as potential therapeutic agents. During drug development, a quantitative model relating the molecular structure descriptors of compounds to their bioactivity values is important for guiding the design and optimization of new drugs. Such a model can conserve research and development resources and may shorten the time new drugs take to reach the market, supporting research and development in breast cancer treatment. However, molecular descriptors are numerous and varied, and predicting activity directly from all of them gives poor results. Therefore, we first propose a molecular descriptor screening method. It uses a Gaussian mixture model fitted with the Expectation-Maximization algorithm to examine the distribution of each molecular descriptor. Then, based on the correspondence between a small number of compound molecular descriptors and bioactivity, a random forest dimensionality reduction algorithm selects a candidate set of molecular descriptors. Finally, the Spearman correlation coefficient is used to eliminate highly correlated descriptors, yielding a set of molecular descriptors that strongly influence bioactivity and are only weakly correlated with one another. We then use a random forest regression algorithm to predict drug activity and, as a novel step, use a genetic algorithm to solve for a mapping function that equalizes the original regression target.
Experimental results on a dataset of compound ERα bioactivity demonstrate the effectiveness of our bioactivity prediction model.
%K Random Forest
%K Genetic Algorithm
%K Drug Activity Prediction
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=81304