Prediction of Driver Gene Matching in Lung Cancer NOG/PDX Models Based on Artificial Intelligence

何雅億 , 郭皓越 , 刁丽 , 陈宇 , 朱俊杰 , Hiran C. Fernando , Diego Gonzalez Rivas , 祁辉 , Chunlei Dai , 汤旭蓁 , 朱军 , 戴家威 , 何侃 , Dan Chan , 杨洋

Engineering, 2022, Vol. 15, Issue 8: 102-114. DOI: 10.1016/j.eng.2021.06.017
Research Article

Abstract

Patient-derived tumor xenografts (PDXs) are a powerful tool for drug discovery and screening in cancer. However, genotype mismatches in PDXs remain poorly understood, which leads to massive economic losses when PDXs are used. Here, we established PDX models from 53 lung cancer patients, with a genotype matching rate of 79.2% (42/53). Furthermore, 17 clinicopathological features were examined and input into stepwise logistic regression (LR) models based on the lowest Akaike information criterion (AIC), least absolute shrinkage and selection operator (LASSO)-LR, support vector machine (SVM) recursive feature elimination (SVM-RFE), extreme gradient boosting (XGBoost), gradient boosting and categorical features (CatBoost), and the synthetic minority oversampling technique (SMOTE). Finally, the performance of all models was evaluated by accuracy, area under the receiver operating characteristic curve (AUC), and F1 score in 100 testing groups. Two multivariable LR models revealed that age, number of driver gene mutations, epidermal growth factor receptor (EGFR) gene mutations, type of prior chemotherapy, prior tyrosine kinase inhibitor (TKI) therapy, and the source of the sample were powerful predictors. Moreover, CatBoost (mean accuracy = 0.960; mean AUC = 0.939; mean F1 score = 0.908) and the eight-feature SVM-RFE (mean accuracy = 0.950; mean AUC = 0.934; mean F1 score = 0.903) showed the best performance among the algorithms. Meanwhile, applying SMOTE improved the predictive capability of most models, with the exception of CatBoost. With SMOTE applied, the ensemble classifier built from the single models achieved the highest accuracy (mean = 0.975), AUC (mean = 0.949), and F1 score (mean = 0.938). In conclusion, we established an optimal predictive model to screen lung cancer patients for NOD/Shi-scid, interleukin-2 receptor (IL-2R) γnull (NOG)/PDX models and offer a general approach for building predictive models.
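The cohort in the abstract is imbalanced (42 matched versus 11 mismatched models), which is the situation SMOTE is meant to address. The following is a minimal standard-library sketch of the core SMOTE idea: each synthetic minority sample is placed on the line segment between a real minority sample and one of its nearest minority neighbours. The abstract does not state which implementation or variant the authors used, so the function name `smote`, the parameters `k` and `seed`, and the two randomly generated numeric features are all assumptions made purely for illustration; none of this is study data.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data only: the class sizes mirror the 42 matched / 11 mismatched split,
# but the two numeric features are randomly generated, not study data.
data_rng = random.Random(42)
matched = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(42)]
mismatched = [(data_rng.gauss(2, 1), data_rng.gauss(2, 1)) for _ in range(11)]

# Oversample the minority class until both classes have the same size.
new_points = smote(mismatched, n_new=len(matched) - len(mismatched))
balanced_minority = mismatched + new_points
print(len(matched), len(balanced_minority))
```

Plain SMOTE interpolates numeric features, so categorical clinicopathological features (for example, sample source or prior therapy type) would need an encoding or a categorical-aware variant. Oversampling is normally applied to the training split only, so that synthetic points never leak into the test groups used for accuracy, AUC, and F1.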

Key words

Machine learning / Patient-derived tumor xenografts / NOG mice

Cite this article

何雅億, 郭皓越, 刁丽, 陈宇, 朱俊杰, Hiran C. Fernando, Diego Gonzalez Rivas, 祁辉, Chunlei Dai, 汤旭蓁, 朱军, 戴家威, 何侃, Dan Chan, 杨洋. Prediction of Driver Gene Matching in Lung Cancer NOG/PDX Models Based on Artificial Intelligence[J]. Engineering, 2022, 15(8): 102-114. DOI: 10.1016/j.eng.2021.06.017
