Learning and actioning general principles of cancer cell drug sensitivity

Indexed from NIAID Data Ecosystem on 2026-05-02

Download link:

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE287932

Description:

High-throughput screening platforms that profile the drug sensitivity of hundreds of cancer cell lines (CCLs) have generated large datasets that hold the potential to unlock targeted anti-tumor therapies. In this study, we leveraged these datasets to create predictive models of cancer cell drug sensitivity. To this end, we trained explainable machine learning algorithms on cell line transcriptomics to predict the growth inhibitory potential of drugs. We used large language models (LLMs) to expand the description of the mechanism of action (MOA) of each drug, starting from available annotations, and matched these descriptions to the semantically closest pathways from reference knowledge bases. By leveraging this AI-curated resource and the interpretability of our model, we demonstrated that pathways enriched for genes crucial for prediction often matched known drug MOAs and essential genes, suggesting that our models learned the molecular determinants of drug response. Furthermore, we demonstrated that incorporating only LLM-curated genes associated with MOAs enhanced the predictive accuracy of our drug models. To enhance translatability to a clinical setting, we employed a pipeline to align bulk RNAseq data from CCLs, used for training the models, with bulk RNAseq data from patient samples, used for inference. We demonstrated the effectiveness of our approach on TCGA samples, where patients' best-scoring drugs matched those prescribed for their cancer type. We further showed its usefulness by predicting and experimentally validating effective drugs for patients with two highly lethal solid tumors, pancreatic cancer and glioblastoma. In summary, our method facilitates the inference and interpretation of cancer cell line drug sensitivity and holds potential for translating these predictions into new cancer therapeutics.
Development of an interpretable machine learning framework for predicting the drug sensitivity of cancer cell lines, using the latest version of the Genomics of Drug Sensitivity in Cancer (GDSC) dataset.
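
The description above says that gene-level importance is what links the models to drug MOAs. As a rough illustration of that idea only, the sketch below ranks genes by the absolute Pearson correlation between their expression and a drug response value across cell lines. This is a deliberately simple stand-in, not the authors' explainable model. The function names `pearson` and `rank_genes`, the gene labels, and all numbers are hypothetical and invented for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant vector has no defined correlation; treat it as uninformative.
    return cov / (sx * sy) if sx and sy else 0.0


def rank_genes(expression, response):
    """Rank genes by |correlation| between expression and drug response.

    expression: dict mapping gene -> list of expression values, one per cell line
    response:   list of drug response values (e.g. ln IC50), in the same order
    """
    scores = {gene: pearson(values, response) for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Toy data, invented for illustration (not taken from GDSC or GSE287932).
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],    # rises as response falls
    "GENE_B": [2.0, 2.1, 1.9, 2.0, 2.05],   # roughly flat across cell lines
    "GENE_C": [5.0, 3.0, 4.0, 1.0, 2.0],    # loosely follows response
}
response = [4.0, 3.1, 2.2, 0.9, 0.1]

ranking = rank_genes(expression, response)
for gene, r in ranking:
    print(f"{gene}\t{r:+.3f}")
```

In a real analysis the top-ranked genes would then be tested for pathway enrichment and compared against the drug's annotated MOA. A univariate correlation ignores interactions between genes, which is one reason a trained model with an attribution method is the more plausible choice in practice.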

Created:

2025-02-01
