Prediction of Drug-Induced Nephrotoxicity Using Chemical Information and Transcriptomics Data

Source: Figshare | Updated: 2025-05-09 | Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/Prediction_of_Drug-Induced_Nephrotoxicity_Using_Chemical_Information_and_Transcriptomics_Data/28988580
Description:
Prediction of drug-induced nephrotoxicity is an important task in the drug discovery and development pipeline. Chemical information-based machine learning models are commonly used for nephrotoxicity prediction as part of computational modeling. Gene expression data are increasingly being considered for predicting various toxicities, because they can provide mechanistic insight into how a drug causes a specific organ toxicity. Here, we demonstrate the use of gene expression data for nephrotoxicity prediction with multiple machine learning methods: LightGBM, random forest, support vector machine, and XGBoost. In addition to models built with all gene expression profiles of the selected compounds, a sample selection technique is used to select three subsets of gene expression profiles, of sizes 6,000, 9,000, and 12,000, and models are also generated from each subset. Because the class distribution in the gene expression data is imbalanced, techniques such as optimal probability threshold determination, data balancing, and cost-sensitive learning are considered during model generation. We have also generated chemical information-based models for comparison with the gene expression-based models, applying multiple data division techniques to improve their performance. The best chemical information-based model (CIM19) and the best gene expression-based model (GEM9, generated without any data balancing technique) have similar AUC values of 0.89 and 0.90, respectively. To further improve the gene expression-based models, we developed model GEM20 using all 6,162 toxic gene expression profiles and the same number of nontoxic profiles, selected from 18,825 nontoxic profiles with the SPXY method. This model achieves the highest AUC, 0.94, of all the chemical information- and gene expression-based models.
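The GEM20 step, pairing all toxic profiles with an equal number of nontoxic profiles chosen by SPXY, can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' code: the function name `spxy_select`, the Euclidean distance between expression vectors, and the response vector `y` passed in are all assumptions. SPXY (sample set partitioning based on joint x-y distances) runs the Kennard-Stone max-min selection on a distance that adds the range-normalised X distance to the range-normalised y distance. When every candidate carries the same label, as in a pool of nontoxic profiles, the y term is zero and the procedure reduces to Kennard-Stone on the expression profiles alone.

```python
import math
from typing import List, Sequence


def spxy_select(X: Sequence[Sequence[float]], y: Sequence[float], k: int) -> List[int]:
    """Return the indices of k samples chosen by SPXY (hypothetical helper name).

    SPXY = Kennard-Stone max-min selection on the joint distance
    d(p, q) = dx(p, q) / max(dx) + dy(p, q) / max(dy).
    """
    n = len(X)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and len(X)")

    # Pairwise Euclidean distances in feature space and absolute differences in y.
    dx = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    dy = [[abs(y[i] - y[j]) for j in range(n)] for i in range(n)]
    # Guard against division by zero: a constant y (one class) makes the y-term vanish.
    max_dx = max(max(row) for row in dx) or 1.0
    max_dy = max(max(row) for row in dy) or 1.0
    d = [[dx[i][j] / max_dx + dy[i][j] / max_dy for j in range(n)] for i in range(n)]

    # Seed the selection with the two samples that are farthest apart.
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    i0, j0 = max(pairs, key=lambda p: d[p[0]][p[1]])
    selected = [i0, j0]
    remaining = set(range(n)) - {i0, j0}

    # Distance from each remaining sample to its nearest already-selected sample.
    nearest = {r: min(d[r][i0], d[r][j0]) for r in remaining}
    while len(selected) < k:
        # Max-min criterion: take the sample farthest from the current selection
        # (ties broken toward the lower index for determinism).
        pick = max(remaining, key=lambda r: (nearest[r], -r))
        selected.append(pick)
        remaining.remove(pick)
        del nearest[pick]
        for r in remaining:
            nearest[r] = min(nearest[r], d[r][pick])
    return selected
```

A balanced training set could then be built, with hypothetical names, as `keep = spxy_select(nontoxic_X, nontoxic_y, k=len(toxic_X))` followed by training on `toxic_X` plus `[nontoxic_X[i] for i in keep]`. The full pairwise distance matrix is O(n²) in memory, so a pool of 18,825 profiles would in practice need a vectorised or chunked implementation.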
Additionally, SHAP analysis performed on a gene expression-based model identified several genes, such as cell division cycle 20 (CDC20), RPS6, DNA damage-inducible transcript 4 (DDIT4), GAPDH, CCNF, and MRPL12, that could be associated with nephrotoxicity.
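SHAP attributes a model's prediction to individual input features using Shapley values. The sketch below computes those values exactly, by enumerating every feature coalition, for a tiny toy model; it only illustrates the quantity SHAP reports and is not how the authors ran their analysis. Libraries such as `shap` rely on model-specific algorithms or approximations instead, because exact enumeration grows as 2^n in the number of features and is infeasible for thousands of genes. Filling in "absent" features from a fixed baseline profile is an assumption here (one of several possible value functions), and the function names, toy model, and feature values are hypothetical.

```python
import itertools
import math
from typing import Callable, List, Sequence


def exact_shapley(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition: set) -> float:
        # Features in the coalition keep their observed value; the rest use the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without feature i.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toy "toxicity score" over three genes; weights are invented for illustration.
def toy_score(z: Sequence[float]) -> float:
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]
```

For example, `exact_shapley(toy_score, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])` gives each linear term to its own feature and splits the interaction term `0.5 * z[0] * z[2]` equally between features 0 and 2. Whatever the model, the attributions sum to `f(x) - f(baseline)`, which is the property that lets SHAP values be read as a decomposition of a single prediction.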
Created: 2025-05-09