Prediction of Drug-Induced Nephrotoxicity Using Chemical Information and Transcriptomics Data
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Prediction_of_Drug-Induced_Nephrotoxicity_Using_Chemical_Information_and_Transcriptomics_Data/28988580
Description:
Prediction of drug-induced nephrotoxicity is an important task in the
drug discovery and development pipeline. In computational modeling,
nephrotoxicity is generally predicted with machine learning models based
on chemical information. Gene expression data are increasingly being
considered for predicting different toxicities, as they can provide a
mechanistic understanding of how a drug causes specific organ toxicity.
Here, we demonstrate the use of gene expression
data for nephrotoxicity prediction using multiple machine learning
methods such as LightGBM, random forest, support vector machine, and
XGBoost. In addition to models built with all the gene expression
profiles for the selected compounds, a sample selection technique is
used to draw three subsets of gene expression profiles of sizes 6000,
9000, and 12,000, and models are also generated from each subset.
Considering the imbalanced class distribution in gene expression
data, techniques such as optimal probability threshold determination,
data balancing, and cost-sensitive learning are considered during model
generation. We have also generated chemical information-based models
for comparison with the gene expression-based models.
Multiple data division techniques are applied to enhance the performance
of the chemical information-based models. The best chemical information-based
model (CIM19) and the best gene expression-based model (GEM9) (generated
without any data balancing techniques) have similar AUC values of
0.89 and 0.9, respectively. To further enhance the performance of
gene expression-based models, we developed a model, GEM20, using
all 6162 toxic gene expression profiles and an equal number of
nontoxic profiles selected with the SPXY method from 18,825 nontoxic
profiles. This model achieves the highest AUC score, 0.94, among
all of the chemical information- and gene expression-based models.
Additionally, SHAP analysis of a gene expression-based model identified
several genes, such as cell division cycle 20 (CDC20), RPS6, DNA
damage-inducible transcript 4 (DDIT4), GAPDH, CCNF, and MRPL12,
that could be associated with nephrotoxicity.
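
The SPXY selection step mentioned above can be illustrated with a minimal sketch. This is not the authors' code: the function name spxy_select, its arguments, and the handling of a constant y are assumptions made for illustration, and the description does not state which response variable was used when selecting among the nontoxic profiles. The sketch follows the commonly described form of SPXY: pairwise distances in feature space (x) and response space (y) are each divided by their maximum and summed, and samples are then picked Kennard-Stone style, starting from the two most distant samples and repeatedly adding the sample farthest from the set already chosen.

```python
import numpy as np


def spxy_select(X, y, n_select):
    """Return indices of n_select samples chosen by an SPXY-style rule.

    Hypothetical helper for illustration only. It builds dense pairwise
    distance tensors, so it suits small demos, not tens of thousands of
    profiles.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(len(X), -1)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")

    def pairwise(A):
        # Euclidean distance between every pair of rows in A.
        diff = A[:, None, :] - A[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))

    dx = pairwise(X)
    dy = pairwise(y)

    # Joint distance: each part is scaled by its maximum before summing.
    d = dx / dx.max() if dx.max() > 0 else dx
    if dy.max() > 0:
        # Assumption: when y is constant the y term is skipped, which
        # reduces the rule to Kennard-Stone selection on x alone.
        d = d + dy / dy.max()

    # Seed with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]

    # For every sample, track the distance to its nearest selected sample.
    min_dist = np.minimum(d[i], d[j])
    min_dist[selected] = -np.inf  # never pick a sample twice

    while len(selected) < n_select:
        k = int(np.argmax(min_dist))  # farthest from the current selection
        selected.append(k)
        min_dist = np.minimum(min_dist, d[k])
        min_dist[selected] = -np.inf

    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    profiles = rng.normal(size=(50, 8))   # toy stand-in for expression profiles
    response = rng.normal(size=50)        # toy stand-in for a response variable
    print(spxy_select(profiles, response, n_select=10))
```

In a balancing setting like the one described for GEM20, such a routine would be applied to the majority (nontoxic) class with n_select set to the size of the minority class. At the scale reported here (18,825 profiles) a memory-efficient distance computation would be needed in place of the dense one used in this sketch.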
Created:
2025-05-09



