
Machine learning pipeline to train toxicity prediction model of FunTox-Networks

Zenodo | Updated 2020-07-30 | Indexed 2026-05-25
Download link:
https://zenodo.org/record/3529161
Description:
Machine learning pipeline used to provide toxicity predictions in FunTox-Networks.

01_DATA # preprocessing and filtering of raw activity data from ChEMBL
- Chembl_v25 # latest activity assay data set from ChEMBL (retrieved Nov 2019)
- filt_stats.R # filtering and preparation of raw data
- Filtered # output data sets from filt_stats.R
- toxicity_direction.csv # table of toxicity measurements and their proportionality to toxicity

02_MolDesc # calculation of molecular descriptors for all compounds within the filtered ChEMBL data set
- datastore # files with all compounds and their molecular descriptors calculated from SMILES
- scripts
  - calc_molDesc.py # calculates the molecular descriptors of all compounds from their SMILES
  - chemopy-1.1 # Python package used for descriptor calculation, as described in: https://doi.org/10.1093/bioinformatics/btt105

03_Averages # calculation of moving averages for levels and organisms, as required for the calculation of Z-scores
- datastore # output files with the statistics calculated by make_Z.R
- scripts
  - make_Z.R # calculates the statistics needed for the Z-scores used by the regression models

04_ZScores # calculation of Z-scores and preparation of the table used to fit the regression models
- datastore # Z-normalized activity data and molecular descriptors in the form used for fitting the regression models
- scripts
  - calc_Ztable.py # calculates the learning data from the activity data, molecular descriptors and Z-statistics

05_Regression # regression: data preparation by removing outliers based on a linear regression model, training of random forest regression models, and validation of the training process by cross-validation and hyperparameter tuning
- datastore # storage of all random forest regression models and the average level of the Z output value per level and organism (zexp_*.tsv)
- scripts
  - data_preperation.R # setup of the regression data set, removal of outliers and optional removal of fields and descriptors
  - Rforest_CV.R # analysis of the machine learning by cross-validation, importance of regression variables and tuning of hyperparameters (number of trees, split of variables)
  - Rforest.R # training of the final models, based on the analysis in Rforest_CV.R
- rregrs_output # early analysis of regression model performance with the package RRegrs, as described in: https://doi.org/10.1186/s13321-015-0094-2
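
The Z-normalization step in 03_Averages and 04_ZScores can be illustrated with a minimal sketch. This is not the code of make_Z.R or calc_Ztable.py. It assumes activity records are grouped by measurement type and organism, uses a plain group mean and standard deviation in place of the moving averages mentioned above, and assumes toxicity_direction.csv can be reduced to a +1/-1 sign per measurement. The record fields (`measure`, `organism`, `value`) and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_stats(records):
    """Return {(measure, organism): (mean, std)} of the activity values.

    Assumption: a plain per-group mean and population standard deviation
    stand in for the moving averages computed by the real pipeline.
    """
    grouped = defaultdict(list)
    for rec in records:
        grouped[(rec["measure"], rec["organism"])].append(rec["value"])
    return {key: (mean(vals), pstdev(vals)) for key, vals in grouped.items()}


def z_scores(records, direction=None):
    """Attach a Z-score to every record, oriented so that higher means more toxic.

    direction maps a measure name to +1 or -1 (a hypothetical encoding of
    toxicity_direction.csv); measures not listed default to +1.
    """
    direction = direction or {}
    stats = group_stats(records)
    scored = []
    for rec in records:
        mu, sigma = stats[(rec["measure"], rec["organism"])]
        if sigma == 0:
            continue  # a constant group carries no signal, so it is skipped
        sign = direction.get(rec["measure"], 1)
        scored.append({**rec, "z": sign * (rec["value"] - mu) / sigma})
    return scored


# Toy data: a lower LD50 means higher toxicity, hence the -1 direction.
records = [
    {"measure": "LD50", "organism": "rat", "value": 1.0},
    {"measure": "LD50", "organism": "rat", "value": 2.0},
    {"measure": "LD50", "organism": "rat", "value": 3.0},
]
scored = z_scores(records, direction={"LD50": -1})
```

Normalizing within each group puts measurements from different assay types and organisms on a common scale, which is presumably why the regression models are fitted on Z-scores rather than on raw activity values.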
Provider:
Zenodo
Created:
2019-11-05