Machine learning pipeline to train toxicity prediction model of FunTox-Networks

Name: Machine learning pipeline to train toxicity prediction model of FunTox-Networks
Creator: Zenodo
Published: 2020-07-30 08:26:25
License: No description available

Source: Zenodo; updated 2020-07-30; indexed 2026-05-25

Download link:

https://zenodo.org/record/3529161

Resource description:

Machine learning pipeline used to provide toxicity prediction in FunTox-Networks.

01_DATA # preprocessing and filtering of raw activity data from ChEMBL
  - Chembl_v25 # latest activity assay data set from ChEMBL (retrieved Nov 2019)
  - filt_stats.R # filtering and preparation of the raw data
  - Filtered # output data sets from filt_stats.R
  - toxicity_direction.csv # table of toxicity measurements and whether each is proportional or inversely proportional to toxicity
02_MolDesc # calculation of molecular descriptors for all compounds in the filtered ChEMBL data set
  - datastore # files with all compounds and their molecular descriptors, calculated from SMILES
  - scripts
    - calc_molDesc.py # calculates the molecular descriptors of every compound from its SMILES
    - chemopy-1.1 # Python package used for descriptor calculation, as described in: https://doi.org/10.1093/bioinformatics/btt105
03_Averages # calculation of moving averages per level and organism, as required for calculating Z-scores
  - datastore # output files with the statistics calculated by make_Z.R
  - scripts
    - make_Z.R # calculates the statistics needed for the Z-scores used by the regression models
04_ZScores # calculation of Z-scores and preparation of the table used to fit the regression models
  - datastore # Z-normalized activity data and molecular descriptors, in the form used for fitting the regression models
  - scripts
    - calc_Ztable.py # computes the learning data from the activity data, molecular descriptors and Z-statistics
05_Regression # regression: data preparation by removing outliers based on a linear regression model, learning of random forest regression models, and validation of the learning process by cross-validation and hyperparameter tuning
  - datastore # all random forest regression models and the average Z output value per level and organism (zexp_*.tsv)
  - scripts
    - data_preperation.R # sets up the regression data set, removes outliers and optionally removes fields and descriptors
    - Rforest_CV.R # analysis of the learning process by cross-validation, importance of the regression variables and tuning of hyperparameters (number of trees, number of variables considered at each split)
    - Rforest.R # learning of the final models, based on the analysis from Rforest_CV.R
  - rregrs_output # early analysis of regression model performance with the package RRegrs, as described in: https://doi.org/10.1186/s13321-015-0094-2
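The Z-score step (03_Averages and 04_ZScores, together with toxicity_direction.csv) is described only in outline. A minimal Python sketch, assuming that activity values are first oriented so that a larger number always means "more toxic" and then standardised within each (level, organism) group, could look like this. The record keys, measurement names and direction table are hypothetical, and plain per-group means and standard deviations stand in for the moving averages computed by make_Z.R, whose exact definition is not given here.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical direction table in the spirit of toxicity_direction.csv:
# +1 if a larger value means more toxic, -1 if a larger value means less toxic.
TOXICITY_DIRECTION = {"Inhibition": +1, "LD50": -1}


def z_scores(records):
    """Return one Z-score per record, standardised within its (level, organism) group.

    Each record is a dict with the hypothetical keys
    'level', 'organism', 'measurement' and 'value'.
    Records whose group has fewer than two values or zero spread get None.
    """
    # Orient every value so that a larger number always means "more toxic".
    oriented = [TOXICITY_DIRECTION[r["measurement"]] * r["value"] for r in records]

    # Collect the oriented values of each (level, organism) group.
    groups = defaultdict(list)
    for r, v in zip(records, oriented):
        groups[(r["level"], r["organism"])].append(v)

    # Per-group mean and sample standard deviation (needs at least two values).
    stats = {key: (mean(vals), stdev(vals))
             for key, vals in groups.items() if len(vals) > 1}

    result = []
    for r, v in zip(records, oriented):
        mu_sd = stats.get((r["level"], r["organism"]))
        if mu_sd is None or mu_sd[1] == 0:
            result.append(None)  # cannot standardise this record
        else:
            mu, sd = mu_sd
            result.append((v - mu) / sd)
    return result
```

For two LD50 values of 10 and 20 in the same group, the lower (more toxic) dose receives the positive Z-score of about 0.707, which is the point of orienting values before standardising.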
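Rforest_CV.R tunes the random forest hyperparameters by cross-validation. A random forest written from scratch would not fit a short example, so the sketch below swaps in a k-nearest-neighbour regressor on a single feature purely to show the shape of k-fold cross-validated tuning: every candidate value is scored on the same reproducible folds, and the one with the lowest cross-validated RMSE is kept. All function names, the RMSE criterion and the fold count are assumptions; if the R scripts use the randomForest package, the tuned values would correspond to its ntree and mtry arguments rather than a neighbour count.

```python
import random
from statistics import mean


def knn_predict(train_x, train_y, x, n_neighbors):
    """Stand-in model: average the targets of the n_neighbors training points nearest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:n_neighbors]
    return mean(train_y[i] for i in nearest)


def kfold_indices(n, folds, seed=0):
    """Shuffle 0..n-1 reproducibly and deal the indices into `folds` disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::folds] for f in range(folds)]


def cv_rmse(x, y, n_neighbors, folds=5, seed=0):
    """Root-mean-square error of the stand-in model, estimated by k-fold cross-validation."""
    squared_errors = []
    for test_idx in kfold_indices(len(x), folds, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(x)) if i not in held_out]
        train_x = [x[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test_idx:
            pred = knn_predict(train_x, train_y, x[i], n_neighbors)
            squared_errors.append((pred - y[i]) ** 2)
    return mean(squared_errors) ** 0.5


def tune(x, y, grid, folds=5):
    """Score every candidate hyperparameter on the same folds and return the best one."""
    scores = {n: cv_rmse(x, y, n, folds) for n in grid}
    best = min(scores, key=scores.get)
    return best, scores
```

Reusing one fixed set of folds for every candidate keeps the comparison fair, because differences in score then come from the hyperparameter rather than from a different data split.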

Provider:

Zenodo

Created:

2019-11-05
