maom/Zhang2023_DEL_sEH
Source: Hugging Face. Updated 2024-06-07; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/maom/Zhang2023_DEL_sEH
Resource description:
---
language: en
license: mit
source_datasets: experimental
task_categories:
- tabular-regression
tags:
- chemistry
- biology
- medical
pretty_name: DNA Encoded Library Screen against Soluble Epoxide Hydrolase (sEH)
dataset_summary: >-
  A screen of three-cycle OpenDEL libraries against soluble epoxide hydrolase
  (sEH).
citation: |-
  @article{Zhang2023,
    title = {Building Block-Based Binding Predictions for DNA-Encoded Libraries},
    volume = {63},
    ISSN = {1549-960X},
    url = {http://dx.doi.org/10.1021/acs.jcim.3c00588},
    DOI = {10.1021/acs.jcim.3c00588},
    number = {16},
    journal = {Journal of Chemical Information and Modeling},
    publisher = {American Chemical Society (ACS)},
    author = {Zhang, Chris and Pitman, Mary and Dixit, Anjali and Leelananda, Sumudu and Palacci, Henri and Lawler, Meghan and Belyanskaya, Svetlana and Grady, LaShadric and Franklin, Joe and Tilmans, Nicolas and Mobley, David L.},
    year = {2023},
    month = aug,
    pages = {5120–5132}
  }
size_categories:
- 1M<n<10M
config_names:
- sEH_regression
configs:
- config_name: sEH_regression
  data_files:
  - split: test
    path: sEH_regression/test.csv
  - split: train
    path: sEH_regression/train.csv
dataset_info:
- config_name: sEH_regression
  features:
  - name: structure
    dtype: string
  - name: read_count
    dtype: int32
  - name: bb1
    dtype: string
  - name: bb2
    dtype: string
  - name: bb3
    dtype: string
  - name: bb1_iso
    dtype: string
  - name: bb2_iso
    dtype: string
  - name: bb3_iso
    dtype: string
  splits:
  - name: train
    num_bytes:
    num_examples:
  - name: test
    num_bytes:
    num_examples:
---
# AttentiveSkin
To Predict Skin Corrosion/Irritation Potentials of Chemicals via Explainable Machine Learning Methods
Download: https://github.com/BeeBeeWong/AttentiveSkin/releases/tag/v1.0
## Quickstart Usage
### Load a dataset in Python
Each subset can be loaded into Python using the Hugging Face [datasets](https://huggingface.co/docs/datasets/index) library.
First, from the command line, install the `datasets` library:

    $ pip install datasets

Then, from within Python, import the `datasets` library:

    >>> import datasets

and load one of the `AttentiveSkin` datasets, e.g.,

    >>> Corr_Neg = datasets.load_dataset("maomlab/AttentiveSkin", name = 'Corr_Neg')
    Downloading readme: 100%|██████████| 64.0k/64.0k [00:00<00:00, 11.7kkB/s]
    Downloading data: 100%|██████████| 1.02M/1.02M [00:00<00:00, 4.88MkB/s]
    Generating test split: 100%|██████████| 181/181 [00:00<00:00, 3189.72examples/s]
    Generating train split: 100%|██████████| 1755/1755 [00:00<00:00, 19806.87examples/s]

and inspect the loaded dataset:

    >>> Corr_Neg
    DatasetDict({
        test: Dataset({
            features: ['Name', 'Synonym', 'CAS RN', 'GHS', 'Detailed Page', 'Evidence', 'OECD TG 404', 'Data Source', 'Frequency', 'SMILES', 'SMILES URL', 'SMILES Source', 'Canonical SMILES', 'Split'],
            num_rows: 181
        })
        train: Dataset({
            features: ['Name', 'Synonym', 'CAS RN', 'GHS', 'Detailed Page', 'Evidence', 'OECD TG 404', 'Data Source', 'Frequency', 'SMILES', 'SMILES URL', 'SMILES Source', 'Canonical SMILES', 'Split'],
            num_rows: 1755
        })
    })
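
Once a split is loaded, a quick check of the label balance is often useful before training. The sketch below counts the values in one column of any iterable of row dictionaries. The helper name `label_counts` and the toy rows (including their label strings) are made up for illustration and are not taken from the real data.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def label_counts(rows: Iterable[Dict[str, Any]], column: str = "GHS") -> Counter:
    """Count how often each value of `column` occurs across the rows."""
    return Counter(row[column] for row in rows)


# Hypothetical stand-in rows: the label strings are placeholders,
# not values from the actual dataset.
toy_rows = [
    {"SMILES": "CCO", "GHS": "label_a"},
    {"SMILES": "CC(=O)O", "GHS": "label_b"},
    {"SMILES": "c1ccccc1", "GHS": "label_a"},
]

counts = label_counts(toy_rows)
print(counts.most_common())
```

Iterating over a `datasets.Dataset` yields one dictionary per row, so the same helper should also accept a loaded split such as `Corr_Neg["train"]`.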
### Use a dataset to train a model
One way to use the dataset is through the [MolFlux](https://exscientia.github.io/molflux/) package developed by Exscientia.
First, from the command line, install the `MolFlux` library with `catboost` and `rdkit` support:

    pip install 'molflux[catboost,rdkit]'

Then load, featurize, fit, and evaluate the CatBoost model. The dataset already ships with train and test splits, so no further splitting is needed:

    from datasets import load_dataset
    from molflux.datasets import featurise_dataset
    from molflux.features import load_from_dicts as load_representations_from_dicts
    from molflux.modelzoo import load_from_dict as load_model_from_dict
    from molflux.metrics import load_suite

    # Load the pre-split dataset (train/test) from the Hugging Face Hub
    split_dataset = load_dataset('maomlab/AttentiveSkin', name = 'Corr_Neg')

    # Compute Morgan and MACCS fingerprints from the SMILES column
    split_featurised_dataset = featurise_dataset(
        split_dataset,
        column = "SMILES",
        representations = load_representations_from_dicts([{"name": "morgan"}, {"name": "maccs_rdkit"}]))

    # Configure a CatBoost classifier on the fingerprints, with GHS as the target
    model = load_model_from_dict({
        "name": "cat_boost_classifier",
        "config": {
            "x_features": ['SMILES::morgan', 'SMILES::maccs_rdkit'],
            "y_features": ['GHS']}})

    # Fit on the training split and predict on the test split
    model.train(split_featurised_dataset["train"])
    preds = model.predict(split_featurised_dataset["test"])

    # Score the predictions with the standard classification metric suite
    classification_suite = load_suite("classification")
    scores = classification_suite.compute(
        references=split_featurised_dataset["test"]['GHS'],
        predictions=preds["cat_boost_classifier::GHS"])
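
To make the evaluation step more concrete, here is a hand-rolled sketch of two common classification metrics computed from reference and predicted labels. It is not a claim about which metrics the MolFlux classification suite returns. The function names and the toy labels are assumptions made for illustration.

```python
from typing import Hashable, Sequence


def accuracy(references: Sequence[Hashable], predictions: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the reference label."""
    if len(references) != len(predictions) or not references:
        raise ValueError("references and predictions must be non-empty and equally long")
    correct = sum(r == p for r, p in zip(references, predictions))
    return correct / len(references)


def balanced_accuracy(references: Sequence[Hashable], predictions: Sequence[Hashable]) -> float:
    """Mean of the per-class recall, which is less sensitive to class imbalance."""
    if len(references) != len(predictions) or not references:
        raise ValueError("references and predictions must be non-empty and equally long")
    recalls = []
    for label in set(references):
        indices = [i for i, r in enumerate(references) if r == label]
        hits = sum(predictions[i] == label for i in indices)
        recalls.append(hits / len(indices))
    return sum(recalls) / len(recalls)


# Toy labels, made up for illustration (1 = positive class, 0 = negative class)
refs = [1, 1, 1, 1, 0, 0]
preds = [1, 1, 1, 0, 0, 1]

acc = accuracy(refs, preds)
bal_acc = balanced_accuracy(refs, preds)
print(f"accuracy={acc:.3f} balanced_accuracy={bal_acc:.3f}")
```

Balanced accuracy is worth checking alongside plain accuracy when one class is much more frequent than the other.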
### Data splits
The train and test splits were generated with the Realistic Split method described in [(Martin et al., 2018)](https://doi.org/10.1021/acs.jcim.7b00166).
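
As a rough illustration of the idea, the sketch below assumes that a realistic split first clusters compounds by structure, then trains on the largest clusters and tests on the smallest clusters and singletons, so that test compounds look novel relative to the training set. This is one reading of the method, not the procedure used to build the splits here. The function name `realistic_split`, the precomputed cluster ids, and the 75% training fraction are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def realistic_split(
    cluster_ids: Sequence[Hashable], train_fraction: float = 0.75
) -> Tuple[List[int], List[int]]:
    """Assign whole clusters to train, largest first, until the target size
    is reached; all remaining (smaller) clusters go to test."""
    members: Dict[Hashable, List[int]] = defaultdict(list)
    for index, cluster in enumerate(cluster_ids):
        members[cluster].append(index)

    # Largest clusters first; equally sized clusters keep first-seen order
    ordered = sorted(members.values(), key=len, reverse=True)

    target = train_fraction * len(cluster_ids)
    train: List[int] = []
    test: List[int] = []
    for group in ordered:
        if len(train) < target:
            train.extend(group)
        else:
            test.extend(group)
    return sorted(train), sorted(test)


# Hypothetical cluster assignment for eight compounds
toy_clusters = ["a", "a", "a", "a", "b", "b", "c", "d"]
train_idx, test_idx = realistic_split(toy_clusters)
print(train_idx, test_idx)
```

Because clusters are never divided between the two sets, close analogues of a test compound do not leak into training.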
## Tutorial
### Basic:
AttentiveSkin is software for predicting the Skin Corrosion/Irritation labels of chemicals as defined by the GHS (Globally Harmonized System of Classification and Labeling of Chemicals).
Download and unzip "AttentiveSkin_v1.0.zip" from the URL above.
Place the file "AttentiveSkin.exe" and the directory "dependency" in the same directory.
Launch "AttentiveSkin.exe" and wait until the GUI has initialized.
### Input:
List the input SMILES in the first column of a .txt or .tsv file.
Follow the format of the example in "./example/input.txt".
Click the "Input" button to open the text file containing the input SMILES.
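
An input file of this shape can also be generated with a script. The sketch below writes one SMILES string per line into the first column of a tab-separated file. It assumes no header row is expected, which should be checked against "./example/input.txt". The file name and the helper `write_smiles_input` are hypothetical.

```python
import csv
import tempfile
from pathlib import Path
from typing import Iterable


def write_smiles_input(smiles: Iterable[str], path: Path) -> None:
    """Write one SMILES string per row, in the first (and only) column."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for entry in smiles:
            writer.writerow([entry.strip()])


# Hypothetical output location and example molecules
input_path = Path(tempfile.gettempdir()) / "attentiveskin_input.tsv"
write_smiles_input(["CCO", "CC(=O)O", "c1ccccc1"], input_path)
print(input_path.read_text(encoding="utf-8"))
```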
### Output:
The interpretable predictions, including attention weights, are written to .html files, while basic information is written to .xlsx files.
Results of the two binary tasks (Corr vs Neg, Irrit vs Neg) are generated separately.
Click the "Output" button to select the directory in which to store the prediction results.
## Citation
Cite this:
Chem. Res. Toxicol. 2024, 37, 2, 361–373
Publication Date: January 31, 2024
https://doi.org/10.1021/acs.chemrestox.3c00332
Copyright © 2024 American Chemical Society
## Contact
Developer: Zejun Huang, incorrectwong11@gmail.com
Corresponding author (Prof.): Yun Tang, ytang234@ecust.edu.cn
Provider:
maom
Summary of original information
Dataset overview
Dataset name
- DNA Encoded Library Screen against Soluble Epoxide Hydrolase (sEH)
Dataset description
- A screen of three-cycle OpenDEL libraries against soluble epoxide hydrolase (sEH).
Application areas
- Chemistry
- Biology
- Medicine
Task category
- Tabular regression
Dataset configuration
- Config name: sEH_regression
- Data files:
  - Training set: sEH_regression/train.csv
  - Test set: sEH_regression/test.csv
Dataset features
- structure: string
- read_count: integer
- bb1, bb2, bb3: string
- bb1_iso, bb2_iso, bb3_iso: string
Citation
- Reference: Zhang, Chris et al. "Building Block-Based Binding Predictions for DNA-Encoded Libraries." Journal of Chemical Information and Modeling 63.16 (2023): 5120–5132. DOI: 10.1021/acs.jcim.3c00588.



