
maom/Zhang2023_DEL_sEH

Hugging Face | Updated 2024-06-07 | Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/maom/Zhang2023_DEL_sEH
Resource description:
---
language: en
license: mit
source_datasets: experimental
task_categories:
  - tabular-regression
tags:
  - chemistry
  - biology
  - medical
pretty_name: DNA Encoded Library Screen against Soluble Epoxide Hydrolase (sEH)
dataset_summary: >-
  A screen of three-cycle OpenDEL libraries against soluble epoxide hydrolase
  (sEH).
citation: |-
  @article{Zhang2023,
    title = {Building Block-Based Binding Predictions for DNA-Encoded Libraries},
    volume = {63},
    ISSN = {1549-960X},
    url = {http://dx.doi.org/10.1021/acs.jcim.3c00588},
    DOI = {10.1021/acs.jcim.3c00588},
    number = {16},
    journal = {Journal of Chemical Information and Modeling},
    publisher = {American Chemical Society (ACS)},
    author = {Zhang, Chris and Pitman, Mary and Dixit, Anjali and Leelananda, Sumudu and Palacci, Henri and Lawler, Meghan and Belyanskaya, Svetlana and Grady, LaShadric and Franklin, Joe and Tilmans, Nicolas and Mobley, David L.},
    year = {2023},
    month = aug,
    pages = {5120–5132}
  }
size_categories:
  - 1M<n<10M
config_names:
  - sEH_regression
configs:
  - config_name: sEH_regression
    data_files:
      - split: test
        path: sEH_regression/test.csv
      - split: train
        path: sEH_regression/train.csv
dataset_info:
  - config_name: sEH_regression
    features:
      - name: structure
        dtype: string
      - name: read_count
        dtype: int32
      - name: bb1
        dtype: string
      - name: bb2
        dtype: string
      - name: bb3
        dtype: string
      - name: bb1_iso
        dtype: string
      - name: bb2_iso
        dtype: string
      - name: bb3_iso
        dtype: string
    splits:
      - name: train
        num_bytes:
        num_examples:
      - name: test
        num_bytes:
        num_examples:
---

# AttentiveSkin To Predict Skin Corrosion/Irritation Potentials of Chemicals via Explainable Machine Learning Methods

Download: https://github.com/BeeBeeWong/AttentiveSkin/releases/tag/v1.0

## Quickstart Usage

### Load a dataset in Python

Each subset can be loaded into Python using the Hugging Face [datasets](https://huggingface.co/docs/datasets/index) library.
First, from the command line, install the `datasets` library

    $ pip install datasets

then, from within Python, load the `datasets` library

    >>> import datasets

and load one of the `AttentiveSkin` datasets, e.g.,

    >>> Corr_Neg = datasets.load_dataset("maomlab/AttentiveSkin", name = 'Corr_Neg')
    Downloading readme: 100%|██████████| 64.0k/64.0k [00:00<00:00, 11.7kkB/s]
    Downloading data: 100%|██████████| 1.02M/1.02M [00:00<00:00, 4.88MkB/s]
    Generating test split: 100%|██████████| 181/181 [00:00<00:00, 3189.72examples/s]
    Generating train split: 100%|██████████| 1755/1755 [00:00<00:00, 19806.87examples/s]

and inspect the loaded dataset

    >>> Corr_Neg
    DatasetDict({
        test: Dataset({
            features: ['Name', 'Synonym', 'CAS RN', 'GHS', 'Detailed Page', 'Evidence', 'OECD TG 404', 'Data Source', 'Frequency', 'SMILES', 'SMILES URL', 'SMILES Source', 'Canonical SMILES', 'Split'],
            num_rows: 181
        })
        train: Dataset({
            features: ['Name', 'Synonym', 'CAS RN', 'GHS', 'Detailed Page', 'Evidence', 'OECD TG 404', 'Data Source', 'Frequency', 'SMILES', 'SMILES URL', 'SMILES Source', 'Canonical SMILES', 'Split'],
            num_rows: 1755
        })
    })

### Use a dataset to train a model

One way to use the dataset is through the [MolFlux](https://exscientia.github.io/molflux/) package developed by Exscientia.
First, from the command line, install the `MolFlux` library with `catboost` and `rdkit` support

    pip install 'molflux[catboost,rdkit]'

then load, featurize, split, fit, and evaluate the catboost model

    import json
    from datasets import load_dataset
    from molflux.datasets import featurise_dataset
    from molflux.features import load_from_dicts as load_representations_from_dicts
    from molflux.splits import load_from_dict as load_split_from_dict
    from molflux.modelzoo import load_from_dict as load_model_from_dict
    from molflux.metrics import load_suite

    # Load the Corr_Neg subset, which already contains train and test splits
    split_dataset = load_dataset('maomlab/AttentiveSkin', name = 'Corr_Neg')

    # Featurise the SMILES column with the "morgan" and "maccs_rdkit" representations
    split_featurised_dataset = featurise_dataset(
        split_dataset,
        column = "SMILES",
        representations = load_representations_from_dicts([{"name": "morgan"}, {"name": "maccs_rdkit"}]))

    # Configure a CatBoost classifier that predicts the GHS label from both representations
    model = load_model_from_dict({
        "name": "cat_boost_classifier",
        "config": {
            "x_features": ['SMILES::morgan', 'SMILES::maccs_rdkit'],
            "y_features": ['GHS']}})

    # Fit on the train split and predict on the test split
    model.train(split_featurised_dataset["train"])
    preds = model.predict(split_featurised_dataset["test"])

    # Score the predictions with the standard classification metric suite
    classification_suite = load_suite("classification")

    scores = classification_suite.compute(
        references=split_featurised_dataset["test"]['GHS'],
        predictions=preds["cat_boost_classifier::GHS"])

### Data splits

Here we have used the Realistic Split method described in [(Martin et al., 2018)](https://doi.org/10.1021/acs.jcim.7b00166).

## AttentiveSkin To Predict Skin Corrosion/Irritation Potentials of Chemicals via Explainable Machine Learning Methods

Download: https://github.com/BeeBeeWong/AttentiveSkin/releases/tag/v1.0

## Tutorial

### Basic:

AttentiveSkin is software for predicting the Skin Corrosion/Irritation labels of chemicals as defined by the GHS (the Globally Harmonized System of Classification and Labeling of Chemicals).
Download and unzip "AttentiveSkin_v1.0.zip" from the URL above.
Place the file "AttentiveSkin.exe" and the directory "dependency" in the same directory.
Launch "AttentiveSkin.exe" and wait until the GUI has initialized.

### Input:

The input SMILES should be listed in the first column of a .txt or .tsv file.
Follow the example in "./example/input.txt".
Click the "Input" button to open the text file containing the input SMILES.

### Output:

The interpretable predictions, including attention weights, are written to .html files, while basic information is written to .xlsx files.
Results of the two binary tasks (Corr vs Neg, Irrit vs Neg) are generated separately.
Click the "Output" button to select the directory in which to store the prediction results.

## Citation

Cite this: Chem. Res. Toxicol. 2024, 37, 2, 361–373

Publication Date: January 31, 2024

https://doi.org/10.1021/acs.chemrestox.3c00332

Copyright © 2024 American Chemical Society

## Contact

Developer: Zejun Huang, incorrectwong11@gmail.com

Corresponding author (Prof.): Yun Tang, ytang234@ecust.edu.cn
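
The input format described in the Input section (one SMILES per row, in the first column of a .txt file) can be prepared with a few lines of Python. This is only a sketch: the helper name `write_smiles_input` is hypothetical, and it assumes the file needs no header row, which the tutorial does not state. Check "./example/input.txt" from the release for the authoritative layout.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def write_smiles_input(smiles: Iterable[str], path: Path) -> Path:
    """Write one SMILES string per line (first column only), skipping blanks.

    Assumption: no header row is expected by the tool.
    """
    with path.open("w", encoding="utf-8") as handle:
        for entry in smiles:
            entry = entry.strip()
            if entry:
                handle.write(entry + "\n")
    return path


# Demo in a temporary directory: ethanol and acetic acid as example inputs.
with tempfile.TemporaryDirectory() as tmp:
    written = write_smiles_input(["CCO", " CC(=O)O ", ""], Path(tmp) / "input.txt")
    lines = written.read_text(encoding="utf-8").splitlines()

print(lines)
```

The resulting file can then be selected with the "Input" button in the GUI.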
Provider:
maom
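Summary of original information

Dataset overview

Dataset name

  • DNA Encoded Library Screen against Soluble Epoxide Hydrolase (sEH)

Dataset description

  • This dataset contains the results of screening three-cycle OpenDEL libraries against soluble epoxide hydrolase (sEH).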

Application domains

  • Chemistry
  • Biology
  • Medicine

Task category

  • Tabular regression

Dataset configuration

  • Configuration name: sEH_regression
  • Data files:
    • Train split: sEH_regression/train.csv
    • Test split: sEH_regression/test.csv
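
The configuration above can be loaded with the Hugging Face `datasets` library. The sketch below is a hedged illustration, not taken from the dataset card: the repository id is copied from the download link on this page, the helper name `load_seh` is hypothetical, and streaming is only a suggested default because the card lists the size as 1M<n<10M rows.

```python
from typing import Dict

REPO_ID = "maom/Zhang2023_DEL_sEH"  # assumption: taken from the download link above
CONFIG_NAME = "sEH_regression"

# Split name -> CSV path, as listed in the dataset configuration.
DATA_FILES: Dict[str, str] = {
    "train": f"{CONFIG_NAME}/train.csv",
    "test": f"{CONFIG_NAME}/test.csv",
}


def load_seh(split: str = "train", streaming: bool = True):
    """Load one split of the sEH_regression configuration.

    With streaming=True, rows are read lazily instead of downloading
    the whole CSV up front.
    """
    if split not in DATA_FILES:
        raise ValueError(f"unknown split {split!r}; expected one of {sorted(DATA_FILES)}")
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name=CONFIG_NAME, split=split, streaming=streaming)
```

With network access and `pip install datasets`, `next(iter(load_seh("train")))` should yield a single row as a dictionary keyed by the feature names.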

Dataset features

  • structure: string
  • read_count: integer (int32)
  • bb1, bb2, bb3: string
  • bb1_iso, bb2_iso, bb3_iso: string
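
To illustrate how these columns fit together, the sketch below averages `read_count` per building block using only the standard library. The rows are made up and mimic only the column layout; real values are chemical structures and sequencing read counts. The helper `mean_read_count_by` is hypothetical and is a simple illustration, not the prediction method of Zhang et al.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Mapping

# Made-up placeholder rows with the same columns as the dataset.
SAMPLE_CSV = """structure,read_count,bb1,bb2,bb3,bb1_iso,bb2_iso,bb3_iso
S1,12,A1,B1,C1,A1,B1,C1
S2,0,A1,B2,C1,A1,B2,C1
S3,3,A2,B1,C2,A2,B1,C2
S4,9,A1,B1,C2,A1,B1,C2
"""


def mean_read_count_by(rows: Iterable[Mapping[str, str]], column: str) -> Dict[str, float]:
    """Return the mean read_count for each distinct value in `column`."""
    totals: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for row in rows:
        key = row[column]
        totals[key] += int(row["read_count"])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
bb1_means = mean_read_count_by(rows, "bb1")
print(bb1_means)  # {'A1': 7.0, 'A2': 3.0}
```

The same helper works for `bb2` or `bb3` by changing the `column` argument.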

Citation

  • Reference: Zhang, Chris et al. "Building Block-Based Binding Predictions for DNA-Encoded Libraries." Journal of Chemical Information and Modeling 63.16 (2023): 5120–5132. DOI: 10.1021/acs.jcim.3c00588.