aaronfeller/PeptideMTR_pretraining_data

Hugging Face, updated 2025-12-23, indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/aaronfeller/PeptideMTR_pretraining_data
Resource description:
---
license: mit
tags:
- chemistry
- biology
pretty_name: PeptideMTR Pretraining Data
size_categories:
- 100M<n<1B
---

# PeptideMTR Training Data

This repository contains the dataset for the **PeptideMTR** paper. It is designed for SMILES encoder models trained with masked-language modeling (MLM) and/or multi-target regression (MTR) tasks, with a focus on mapping peptide sequences to biochemical properties.

Link to the manuscript will be added here when available.

## Dataset Summary

The dataset includes peptide sequences paired with **99 RDKit-derived descriptors** representing various physicochemical properties (e.g., molecular weight, LogP, surface area, and charge descriptors).

## Data Structure

* `SMILES`: The SMILES representation of the molecule.
* `descriptors`: 99 continuous numerical features generated via RDKit.

## Usage

To use this dataset with the Hugging Face `datasets` library:

```python
from datasets import load_dataset

# Repository ID as given in this listing's title and download link.
ds = load_dataset("aaronfeller/PeptideMTR_pretraining_data")
```
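As a rough illustration of how the two documented fields might feed a multi-target regression objective, the sketch below standardizes each descriptor column to zero mean and unit variance. This is a common preprocessing step, not something the source confirms the authors use. The toy records, their descriptor values, the assumption that `descriptors` is stored as a flat list of 99 floats, and the helper names `fit_standardizer` and `standardize` are all hypothetical.

```python
from statistics import mean, pstdev

N_DESCRIPTORS = 99  # number of RDKit descriptors stated in the dataset card

# Toy records mimicking the documented fields. The descriptor values are
# made up for illustration and are NOT real RDKit output.
records = [
    {"SMILES": "NCC(=O)O", "descriptors": [float(i) for i in range(N_DESCRIPTORS)]},
    {"SMILES": "C[C@H](N)C(=O)O", "descriptors": [2.0 * i for i in range(N_DESCRIPTORS)]},
    {"SMILES": "NCC(=O)NCC(=O)O", "descriptors": [3.0 * i for i in range(N_DESCRIPTORS)]},
]


def fit_standardizer(rows):
    """Compute per-descriptor mean and population standard deviation."""
    columns = list(zip(*rows))  # transpose: one tuple per descriptor
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(row, means, stds):
    """Scale one descriptor vector using the fitted statistics."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


rows = [r["descriptors"] for r in records]
means, stds = fit_standardizer(rows)
targets = [standardize(row, means, stds) for row in rows]

for record, target in zip(records, targets):
    print(record["SMILES"], [round(t, 3) for t in target[:3]])
```

For a dataset in the stated 100M to 1B size range, the statistics would more realistically be accumulated in a single streaming pass rather than by holding every row in memory as this toy version does.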
Provider:
aaronfeller