aaronfeller/PeptideMTR_pretraining_data
Source: Hugging Face. Updated 2025-12-23. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/aaronfeller/PeptideMTR_pretraining_data
Resource description:
---
license: mit
tags:
- chemistry
- biology
pretty_name: PeptideMTR Pretraining Data
size_categories:
- 100M<n<1B
---
# PeptideMTR Training Data
This repository contains the dataset for the **PeptideMTR** paper. It is intended for SMILES encoder models trained with masked-language modeling (MLM), multi-target regression (MTR), or both, with a focus on mapping peptide sequences to biochemical properties.
A link to the manuscript will be added here when available.
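As a rough illustration of the MLM objective mentioned above, the sketch below tokenizes a SMILES string and masks a random subset of tokens. The regex tokenizer, the `[MASK]` symbol, and the masking probability are all assumptions made for illustration; they are not taken from the PeptideMTR paper, and real MLM setups usually add further replacement rules.

```python
import random
import re

# Simplified SMILES tokenizer (an assumption, not the paper's tokenizer):
# bracket atoms, two-letter halogens, two-digit ring closures, then any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")
MASK = "[MASK]"  # hypothetical mask symbol


def tokenize(smiles):
    """Split a SMILES string into tokens."""
    return SMILES_TOKEN.findall(smiles)


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Replace each token with MASK with probability mask_prob.

    Returns (masked_tokens, labels); labels hold the original token at
    masked positions and None elsewhere.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


# Glycine as a toy input
tokens = tokenize("NCC(=O)O")
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=0)
print(masked)
print(labels)
```

A model trained this way would be asked to predict the entries of `labels` that are not `None` from the `masked` sequence.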
## Dataset Summary
The dataset pairs each peptide, stored as a SMILES string, with **99 RDKit-derived descriptors** covering physicochemical properties such as molecular weight, LogP, surface area, and charge descriptors.
## Data Structure
* `SMILES`: The SMILES representation of the molecule.
* `descriptors`: 99 continuous numerical features generated via RDKit.
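The sketch below shows one way a record with these two fields might be checked and its descriptor targets standardized before regression. It assumes `descriptors` is stored as a single list of 99 numbers per record, which this card does not confirm. The z-scoring step is common practice for multi-target regression, not a documented part of the PeptideMTR pipeline. The helper names and toy values are hypothetical.

```python
from statistics import mean, pstdev

N_DESCRIPTORS = 99  # descriptor count stated in this card


def check_record(record):
    """Return True if the record has a non-empty SMILES string and 99 numeric descriptors."""
    smiles = record.get("SMILES")
    desc = record.get("descriptors")
    return (
        isinstance(smiles, str) and len(smiles) > 0
        and isinstance(desc, (list, tuple)) and len(desc) == N_DESCRIPTORS
        and all(isinstance(x, (int, float)) for x in desc)
    )


def standardize_columns(rows):
    """Z-score each column across rows; constant columns map to 0.0."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) for c in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Toy record with placeholder descriptor values (not real RDKit output)
example = {"SMILES": "NCC(=O)O", "descriptors": [0.0] * N_DESCRIPTORS}
print(check_record(example))

# Two rows, two columns, to keep the arithmetic easy to follow
print(standardize_columns([[1.0, 5.0], [3.0, 5.0]]))
```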
## Usage
To use this dataset with the Hugging Face `datasets` library:
```python
from datasets import load_dataset

# Repository ID taken from the dataset page above
ds = load_dataset("aaronfeller/PeptideMTR_pretraining_data")
```
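Because the card lists the size category as 100M<n<1B, downloading everything up front may be impractical. A minimal sketch of previewing a few records in streaming mode follows. The split name `"train"` is an assumption, and `take` and `preview` are hypothetical helpers, not part of the `datasets` library.

```python
from itertools import islice


def take(iterable, n):
    """Return the first n items of any iterable as a list."""
    return list(islice(iterable, n))


def preview(repo_id="aaronfeller/PeptideMTR_pretraining_data", split="train", n=3):
    """Stream the first n records without downloading the full dataset.

    The split name is an assumption; adjust it to match the repository.
    """
    # Imported lazily so the helper above works without the library installed
    from datasets import load_dataset

    stream = load_dataset(repo_id, split=split, streaming=True)
    return take(stream, n)


# Usage (requires network access and `pip install datasets`):
# for record in preview():
#     print(record["SMILES"], len(record["descriptors"]))
```

With `streaming=True`, records are fetched lazily as the iterator advances, so only the previewed rows are transferred.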
Provider:
aaronfeller



