AI4Protein/EC_ESMFold
Source: Hugging Face. Updated 2024-08-13; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/AI4Protein/EC_ESMFold
Resource description:
---
license: apache-2.0
task_categories:
- text-classification
tags:
- protein
- downstream task
---
# EC Dataset with ESMFold Structural Sequence
- Description: The Enzyme Commission number (EC number) is a numerical classification scheme for enzymes, based on the chemical reactions they catalyze.
- Number of labels: 585
- Problem Type: multi_label_classification
- Columns:
  - aa_seq: protein amino acid sequence
  - foldseek_seq: Foldseek 3Di structural sequence (20-state alphabet)
  - ss8_seq: DSSP 8-state secondary structure sequence
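
As a rough illustration of what the multi-label setup implies, the sketch below turns a record's label ids into a 585-dimensional multi-hot target and checks that the three sequence columns are aligned residue by residue. Several things here are assumptions rather than facts from this card: the `label` field name, labels being stored as a list of integer ids, the equal-length property of the three sequences, and the example strings themselves, which are made up.

```python
NUM_LABELS = 585  # label count stated in the dataset card

SEQUENCE_COLUMNS = ("aa_seq", "foldseek_seq", "ss8_seq")


def multi_hot(label_ids, num_labels=NUM_LABELS):
    """Convert a list of integer label ids into a 0/1 vector of length num_labels."""
    vec = [0] * num_labels
    for i in label_ids:
        if not 0 <= i < num_labels:
            raise ValueError(f"label id {i} is outside [0, {num_labels})")
        vec[i] = 1
    return vec


def is_aligned(record):
    """Return True if all three per-residue sequences have the same length."""
    lengths = {len(record[col]) for col in SEQUENCE_COLUMNS}
    return len(lengths) == 1


# Hypothetical record: the sequences and the "label" field are illustrative only.
example = {
    "aa_seq": "MKTAYIAK",
    "foldseek_seq": "DVVLVVCC",
    "ss8_seq": "CHHHHHTC",
    "label": [3, 41],
}

target = multi_hot(example["label"])
aligned = is_aligned(example)
```

A multi-hot target of this shape is what a sigmoid-plus-binary-cross-entropy head typically expects, which is the usual choice when one enzyme can carry several EC numbers.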
# GitHub
Simple, Efficient, and Scalable Structure-Aware Adapter Boosts Protein Language Models
https://github.com/tyang816/SES-Adapter
# Citation
Please cite our work if you use our dataset.
```
@article{tan2024ses-adapter,
title={Simple, Efficient, and Scalable Structure-Aware Adapter Boosts Protein Language Models},
author={Tan, Yang and Li, Mingchen and Zhou, Bingxin and Zhong, Bozitao and Zheng, Lirong and Tan, Pan and Zhou, Ziyi and Yu, Huiqun and Fan, Guisheng and Hong, Liang},
journal={Journal of Chemical Information and Modeling},
year={2024},
publisher={ACS Publications}
}
```
Provided by:
AI4Protein



