COGNANO/AVIDa-SARS-CoV-2
Source: Hugging Face. Updated 2024-10-17; indexed 2024-06-12.
Download link: https://hf-mirror.com/datasets/COGNANO/AVIDa-SARS-CoV-2
Resource description:
---
license: cc-by-nc-4.0
tags:
- biology
- protein
- antibody
- VHH
---
## AVIDa-SARS-CoV-2
AVIDa-SARS-CoV-2 is a dataset of interactions between antigens and the variable domain of heavy chain of heavy-chain antibodies (VHHs), obtained from two alpacas immunized with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike proteins.
AVIDa-SARS-CoV-2 includes binary labels indicating the binding or non-binding of diverse VHH sequences to 12 SARS-CoV-2 mutants, such as the Delta and Omicron variants.
Further details on AVIDa-SARS-CoV-2 are described in our paper "[A SARS-CoV-2 Interaction Dataset and VHH Sequence Corpus for Antibody Language Models](https://arxiv.org/abs/2405.18749)."
## Columns
**AVIDa-SARS-CoV-2.csv / train.csv / test.csv**
| Column | Description |
| --------------- | --------------------------------------------------------------------------------- |
| VHH_sequence | Amino acid sequence of VHH |
| Ag_label | Antigen type |
| label | Binary label represented by 1 for the binding pair and 0 for the non-binding pair |
| subject_species | Species of the subject from which VHH was collected |
| subject_name | Name of the subject from which VHH was collected |
| subject_sex | Sex of the subject from which VHH was collected |
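
As a quick check of the column layout above, the following sketch counts binding and non-binding pairs per antigen using only the Python standard library. The toy rows, including the antigen names `AgA`/`AgB`, the sequences, and the subject values, are made up for illustration and are not taken from the dataset; `count_labels` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Toy rows in the documented column layout. All values here are
# hypothetical placeholders, not real entries from AVIDa-SARS-CoV-2.
TOY_CSV = """VHH_sequence,Ag_label,label,subject_species,subject_name,subject_sex
QVQLVESGGG,AgA,1,alpaca,subject1,F
EVQLVESGGG,AgA,0,alpaca,subject1,F
QVQLQESGGG,AgB,0,alpaca,subject2,M
"""

def count_labels(csv_file):
    """Return {Ag_label: Counter({label: n})} for a file in the documented layout."""
    counts = {}
    for row in csv.DictReader(csv_file):
        # label is stored as text in the CSV; convert it to int (1 or 0).
        counts.setdefault(row["Ag_label"], Counter())[int(row["label"])] += 1
    return counts

counts = count_labels(io.StringIO(TOY_CSV))
for antigen, c in sorted(counts.items()):
    print(antigen, "binding:", c[1], "non-binding:", c[0])
```

With a downloaded copy of the data, the same function should work on an opened file, for example `with open("train.csv", newline="") as f: count_labels(f)`.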
**antigen_sequences.csv**
| Column | Description |
| ----------- | ------------------------------ |
| Ag_label | Antigen type |
| Ag_sequence | Amino acid sequence of antigen |
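
Both tables list an `Ag_label` column, so one plausible way to pair each VHH with its antigen sequence is a lookup on that column. The sketch below assumes `Ag_label` values match exactly between the two files; the toy rows and the helper name `attach_antigen_sequences` are hypothetical and only illustrate the join.

```python
import csv
import io

# Hypothetical toy data in the documented layouts (not real dataset rows).
PAIRS_CSV = """VHH_sequence,Ag_label,label,subject_species,subject_name,subject_sex
QVQLVESGGG,AgA,1,alpaca,subject1,F
QVQLQESGGG,AgB,0,alpaca,subject2,M
"""
ANTIGENS_CSV = """Ag_label,Ag_sequence
AgA,MFVFLVLLPL
AgB,MFVFLVLLPK
"""

def attach_antigen_sequences(pairs_file, antigens_file):
    """Add an Ag_sequence field to each interaction row, keyed by Ag_label."""
    lookup = {
        row["Ag_label"]: row["Ag_sequence"]
        for row in csv.DictReader(antigens_file)
    }
    joined = []
    for row in csv.DictReader(pairs_file):
        # Raises KeyError if an antigen label has no sequence entry,
        # which surfaces mismatches instead of silently dropping rows.
        row["Ag_sequence"] = lookup[row["Ag_label"]]
        joined.append(row)
    return joined

joined = attach_antigen_sequences(io.StringIO(PAIRS_CSV), io.StringIO(ANTIGENS_CSV))
for row in joined:
    print(row["VHH_sequence"], row["Ag_sequence"], row["label"])
```

Keeping the antigen sequences in a separate lookup avoids repeating a long sequence on every interaction row, which is presumably why the dataset ships them in their own file.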
## Links
- Project Page: https://avida-sars-cov-2.cognanous.com
- Code: https://github.com/cognano/AVIDa-SARS-CoV-2
- Paper: https://arxiv.org/abs/2405.18749
## Citation
If you use AVIDa-SARS-CoV-2 in your research, please cite the following paper.
```bibtex
@inproceedings{tsuruta2024sars,
title={A {SARS}-{C}o{V}-2 Interaction Dataset and {VHH} Sequence Corpus for Antibody Language Models},
author={Hirofumi Tsuruta and Hiroyuki Yamazaki and Ryota Maeda and Ryotaro Tamura and Akihiro Imura},
booktitle={Advances in Neural Information Processing Systems 37},
year={2024}
}
```
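Provider:
COGNANO
Summary of original information
Dataset overview
Dataset name
- AVIDa-SARS-CoV-2
Dataset description
- AVIDa-SARS-CoV-2 is a dataset of interactions between antigens and the variable domain of heavy chain of heavy-chain antibodies (VHHs), obtained from two alpacas immunized with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike proteins. It contains binary labels indicating the binding or non-binding of diverse VHH sequences to 12 SARS-CoV-2 mutants, such as the Delta and Omicron variants.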
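Dataset contents
- Main files:
  - AVIDa-SARS-CoV-2.csv / train.csv / test.csv
    - VHH_sequence: amino acid sequence of VHH
    - Ag_label: antigen type
    - label: binary label, 1 for a binding pair and 0 for a non-binding pair
    - subject_species: species of the subject from which VHH was collected
    - subject_name: name of the subject from which VHH was collected
    - subject_sex: sex of the subject from which VHH was collected
  - antigen_sequences.csv
    - Ag_label: antigen type
    - Ag_sequence: amino acid sequence of antigen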
Dataset license
- License: cc-by-nc-4.0
Dataset tags
- Tags: biology, protein, antibody, VHH
Citation information
- Citation (BibTeX): @article{tsuruta2024sars, title={A {SARS}-{C}o{V}-2 Interaction Dataset and {VHH} Sequence Corpus for Antibody Language Models}, author={Hirofumi Tsuruta and Hiroyuki Yamazaki and Ryota Maeda and Ryotaro Tamura and Akihiro Imura}, journal={arXiv preprint arXiv:2405.18749}, year={2024} }



