
COGNANO/AVIDa-SARS-CoV-2

Hugging Face · Updated 2024-10-17 · Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/COGNANO/AVIDa-SARS-CoV-2
Resource description:
---
license: cc-by-nc-4.0
tags:
- biology
- protein
- antibody
- VHH
---

## AVIDa-SARS-CoV-2

AVIDa-SARS-CoV-2 is a dataset featuring the antigen-variable domain of heavy chain of heavy chain antibody (VHH) interactions obtained from two alpacas immunized with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike proteins. AVIDa-SARS-CoV-2 includes binary labels indicating the binding or non-binding of diverse VHH sequences to 12 SARS-CoV-2 mutants, such as the Delta and Omicron variants. Further details on AVIDa-SARS-CoV-2 are described in our paper "[A SARS-CoV-2 Interaction Dataset and VHH Sequence Corpus for Antibody Language Models](https://arxiv.org/abs/2405.18749)."

## Columns

**AVIDa-SARS-CoV-2.csv / train.csv / test.csv**

| Column          | Description                                                                       |
| --------------- | --------------------------------------------------------------------------------- |
| VHH_sequence    | Amino acid sequence of VHH                                                        |
| Ag_label        | Antigen Type                                                                      |
| label           | Binary label represented by 1 for the binding pair and 0 for the non-binding pair |
| subject_species | Species of the subject from which VHH was collected                               |
| subject_name    | Name of the subject from which VHH was collected                                  |
| subject_sex     | Sex of the subject from which VHH was collected                                   |

**antigen_sequences.csv**

| Column      | Description                    |
| ----------- | ------------------------------ |
| Ag_label    | Antigen Type                   |
| Ag_sequence | Amino acid sequence of antigen |

## Links

- Project Page: https://avida-sars-cov-2.cognanous.com
- Code: https://github.com/cognano/AVIDa-SARS-CoV-2
- Paper: https://arxiv.org/abs/2405.18749

## Citation

If you use AVIDa-SARS-CoV-2 in your research, please cite the following paper.

```bibtex
@inproceedings{tsuruta2024sars,
  title={A {SARS}-{C}o{V}-2 Interaction Dataset and {VHH} Sequence Corpus for Antibody Language Models},
  author={Hirofumi Tsuruta and Hiroyuki Yamazaki and Ryota Maeda and Ryotaro Tamura and Akihiro Imura},
  booktitle={Advances in Neural Information Processing Systems 37},
  year={2024}
}
```
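Provider:
COGNANO
Summary of original information

Dataset overview

Dataset name

  • AVIDa-SARS-CoV-2

Dataset description

  • AVIDa-SARS-CoV-2 is a dataset of interactions between antigens and VHHs (the variable domain of the heavy chain of heavy-chain antibodies), obtained from two alpacas immunized with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike proteins. It provides binary labels indicating binding or non-binding of diverse VHH sequences to 12 SARS-CoV-2 mutants, such as the Delta and Omicron variants.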
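
The binary labels described above lend themselves to a quick per-antigen tally of binding and non-binding pairs. The following is a minimal sketch using only the Python standard library. It assumes rows in the column layout documented for `train.csv`; the sample rows, the antigen names `antigen_x` and `antigen_y`, the subject values, and the helper `count_labels` are all made up for illustration and are not taken from the dataset or its code.

```python
import csv
import io
from collections import Counter, defaultdict

# Toy rows in the documented column layout; every value here is invented.
SAMPLE_CSV = """VHH_sequence,Ag_label,label,subject_species,subject_name,subject_sex
QVQLVESGG,antigen_x,1,alpaca,subject_a,F
QVQLVESGG,antigen_y,0,alpaca,subject_a,F
EVQLQASGG,antigen_x,0,alpaca,subject_b,M
EVQLQASGG,antigen_y,0,alpaca,subject_b,M
"""


def count_labels(csv_text):
    """Return {Ag_label: Counter({1: n_binding, 0: n_non_binding})}."""
    counts = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The label column is 1 for a binding pair and 0 for a non-binding pair.
        counts[row["Ag_label"]][int(row["label"])] += 1
    return counts


label_counts = count_labels(SAMPLE_CSV)
for antigen, c in sorted(label_counts.items()):
    print(f"{antigen}: {c[1]} binding, {c[0]} non-binding")
```

To run this on the real file, the same function could be fed the text of a downloaded `train.csv` instead of the toy string.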

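Dataset contents

  • Main files
    • AVIDa-SARS-CoV-2.csv / train.csv / test.csv
      • VHH_sequence: amino acid sequence of the VHH
      • Ag_label: antigen type
      • label: binary label, 1 for a binding pair and 0 for a non-binding pair
      • subject_species: species of the subject from which the VHH was collected
      • subject_name: name of the subject from which the VHH was collected
      • subject_sex: sex of the subject from which the VHH was collected
    • antigen_sequences.csv
      • Ag_label: antigen type
      • Ag_sequence: amino acid sequence of the antigen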
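
Because both files share the `Ag_label` column, each interaction row can be paired with its antigen's amino acid sequence. The sketch below shows one way to do that join with the standard library. It assumes `Ag_label` values are unique in `antigen_sequences.csv`; the toy sequences, label names, and the helper `attach_antigen_sequences` are hypothetical and only mirror the documented column names.

```python
import csv
import io

# Toy stand-ins for train.csv and antigen_sequences.csv; all values are invented.
INTERACTIONS_CSV = """VHH_sequence,Ag_label,label,subject_species,subject_name,subject_sex
QVQLVESGG,antigen_x,1,alpaca,subject_a,F
EVQLQASGG,antigen_y,0,alpaca,subject_b,M
"""
ANTIGENS_CSV = """Ag_label,Ag_sequence
antigen_x,MFVFLVLLP
antigen_y,MFVFFVLLP
"""


def attach_antigen_sequences(interactions_text, antigens_text):
    """Add an Ag_sequence field to each interaction row, matched on Ag_label."""
    lookup = {
        row["Ag_label"]: row["Ag_sequence"]
        for row in csv.DictReader(io.StringIO(antigens_text))
    }
    joined = []
    for row in csv.DictReader(io.StringIO(interactions_text)):
        row = dict(row)
        # Raises KeyError if an interaction refers to an unknown antigen label.
        row["Ag_sequence"] = lookup[row["Ag_label"]]
        row["label"] = int(row["label"])
        joined.append(row)
    return joined


pairs = attach_antigen_sequences(INTERACTIONS_CSV, ANTIGENS_CSV)
for p in pairs:
    print(p["VHH_sequence"], p["Ag_sequence"], p["label"])
```

Failing loudly on an unknown label is a deliberate choice in this sketch: a silent default would hide mismatches between the two files.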

Dataset license

  • License: cc-by-nc-4.0

Dataset tags

  • Tags: biology, protein, antibody, VHH

Citation information

  • Citation (BibTeX): @article{tsuruta2024sars, title={A {SARS}-{C}o{V}-2 Interaction Dataset and {VHH} Sequence Corpus for Antibody Language Models}, author={Hirofumi Tsuruta and Hiroyuki Yamazaki and Ryota Maeda and Ryotaro Tamura and Akihiro Imura}, journal={arXiv preprint arXiv:2405.18749}, year={2024} }