bloyal/oas_paired_human_sars_cov_2

Hugging Face · updated 2023-08-28 · indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/bloyal/oas_paired_human_sars_cov_2
Resource description:
---
license: cc-by-4.0
size_categories:
- 100K<n<1M
---

# Paired SARS-COV-2 heavy/light chain sequences from the Observed Antibody Space database

Human paired heavy/light chain amino acid sequences from the Observed Antibody Space (OAS) database, obtained from SARS-COV-2 studies.

https://opig.stats.ox.ac.uk/webapps/oas/

Please include the following citation in your work:

```
Olsen, TH, Boyles, F, Deane, CM. Observed Antibody Space: A diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences. Protein Science. 2022; 31: 141–146. https://doi.org/10.1002/pro.4205
```

## Data Preparation

This data was obtained on August 3, 2023 by searching the OAS Paired Sequence database with the following criteria:

- Species = "human"
- Disease = "SARS-COV-2"

This returned 704,652 filtered sequences from 3 studies, split across 63 .csv.gz data unit files. These were extracted and filtered for records where both the `complete_vdj_heavy` and `complete_vdj_light` values were "T". Finally, the `sequence_alignment_aa_heavy` and `sequence_alignment_aa_light` fields were extracted into a dataset, and a 90/10 train/test split was applied. The resulting data was saved in pyarrow format.
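The filtering step described in the card can be sketched with the Python standard library alone. This is a minimal sketch, not the author's script: it assumes each OAS data unit file starts with one JSON metadata line followed by a CSV header row that contains the `complete_vdj_heavy`, `complete_vdj_light`, `sequence_alignment_aa_heavy` and `sequence_alignment_aa_light` columns. The helper names `iter_complete_pairs` and `load_data_unit` are hypothetical.

```python
import csv
import gzip
import io
from typing import Iterator, TextIO

# Column names taken from the dataset card.
HEAVY_AA = "sequence_alignment_aa_heavy"
LIGHT_AA = "sequence_alignment_aa_light"


def iter_complete_pairs(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (heavy, light) amino acid sequences for rows whose heavy and
    light chains are both flagged complete ("T")."""
    # Assumption: the first line is a JSON metadata line, not part of the table.
    handle.readline()
    # The next line is treated as the CSV header row.
    for row in csv.DictReader(handle):
        if row["complete_vdj_heavy"] == "T" and row["complete_vdj_light"] == "T":
            yield row[HEAVY_AA], row[LIGHT_AA]


def load_data_unit(path: str) -> list[tuple[str, str]]:
    """Hypothetical helper: read one .csv.gz data unit file from disk."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(iter_complete_pairs(handle))


if __name__ == "__main__":
    # Toy in-memory file; the sequences are short made-up fragments.
    demo = io.StringIO(
        '"{""Species"": ""human""}"\n'
        "complete_vdj_heavy,complete_vdj_light,"
        "sequence_alignment_aa_heavy,sequence_alignment_aa_light\n"
        "T,T,EVQLVESGG,DIQMTQSPS\n"
        "T,F,QVQLQESGP,EIVLTQSPG\n"
    )
    print(list(iter_complete_pairs(demo)))  # only the fully complete pair is kept
```

Applying `load_data_unit` to each downloaded file and concatenating the lists would give the pooled (heavy, light) pairs to which the train/test split is then applied. Streaming rows through a generator keeps memory use low even though the unfiltered search returned hundreds of thousands of records.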
Provider:
bloyal
Summary of original information

Dataset overview

Dataset name

Paired SARS-COV-2 heavy/light chain sequences from the Observed Antibody Space database

Dataset description

This dataset contains human paired heavy/light chain amino acid sequences from the Observed Antibody Space (OAS) database, obtained from SARS-COV-2 studies.

Dataset size

  • Size: 100K < n < 1M

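Dataset preparation

The dataset was obtained on August 3, 2023 by searching the OAS Paired Sequence database with the following criteria:

  • Species: human
  • Disease: SARS-COV-2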

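The search returned 704,652 sequences from 3 studies, distributed across 63 .csv.gz data unit files. These were further filtered to keep only records where both complete_vdj_heavy and complete_vdj_light were "T"; the sequence_alignment_aa_heavy and sequence_alignment_aa_light fields were then extracted, and a 90/10 train/test split was applied. The final data was saved in pyarrow format.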
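The card does not say how the 90/10 split was drawn. A minimal, reproducible sketch, assuming a seeded random shuffle over whole heavy/light pairs (so a pair never straddles train and test), could look like the following; `split_train_test` and its default seed are hypothetical. The Hugging Face `datasets` library offers a comparable `Dataset.train_test_split(test_size=0.1, seed=...)`, but whether it was used here is not stated.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def split_train_test(
    records: Sequence[T], test_fraction: float = 0.1, seed: int = 0
) -> tuple[list[T], list[T]]:
    """Return (train, test) lists; each record (e.g. a heavy/light pair)
    lands in exactly one of them."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)  # copy so the caller's sequence is untouched
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Toy placeholder pairs, not real sequences.
    pairs = [(f"HEAVY{i}", f"LIGHT{i}") for i in range(100)]
    train, test = split_train_test(pairs)
    print(len(train), len(test))  # 90 10
```

Shuffling at the pair level keeps each heavy chain together with its own light chain; splitting the two columns independently would break the pairing that makes this dataset useful.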

Dataset license

  • License: CC-BY-4.0