bloyal/oas_paired_human_sars_cov_2
Hugging Face · Updated 2023-08-28 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/bloyal/oas_paired_human_sars_cov_2
Resource description:
---
license: cc-by-4.0
size_categories:
- 100K<n<1M
---
# Paired SARS-COV-2 heavy/light chain sequences from the Observed Antibody Space database
This dataset contains human paired heavy/light chain amino acid sequences from SARS-COV-2 studies, obtained from the Observed Antibody Space (OAS) database.
https://opig.stats.ox.ac.uk/webapps/oas/
Please include the following citation in your work:
```
Olsen, TH, Boyles, F, Deane, CM. Observed Antibody Space: A diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences. Protein Science. 2022; 31: 141–146. https://doi.org/10.1002/pro.4205
```
## Data Preparation
This data was obtained on August 3, 2023 by searching the OAS Paired Sequence database with the following criteria:
- Species = "human"
- Disease = "SARS-COV-2"
This returned 704,652 filtered sequences from 3 studies, split across 63 `.csv.gz` data unit files. These files were extracted and filtered for records where both the `complete_vdj_heavy` and `complete_vdj_light` values were "T". Finally, the `sequence_alignment_aa_heavy` and `sequence_alignment_aa_light` fields were extracted into a dataset and a 90/10 train/test split was applied. The resulting data was saved in pyarrow format.
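
The preparation steps above can be sketched roughly as follows. This is an illustrative reconstruction, not the author's actual script: the function names (`read_data_unit`, `extract_complete_pairs`, `train_test_split`) are hypothetical, the shuffle and seed are assumptions (the card does not say how the 90/10 split was drawn), and the reader assumes each data unit file carries a metadata line ahead of the CSV header. Saving to pyarrow format is left out.

```python
import csv
import gzip
import random


def read_data_unit(path):
    """Yield one dict per sequence record from a .csv.gz data unit file.

    Assumption: the first line holds study metadata and the CSV header
    starts on the second line.
    """
    with gzip.open(path, "rt", newline="") as handle:
        next(handle)  # skip the metadata line (assumed layout)
        yield from csv.DictReader(handle)


def extract_complete_pairs(rows):
    """Keep rows whose heavy and light VDJ flags are both "T".

    Only the two aligned amino acid sequence fields are retained.
    """
    pairs = []
    for row in rows:
        if row["complete_vdj_heavy"] == "T" and row["complete_vdj_light"] == "T":
            pairs.append(
                {
                    "sequence_alignment_aa_heavy": row["sequence_alignment_aa_heavy"],
                    "sequence_alignment_aa_light": row["sequence_alignment_aa_light"],
                }
            )
    return pairs


def train_test_split(records, test_fraction=0.1, seed=0):
    """Shuffle records and return (train, test) lists.

    The seed and the shuffle are assumptions made for reproducibility.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

In use, the records from all data unit files would be passed through `extract_complete_pairs`, concatenated, and then handed to `train_test_split` once, so that the 90/10 ratio holds over the whole dataset rather than per file.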
Provider:
bloyal



