
alchemab/sarscov2-binding-prediction

Source: Hugging Face. Updated: 2023-09-12. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/alchemab/sarscov2-binding-prediction
Description:
---
dataset_info:
  features:
  - name: vh
    dtype: string
  - name: vl
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': non-binder
          '1': binder
  splits:
  - name: train
    num_bytes: 10272366
    num_examples: 41787
  - name: eval
    num_bytes: 1141623
    num_examples: 4644
  - name: test
    num_bytes: 1268252
    num_examples: 5159
  download_size: 1194761
  dataset_size: 12682241
---

# SARS-CoV-2 binding dataset

A dataset of 104972 antibodies screened for binding to the SARS-CoV-2 HR peptide, described in [Engelhart et al. (2022)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9606274/), was obtained from [Zenodo](https://zenodo.org/record/5095284). Average predicted logK<sub>D</sub> values were used to classify sequences as binders or non-binders:

* logK<sub>D</sub> < 3: binder
* logK<sub>D</sub> >= 4: non-binder
* logK<sub>D</sub> >= 3 and logK<sub>D</sub> < 4: ambiguous; removed

Using these criteria, 51590 sequences remain. These were stratified into training, validation, and test sets in an approximately 80:10:10 ratio, giving:

* 41787 sequences in training (`train`)
* 4644 sequences in validation (`eval`)
* 5159 sequences in test (`test`)

## Example

| vh      | vl      | label |
| ------- | ------- | ----- |
| EVQ...  | DIQ...  | 1     |
| EVQ...  | DIQ...  | 0     |
| EVQ...  | DIQ...  | 0     |
| EVQ...  | DIQ...  | 1     |

## References

* [Engelhart et al. (2022) paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9606274/)
* [Zenodo link for dataset](https://zenodo.org/record/5095284)
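
The labelling rule described in this card can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline. The function name `label_from_log_kd`, the `log_kd` field, and the sample records are hypothetical. The sketch also assumes the ambiguous band is 3 <= logK<sub>D</sub> < 4, which is the only range left uncovered by the binder and non-binder thresholds.

```python
from typing import Optional

# Thresholds taken from the dataset card.
BINDER_MAX = 3.0      # logKD < 3  -> binder (label 1)
NON_BINDER_MIN = 4.0  # logKD >= 4 -> non-binder (label 0)


def label_from_log_kd(log_kd: float) -> Optional[int]:
    """Return 1 for a binder, 0 for a non-binder, None if ambiguous.

    The ambiguous band [3, 4) is an assumption inferred from the two
    thresholds above; ambiguous sequences are meant to be dropped.
    """
    if log_kd < BINDER_MAX:
        return 1
    if log_kd >= NON_BINDER_MIN:
        return 0
    return None


# Hypothetical records; real sequences and logKD values come from the
# Zenodo release and are not reproduced here.
records = [
    {"vh": "EVQ...", "vl": "DIQ...", "log_kd": 2.1},
    {"vh": "EVQ...", "vl": "DIQ...", "log_kd": 3.5},
    {"vh": "EVQ...", "vl": "DIQ...", "log_kd": 4.7},
]

# Attach a label to each record and drop the ambiguous ones.
labelled = [
    {"vh": r["vh"], "vl": r["vl"], "label": label}
    for r in records
    if (label := label_from_log_kd(r["log_kd"])) is not None
]
```

Returning `None` for the ambiguous band keeps the filtering step explicit, so the number of discarded sequences can be counted separately from the labelled ones.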
Provider:
alchemab