
hugging-science/harvey-nanobody-polyreactivity

Hugging Face. Updated 2025-12-17; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/hugging-science/harvey-nanobody-polyreactivity
Description:
This dataset contains 141,021 nanobody (VHH) sequences with binary polyreactivity labels, preprocessed following the methodology of Sakhnini et al. 2025 (Novo Nordisk & University of Cambridge). It was originally published by Harvey et al. 2022 and consists of synthetic nanobodies assessed with a PSR (Poly-Specificity Reagent) assay via FACS sorting and deep sequencing. This preprocessed version serves as a test set for evaluating the ESM-1v + Logistic Regression model trained on the Boughter dataset.

Key features:
Organism: Synthetic camelid (nanobody) library (yeast display)
Molecule type: Nanobody / single-domain antibody (VHH)
Assay: PSR (Poly-Specificity Reagent) from Sf9 insect cell membranes
Method: FACS sorting + deep sequencing
Labels: Binary classification (0 = low polyreactivity, 1 = high polyreactivity)
Annotation: ANARCI with IMGT numbering scheme
Balance: Well balanced (49.1% low, 50.9% high polyreactivity)
Scale: Large (141K sequences)

Supported tasks:
Binary classification: Predicting nanobody polyreactivity from sequence
Cross-domain validation: Testing models trained on conventional antibodies against nanobodies
Benchmark: Sakhnini et al. 2025, Fig. S14E (61.7% accuracy)
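To put the reported balance and benchmark figures in context, the sketch below shows how class balance, a majority-class baseline, and accuracy might be computed for binary labels like these. It runs on toy stand-in lists, not on the dataset itself; the function and variable names are illustrative assumptions. Loading the real labels (for instance with the Hugging Face `datasets` library) would require the column names from the dataset card, which this listing does not state.

```python
from collections import Counter
from typing import Sequence


def label_balance(labels: Sequence[int]) -> dict[int, float]:
    """Return the fraction of examples carrying each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: counts[label] / total for label in sorted(counts)}


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return the fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy stand-in data (hypothetical, NOT taken from the dataset).
# 0 = low polyreactivity, 1 = high polyreactivity.
toy_labels = [0, 1, 1, 0, 1, 0, 1, 1]
toy_predictions = [0, 1, 0, 0, 1, 1, 1, 1]

balance = label_balance(toy_labels)
# Accuracy obtained by always predicting the most frequent class.
majority_baseline = max(balance.values())
toy_accuracy = accuracy(toy_labels, toy_predictions)

print(f"balance: {balance}")
print(f"majority baseline: {majority_baseline:.3f}")
print(f"accuracy: {toy_accuracy:.3f}")
```

Applied to the figures quoted above, a majority-class baseline would sit at 50.9%, so the reported 61.7% accuracy is roughly 10.8 percentage points above always predicting the larger class.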
Provider:
hugging-science