hugging-science/jain-clinical-antibody-polyreactivity

Source: Hugging Face. Updated 2025-12-17; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/hugging-science/jain-clinical-antibody-polyreactivity
Description:
This dataset contains 86 clinical-stage antibody heavy-chain variable domain (VH) sequences with binary polyreactivity labels, preprocessed to reproduce the benchmark results of Sakhnini et al. (2025) (Novo Nordisk & University of Cambridge). The original dataset, published by Jain et al. (2017), contains biophysical measurements for 137 FDA-approved or late-stage clinical antibodies. The 86 sequences serve as a benchmark test set for an ESM-1v + Logistic Regression model trained on the Boughter dataset; that model reaches 68.60% accuracy here, exactly matching Novo Nordisk's published result. Key features: human clinical-stage antibodies; VH sequences; an ELISA polyreactivity panel (6 ligands); binary classification labels (0 = specific, 1 = non-specific/polyreactive); and 86 antibodies in total (57 specific, 29 non-specific). The dataset supports binary classification tasks and the Novo Nordisk parity benchmark.
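As a quick sanity check on the figures quoted above, the following sketch works out the class balance, the accuracy a trivial "always predict specific" classifier would reach, and how many correct predictions 68.60% implies on 86 antibodies. It uses only the counts stated in the description. The function and variable names are illustrative and are not part of the dataset or of any published code.

```python
# Counts taken from the dataset description; nothing is loaded from disk.
N_SPECIFIC = 57             # label 0 (specific)
N_NONSPECIFIC = 29          # label 1 (non-specific / polyreactive)
REPORTED_ACCURACY = 0.6860  # 68.60%, as quoted in the description


def summarize(n_specific, n_nonspecific, reported_accuracy):
    """Derive simple reference numbers from the class counts (hypothetical helper)."""
    total = n_specific + n_nonspecific
    # Accuracy of a classifier that always predicts the majority class.
    majority_baseline = max(n_specific, n_nonspecific) / total
    # Whole number of correct predictions closest to the reported accuracy.
    implied_correct = round(reported_accuracy * total)
    return {
        "total": total,
        "positive_fraction": n_nonspecific / total,
        "majority_baseline": majority_baseline,
        "implied_correct": implied_correct,
        "implied_accuracy": implied_correct / total,
    }


summary = summarize(N_SPECIFIC, N_NONSPECIFIC, REPORTED_ACCURACY)
for name, value in summary.items():
    print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

Under these counts, 68.60% corresponds to 59 of 86 antibodies classified correctly, while the majority-class baseline is 57 of 86 (about 66.28%). Because the classes are imbalanced, accuracy alone may be worth reading alongside a per-class metric.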
Provider:
hugging-science