hugging-science/boughter-antibody-polyreactivity

Name: hugging-science/boughter-antibody-polyreactivity
Creator: hugging-science
Published: 2025-12-17 05:02:17
License: No description available

Source: Hugging Face
Updated: 2025-12-17
Indexed: 2025-12-20

Download link:

https://hf-mirror.com/datasets/hugging-science/boughter-antibody-polyreactivity
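A minimal sketch of fetching the dataset through the mirror above, assuming the Hugging Face `datasets` package is installed and that the repository loads with its default configuration. `HF_ENDPOINT` is the environment variable that `huggingface_hub` reads for an alternative endpoint. The helper names `dataset_page` and `load_boughter` are hypothetical, and no column names are assumed because this listing does not state them.

```python
import os

REPO_ID = "hugging-science/boughter-antibody-polyreactivity"
MIRROR = "https://hf-mirror.com"


def dataset_page(repo_id: str = REPO_ID, endpoint: str = MIRROR) -> str:
    """Build the dataset page URL shown in the listing."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def load_boughter(endpoint: str = MIRROR):
    """Hypothetical helper: load the dataset via the mirror endpoint."""
    # HF_ENDPOINT only takes effect if it is set before huggingface_hub
    # is first imported, so the import happens inside the function.
    os.environ.setdefault("HF_ENDPOINT", endpoint)
    from datasets import load_dataset  # requires `pip install datasets`
    return load_dataset(REPO_ID)
```

Printing the object returned by `load_boughter()` lists the available splits and column names, which is a reasonable first check before relying on any particular field.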

Resource summary:

This dataset contains 914 antibody heavy chain variable domain (VH) sequences with binary polyreactivity labels, preprocessed following the methodology of Sakhnini et al. 2025 (Novo Nordisk & University of Cambridge). The data were originally published by Boughter et al. 2020 and comprise mouse antibodies whose polyreactivity was measured by ELISA against a panel of 4–7 antigens (commonly described as DNA, insulin, LPS, flagellin, albumin, cardiolipin, and KLH). This is the preprocessed version used to train the ESM-1v + Logistic Regression model that predicts antibody non-specificity.

Key features:

Organism: Mouse (Mus musculus)
Molecule type: Antibody heavy chain variable domain (VH)
Assay: ELISA polyreactivity panel (4–7 antigens)
Labels: Binary (0 = specific, 1 = non-specific/polyreactive)
Annotation: ANARCI with the IMGT numbering scheme
Balance: 48.5% specific, 51.5% non-specific
Task: Binary classification, predicting antibody polyreactivity/non-specificity from sequence
Benchmark: Novo Nordisk parity benchmark (71% 10-fold CV accuracy)
Language: Protein sequences (amino acid alphabet)
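The stated class balance and the 10-fold protocol can be made concrete with a small standard-library sketch. It assumes labels are 0/1 integers. The counts 443 and 471 are obtained by rounding the stated percentages of 914, not read from the data itself. `stratified_folds` and `majority_baseline` are hypothetical helpers for illustration, not the splitter or evaluation code used by Sakhnini et al.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each index to one of k folds, keeping class ratios similar."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class
        # stopped, so fold sizes stay within one of each other.
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)
    return folds


def majority_baseline(labels):
    """Accuracy of always predicting the most common label."""
    counts = defaultdict(int)
    for label in labels:
        counts[label] += 1
    return max(counts.values()) / len(labels)


# Counts approximated from the stated balance: 48.5% and 51.5% of 914.
n_specific = round(914 * 0.485)      # 443
n_polyreactive = 914 - n_specific    # 471
labels = [0] * n_specific + [1] * n_polyreactive

folds = stratified_folds(labels, k=10)
baseline = majority_baseline(labels)
print(f"fold sizes: {sorted(len(f) for f in folds)}")
print(f"majority baseline: {baseline:.3f}")
```

Under these assumed counts, always predicting the majority class scores about 51.5%, so the reported 71% 10-fold CV accuracy sits roughly 19.5 percentage points above that baseline.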

Provider:

hugging-science
