hugging-science/boughter-antibody-polyreactivity

Source: Hugging Face. Updated 2025-12-17. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/hugging-science/boughter-antibody-polyreactivity
Description:
This dataset contains 914 antibody heavy chain variable domain (VH) sequences with binary polyreactivity labels, preprocessed following the methodology of Sakhnini et al. 2025 (Novo Nordisk and University of Cambridge). The underlying data were originally published by Boughter et al. 2020 and consist of mouse antibodies whose polyreactivity was measured by ELISA against a panel of 4 to 7 antigens (commonly described as DNA, insulin, LPS, flagellin, albumin, cardiolipin, and KLH). This is the preprocessed version used to train the ESM-1v + Logistic Regression model that predicts antibody non-specificity.

Key features:
Organism: Mouse (Mus musculus)
Molecule type: Antibody heavy chain variable domain (VH)
Assay: ELISA polyreactivity panel (4 to 7 antigens)
Labels: Binary (0 = specific, 1 = non-specific/polyreactive)
Annotation: ANARCI with the IMGT numbering scheme
Class balance: 48.5% specific, 51.5% non-specific
Task: Binary classification, predicting antibody polyreactivity/non-specificity from sequence
Benchmark: Novo Nordisk parity benchmark (71% accuracy under 10-fold cross-validation)
Language: Protein sequences (amino acid alphabet)
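To put the reported class balance and the 71% 10-fold cross-validation figure in context, the sketch below recovers approximate class counts from the stated percentages, computes the majority-class baseline, and builds a stratified 10-fold split using only the Python standard library. It is an illustration under assumptions, not the authors' pipeline: the class counts are inferred by rounding (the description gives only percentages), the labels are synthetic placeholders rather than the real data, and `stratified_kfold` is a hypothetical helper written for this example.

```python
import random
from collections import Counter

N_TOTAL = 914          # number of sequences, from the description
FRAC_SPECIFIC = 0.485  # share of label 0, from the description

# Assumption: counts are recovered by rounding the stated percentage.
n_specific = round(N_TOTAL * FRAC_SPECIFIC)
n_nonspecific = N_TOTAL - n_specific

# Placeholder labels standing in for the real label column.
labels = [0] * n_specific + [1] * n_nonspecific


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs, keeping class shares similar per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal this class's indices round-robin across the k folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx


splits = list(stratified_kfold(labels, k=10))

# Majority-class baseline: always predict the more common label.
majority_baseline = max(Counter(labels).values()) / N_TOTAL

print(f"specific={n_specific}, non-specific={n_nonspecific}")
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
for train_idx, test_idx in splits[:3]:
    share = sum(labels[i] for i in test_idx) / len(test_idx)
    print(f"test size={len(test_idx)}, non-specific share={share:.3f}")
```

Under the rounding assumption this gives 443 specific and 471 non-specific sequences, so a classifier that always predicts "non-specific" would score about 51.5%. The reported 71% should be read against that baseline rather than against 50%.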
Provider:
hugging-science