ABCpred: Prediction of Continuous B-Cell Epitopes in an Antigen Using Recurrent Neural Network
Source: DataCite Commons. Updated 2026-05-06; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.20047945
Resource summary:
Dataset for Continuous B-Cell Epitope Prediction
Overview
This repository contains the curated dataset used in our study on the prediction of continuous B-cell epitopes using machine learning approaches, specifically recurrent neural networks (RNNs).
The dataset was originally developed for the identification and prediction of linear (continuous) B-cell epitopes in antigenic protein sequences. This resource may be useful for researchers working in:
Immunoinformatics
Vaccine design
Antibody epitope prediction
Computational immunology
Machine learning-based peptide classification
Reference
Saha S, Raghava GP. Prediction of continuous B-cell epitopes in an antigen using recurrent neural network. Proteins. 2006 Oct 1;65(1):40-8. doi: 10.1002/prot.21078. PMID: 16894596. https://onlinelibrary.wiley.com/doi/10.1002/prot.21078
Dataset Description
The dataset consists of experimentally validated B-cell epitopes and negative peptide samples.
Positive Dataset
Contains 700 non-redundant experimentally validated continuous B-cell epitopes
Collected from the Bcipep database
Only epitopes of length ≤ 20 amino acids were considered
Redundant sequences were removed to reduce bias
Negative Dataset
Contains 700 non-epitope peptide sequences
Randomly sampled peptide fragments from Swiss-Prot proteins
Any sequence identical to known epitopes was removed
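The filtering rules above can be sketched as follows. This is a minimal illustration, not the study's pipeline: `build_benchmark` is a hypothetical helper, redundancy removal is simplified to exact-match deduplication (the original criterion may be stricter), and the peptides are toy strings, not entries from Bcipep or Swiss-Prot.

```python
# Hypothetical sketch of the positive/negative filtering described above.
# Exact-match deduplication is an assumed simplification of "non-redundant".

MAX_EPITOPE_LEN = 20  # positives longer than this are discarded


def build_benchmark(epitopes, candidate_negatives):
    """Return (positives, negatives) as lists of uppercase peptide strings."""
    seen = set()
    positives = []
    for pep in epitopes:
        pep = pep.strip().upper()
        # Keep only epitopes of length <= 20 and drop exact duplicates.
        if 0 < len(pep) <= MAX_EPITOPE_LEN and pep not in seen:
            seen.add(pep)
            positives.append(pep)

    negatives = []
    neg_seen = set()
    for pep in candidate_negatives:
        pep = pep.strip().upper()
        # Drop any negative identical to a known epitope, and repeated negatives.
        if pep and pep not in seen and pep not in neg_seen:
            neg_seen.add(pep)
            negatives.append(pep)
    return positives, negatives


# Toy peptides for illustration only.
pos, neg = build_benchmark(
    ["ACDEFGHIK", "acdefghik", "A" * 25, "LMNPQRSTV"],
    ["LMNPQRSTV", "WYACDEFGH", "WYACDEFGH"],
)
print(pos)  # ['ACDEFGHIK', 'LMNPQRSTV']
print(neg)  # ['WYACDEFGH']
```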
Thus, the final benchmark dataset contains:
| Dataset Type | Number of Sequences |
|---|---|
| B-cell Epitopes | 700 |
| Non-Epitopes | 700 |
| Total | 1400 |
Data Processing
To create fixed-length patterns suitable for neural network training:
Variable-length epitopes were normalized to fixed window lengths
Neighboring residues from the parent antigen sequence were added when needed
Multiple window sizes (10, 12, 14, 16, 18, and 20 residues) were evaluated
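A minimal sketch of the window normalization, assuming that short epitopes are extended roughly symmetrically with flanking residues from the parent antigen (shifted inward when the epitope sits near a terminus) and that long epitopes are trimmed equally from both ends. `fit_to_window` is a hypothetical helper; the exact extension and trimming rule used to build the dataset may differ.

```python
def fit_to_window(epitope, antigen, window=16):
    """Hypothetical helper: return exactly `window` residues covering `epitope`.

    Assumptions: extension is split evenly between both flanks (extra residue
    on the right), the window is shifted inward near a terminus, and over-long
    epitopes are trimmed equally from both ends.
    """
    if len(epitope) >= window:
        # Trim from both ends, keeping the central residues.
        left = (len(epitope) - window) // 2
        return epitope[left:left + window]
    if len(antigen) < window:
        raise ValueError("antigen shorter than window")
    start = antigen.find(epitope)  # first occurrence in the parent antigen
    if start < 0:
        raise ValueError("epitope not found in parent antigen")
    deficit = window - len(epitope)
    # Take deficit // 2 residues on the left, then clamp inside the antigen.
    new_start = max(0, min(start - deficit // 2, len(antigen) - window))
    return antigen[new_start:new_start + window]


antigen = "ACDEFGHIKLMNPQRSTVWY"  # toy 20-residue sequence
print(fit_to_window("HIKLMN", antigen))  # CDEFGHIKLMNPQRST
print(fit_to_window("ACDE", antigen))    # ACDEFGHIKLMNPQRS
print(fit_to_window(antigen, antigen))   # DEFGHIKLMNPQRSTV
```

The clamping step keeps every window inside the antigen while still containing the whole epitope, which is why an epitope at the N-terminus is extended only to the right.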
The best performance was achieved with:
Window Length: 16 residues
Model: Recurrent Neural Network (Jordan Network)
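To illustrate what a Jordan network is, the sketch below runs one forward pass over a 16-residue window. Each residue is one-hot encoded (an assumed encoding), and the network's previous output is fed back into the hidden layer through a context unit with a self-recurrent decay. `JordanNet`, the hidden-layer size, the decay value and the random weights are all hypothetical; the network is untrained, and this is not the ABCpred model or its configuration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue):
    """20-dimensional binary vector for one residue (assumed encoding)."""
    vec = [0.0] * len(AMINO_ACIDS)
    vec[AMINO_ACIDS.index(residue)] = 1.0
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class JordanNet:
    """Hypothetical minimal Jordan network with a single output unit.

    The context unit stores a decaying sum of previous outputs and feeds it
    back into the hidden layer at the next time step.
    """

    def __init__(self, n_in=20, n_hidden=8, decay=0.5, seed=0):
        rng = random.Random(seed)

        def rand():
            return rng.uniform(-0.5, 0.5)

        self.w_in = [[rand() for _ in range(n_in)] for _ in range(n_hidden)]
        self.w_ctx = [rand() for _ in range(n_hidden)]  # context -> hidden
        self.b_h = [rand() for _ in range(n_hidden)]
        self.w_out = [rand() for _ in range(n_hidden)]
        self.b_out = rand()
        self.decay = decay  # self-recurrent weight on the context unit

    def forward(self, peptide):
        context = 0.0
        y = 0.0
        for residue in peptide:
            x = one_hot(residue)
            hidden = [
                math.tanh(sum(w * xi for w, xi in zip(row, x)) + wc * context + b)
                for row, wc, b in zip(self.w_in, self.w_ctx, self.b_h)
            ]
            y = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out)
            # Jordan recurrence: context keeps a decaying copy of past outputs.
            context = self.decay * context + y
        return y  # score after the final residue, in (0, 1)


net = JordanNet()
score = net.forward("ACDEFGHIKLMNPQRS")  # one toy 16-residue window
print(round(score, 3))
```

Training (for example with backpropagation through time) is omitted; a trained model would typically threshold this score to label a window as epitope or non-epitope.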
Applications
This dataset can be used for:
Training machine learning/deep learning models
Benchmarking epitope prediction tools
Feature engineering on peptide sequences
Comparative studies with modern protein language models
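When benchmarking a predictor on this balanced set (700 epitopes, 700 non-epitopes), threshold-dependent metrics such as sensitivity, specificity, accuracy and the Matthews correlation coefficient (MCC) are commonly reported. A minimal sketch with a hypothetical `confusion_metrics` helper and toy labels (1 = epitope, 0 = non-epitope); the choice of metrics here is an illustration, not a description of the original evaluation protocol.

```python
import math


def confusion_metrics(y_true, y_pred):
    """Hypothetical helper: binary metrics for epitope (1) / non-epitope (0) labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sens = tp / (tp + fn) if tp + fn else 0.0  # fraction of epitopes recovered
    spec = tn / (tn + fp) if tn + fp else 0.0  # fraction of non-epitopes rejected
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sens, "specificity": spec, "accuracy": acc, "mcc": mcc}


m = confusion_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
print({k: round(v, 3) for k, v in m.items()})
# {'sensitivity': 0.667, 'specificity': 0.667, 'accuracy': 0.667, 'mcc': 0.333}
```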
Provider: Zenodo
Created: 2026-05-06