neuralninja110/dlgenai-nppe-dataset
Source: Hugging Face | Updated: 2025-12-17 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/neuralninja110/dlgenai-nppe-dataset
Description:
This dataset contains protein sequences with their corresponding secondary structure labels for both Q8 (8-class) and Q3 (3-class) classification. It is split into a training set of 7262 sequences and a test set of 1816 sequences. Each entry has a unique identifier, an amino acid sequence over the 20 standard amino acids, and both 8-class and 3-class secondary structure labels. The 8-class labels follow DSSP and include alpha helix, beta strand, and coil/loop, among others; the 3-class labels collapse these into Helix, Strand, and Coil. The dataset is intended for deep learning approaches to protein secondary structure prediction, and the evaluation metric is the harmonic mean of the F1 scores for the Q8 and Q3 predictions.
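The evaluation metric described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the dataset's official scorer: it assumes labels are per-residue strings, that F1 is macro-averaged over classes (the description does not say which averaging is used), and that the Q8-to-Q3 reduction follows the common DSSP convention shown in `Q8_TO_Q3`. The function names `q8_to_q3`, `macro_f1`, `harmonic_mean`, and `combined_score` are hypothetical.

```python
from collections import Counter

# Assumed DSSP 8-state to 3-state reduction (a common convention; the
# dataset's own mapping and coil symbol may differ).
Q8_TO_Q3 = {
    "H": "H", "G": "H", "I": "H",            # helix types -> Helix
    "E": "E", "B": "E",                      # strand, bridge -> Strand
    "T": "C", "S": "C",                      # turn, bend -> Coil
    "C": "C", "L": "C", "-": "C",            # coil/loop symbols -> Coil
}


def q8_to_q3(labels):
    """Collapse a string of 8-class labels into 3-class labels."""
    return "".join(Q8_TO_Q3[ch] for ch in labels)


def macro_f1(true, pred):
    """Macro-averaged F1 over every class seen in `true` or `pred`."""
    if len(true) != len(pred):
        raise ValueError("true and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(true, pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    classes = set(true) | set(pred)
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def harmonic_mean(a, b):
    """Harmonic mean of two non-negative scores (0.0 if both are zero)."""
    return 2 * a * b / (a + b) if (a + b) > 0 else 0.0


def combined_score(true_q8, pred_q8, true_q3, pred_q3):
    """Harmonic mean of the Q8 and Q3 macro-F1 scores."""
    return harmonic_mean(macro_f1(true_q8, pred_q8),
                         macro_f1(true_q3, pred_q3))
```

In practice the per-residue labels of all test sequences would be concatenated before scoring. A model that predicts only Q8 labels could derive its Q3 predictions with `q8_to_q3`, provided the assumed mapping matches the one used to build the dataset.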
Provider:
neuralninja110



