wanglab/cafa5
Source: Hugging Face. Updated: 2025-11-07. Indexed: 2025-07-05.
Download link:
https://hf-mirror.com/datasets/wanglab/cafa5
Resource description:
The dataset has three parts: CAFA5 reasoning data, GO metadata, and InterPro metadata. The CAFA5 reasoning data describes proteins: protein ID, name, function, length, subcellular location, sequence, GO IDs, GO categories (biological process, molecular function, cellular component), structure path, STRING ID, interaction partners, and full interaction information. The GO metadata describes GO terms: ID, name, original definition, aspect, depth, weight, cleaned definition, summary, and the associated lengths and ratios. The InterPro metadata describes protein families and domains: ID, entry name, and type. The data is split into a training set, a test set, and a test superset, and the byte size and number of examples are given for each.
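As a minimal sketch of how the three parts and their splits might be inspected, the snippet below uses the Hugging Face `datasets` library. The repository ID is taken from the download link; the configuration and split names are not stated in this listing, so the helpers (`list_parts`, `list_splits`, `load_part`, all hypothetical names) discover them at run time instead of assuming them. Network access and an installed `datasets` package are assumed.

```python
# Repository ID taken from the download link in this listing.
REPO_ID = "wanglab/cafa5"

# Optional assumption: to go through the mirror in the download link, set
#   HF_ENDPOINT=https://hf-mirror.com
# in the environment before `datasets` is imported.


def list_parts(repo_id: str = REPO_ID) -> list[str]:
    """Return the configuration names the repository actually exposes."""
    from datasets import get_dataset_config_names  # imported lazily

    return get_dataset_config_names(repo_id)


def list_splits(config_name: str, repo_id: str = REPO_ID) -> list[str]:
    """Return the split names available for one configuration."""
    from datasets import get_dataset_split_names

    return get_dataset_split_names(repo_id, config_name)


def load_part(config_name: str, split: str, repo_id: str = REPO_ID):
    """Load one split of one configuration as a `datasets.Dataset`."""
    from datasets import load_dataset

    return load_dataset(repo_id, name=config_name, split=split)


# Example usage (requires network access):
# for part in list_parts():
#     for split in list_splits(part):
#         ds = load_part(part, split)
#         print(part, split, ds.num_rows, ds.column_names)
```

Discovering the names first avoids hard-coding configuration or column names that this listing does not give; the printed column names can then be compared with the fields described above.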
Provider:
wanglab



