GenerTeam/gener-tasks
Source: Hugging Face. Updated: 2025-02-13. Indexed: 2025-02-15.
Download link:
https://hf-mirror.com/datasets/GenerTeam/gener-tasks
Description:
The Gener Tasks dataset comprises two subtasks: gene classification and taxonomic classification. The gene classification task assesses a model's ability to understand short to medium-length sequences of 100 to 5,000 bp. It covers six gene types plus control samples from non-gene regions, with balanced sampling from six distinct eukaryotic taxonomic groups in RefSeq; the goal is to predict the gene type. The taxonomic classification task evaluates a model's comprehension of longer sequences of 10,000 to 100,000 bp, which include gene regions as well as predominantly non-gene regions. Samples are likewise balanced and drawn from the same six RefSeq taxonomic groups, and the goal is to predict the taxonomic group of each sample.
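The sequence-length ranges described above can be expressed as a small check, for example to sanity-test sequences before evaluation. This is a minimal sketch, not part of the dataset's tooling: the task labels `gene_classification` and `taxonomic_classification` and the helper `task_for_length` are hypothetical names chosen for illustration, and treating both range endpoints as inclusive is an assumption.

```python
from typing import Optional

# Length ranges in base pairs, taken from the dataset description.
# Assumption: both endpoints are inclusive. The task labels are
# illustrative names, not confirmed dataset configuration names.
TASK_LENGTH_RANGES = {
    "gene_classification": (100, 5_000),
    "taxonomic_classification": (10_000, 100_000),
}


def task_for_length(seq: str) -> Optional[str]:
    """Return the subtask whose length range contains len(seq), else None."""
    n = len(seq)
    for task, (lo, hi) in TASK_LENGTH_RANGES.items():
        if lo <= n <= hi:
            return task
    return None
```

Under these assumptions, a 7,000 bp sequence falls between the two ranges, so the helper returns `None` for it.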
Provided by:
GenerTeam



