monsoon-nlp/greenbeing-binary

Name: monsoon-nlp/greenbeing-binary
Creator: monsoon-nlp
Published: 2024-07-16 08:13:23
License: MIT

Source: Hugging Face. Updated 2024-07-16. Indexed 2024-06-12.

Download link:

https://hf-mirror.com/datasets/monsoon-nlp/greenbeing-binary

Overview:

The GreenBeing Binary dataset is a binary-classification version of the finetuning and evaluation datasets. It is derived from protein sequences in UniProtKB for select food crops and related species, with amino acid sequences written in IUPAC-IUB codes. The task is binary classification: the "query" field takes one of 18 possible labels, and the "label" field holds the binary class (0 = No, 1 = Yes). The dataset has two splits drawn from different taxa. The finetuning split contains reviewed proteins (Swiss-Prot) from specific taxa; the evaluation split contains reviewed proteins (Swiss-Prot) from other genera (e.g., avocado, carrot, cassava, lychee, Prunus). In both splits, each row contains a gene name, a species or subspecies, and an amino acid sequence.

Provider:

monsoon-nlp

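Summary of Original Information

Dataset Overview

Basic Information

License: MIT
Task category: text classification
Tags: biology
Size category: 1K<n<10K
Pretty name: GreenBeing Binary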

Configuration

Config name: binaryclass
Data files:
- finetuning: proteins_finetuning.csv
- evaluation: proteins_evaluation.csv
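
Because both splits are plain CSV files, they can be read with Python's standard library. The following is a minimal sketch, not the provider's loading code. Only the "query" and "label" column names come from the description on this page; the "sequence" column name and every value in the sample are assumptions made for illustration, so check the real header before relying on them.

```python
import csv
import io

# Hypothetical two-row stand-in for proteins_finetuning.csv. Only the "query"
# and "label" column names come from the dataset description; "sequence" and
# every value below are made up for illustration.
SAMPLE_CSV = """sequence,query,label
MKTAYIAKQR,Plastid,1
MSLLTEVETP,Cell membrane,0
"""


def read_rows(handle):
    """Parse a split file into a list of dicts, converting the label to an int."""
    rows = []
    for row in csv.DictReader(handle):
        row["label"] = int(row["label"])
        rows.append(row)
    return rows


rows = read_rows(io.StringIO(SAMPLE_CSV))
print(len(rows), rows[0]["query"], rows[0]["label"])

# For a downloaded split, the same function applies to a real file handle:
# with open("proteins_finetuning.csv", newline="", encoding="utf-8") as f:
#     rows = read_rows(f)
```

Passing a file handle rather than a path keeps the parser usable with both downloaded files and in-memory samples.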

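Data Content

Source: proteins from select food crops and related species in the UniProtKB knowledge base.
Sequence representation: IUPAC-IUB codes, in which the letters A-Z map to amino acids.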
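
As an illustration of the sequence representation, the sketch below checks a string against the IUPAC-IUB one-letter amino acid alphabet: the 20 standard residues plus the six ambiguity or rare-residue letters. The function name and the sample sequence are hypothetical and are not taken from the dataset.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")
# Remaining letters: B = Asp or Asn, Z = Glu or Gln, J = Leu or Ile,
# X = any residue, U = selenocysteine, O = pyrrolysine.
EXTENDED = set("BJOUXZ")


def check_sequence(seq):
    """Summarize a protein sequence written in one-letter codes."""
    seq = seq.strip().upper()
    return {
        "length": len(seq),
        # Characters outside A-Z, such as '*' or '-', are reported as invalid.
        "invalid": sorted(set(seq) - STANDARD - EXTENDED),
        # Count of ambiguity codes and rare residues.
        "nonstandard": sum(1 for ch in seq if ch in EXTENDED),
        "composition": Counter(seq),
    }


# Hypothetical sequence, used only to exercise the function.
report = check_sequence("MKTAYIAKQRX")
print(report["length"], report["invalid"], report["nonstandard"])
```

A check like this is useful before tokenization, since ambiguity codes such as X may need special handling depending on the model.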

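Task Details

Classification labels: the "label" field is the binary class (0 = No, 1 = Yes); the "query" field takes one of 18 possible labels.
Subcellular locations: 18 locations, including cell membrane, endoplasmic reticulum, and plastid.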
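
To make the query/label pairing concrete, here is a small sketch that tallies Yes and No answers per query, which is one way to inspect class balance before finetuning. The rows and the exact query strings are hypothetical; the real files may spell the 18 locations differently.

```python
from collections import defaultdict

# Hypothetical rows; the real query strings may be spelled differently.
rows = [
    {"query": "Plastid", "label": 1},
    {"query": "Plastid", "label": 0},
    {"query": "Cell membrane", "label": 0},
    {"query": "Endoplasmic reticulum", "label": 1},
]


def tally_by_query(rows):
    """Count Yes (1) and No (0) labels for each query value."""
    counts = defaultdict(lambda: {"yes": 0, "no": 0})
    for row in rows:
        label = int(row["label"])
        if label not in (0, 1):
            raise ValueError(f"expected a binary label, got {label!r}")
        counts[row["query"]]["yes" if label == 1 else "no"] += 1
    return dict(counts)


summary = tally_by_query(rows)
for query, tally in sorted(summary.items()):
    print(query, tally)
```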

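Dataset Splits

Finetuning split: reviewed proteins (Swiss-Prot) from the selected taxa (the food crops and related species described above); each row contains a gene name, a species or subspecies, and an amino acid sequence.
Evaluation split: reviewed proteins (Swiss-Prot) from other genera (e.g., avocado, carrot, cassava, lychee, Prunus); each row contains a gene name, a species or subspecies, and an amino acid sequence.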
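
Since the evaluation split is described as coming from genera other than those used for finetuning, a simple sanity check is to confirm that the two splits share no genus. The sketch below assumes species names are binomials whose first word is the genus. The finetuning species listed are hypothetical placeholders, and the evaluation names are the usual scientific names for the crops mentioned above rather than values read from the files.

```python
def genus(species_name):
    """Return the first word of a non-empty binomial name, e.g. 'Daucus'."""
    return species_name.split()[0]


def shared_genera(finetuning_species, evaluation_species):
    """Return the set of genera that appear in both splits."""
    finetuning = {genus(name) for name in finetuning_species}
    evaluation = {genus(name) for name in evaluation_species}
    return finetuning & evaluation


# Hypothetical placeholders: this page does not list the finetuning taxa.
finetuning_species = ["Oryza sativa subsp. japonica", "Zea mays"]
# Scientific names for avocado, carrot, cassava, lychee, and one Prunus species.
evaluation_species = [
    "Persea americana",
    "Daucus carota",
    "Manihot esculenta",
    "Litchi chinensis",
    "Prunus persica",
]

overlap = shared_genera(finetuning_species, evaluation_species)
print(sorted(overlap))
```

An empty result means the evaluation proteins come from genera unseen during finetuning, which is the property that makes the split a test of generalization.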
