maize-genetics/plexbench-base
Source: Hugging Face. Updated 2023-11-13; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/maize-genetics/plexbench-base
Resource summary:
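This dataset contains gene expression data for maize and Arabidopsis, specifically expression values for leaf and root tissues, and is intended for benchmarking machine learning models that predict gene expression from sequence. It is organized into a genomes folder and a tasks folder. The genomes folder holds annotation and GFF files. The tasks folder is organized by species-task-tissue, and each task is split into training, validation, and test sets in an 80%/10%/10% ratio. The data are drawn from multiple experimental samples, each annotated with species, genotype, tissue, age, and condition. Processing involved building a maximum gene expression dataset, an absolute expression dataset, and an on/off expression dataset, and producing the train/validation/test split with an orthogroup-guided splitting method.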
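The card names a "maximum" and an "on/off" expression dataset but does not define them. The sketch below shows one plausible way to derive such per-gene targets from a gene-by-sample expression matrix. It is not the authors' pipeline. The input layout (a dict of gene ID to per-sample values), the use of the per-gene maximum, and the `ON_THRESHOLD` value (assumed TPM units) are all hypothetical.

```python
from typing import Dict, List

# Hypothetical cutoff (assumed TPM units); the dataset card does not state the real one.
ON_THRESHOLD = 1.0


def max_expression(expr: Dict[str, List[float]]) -> Dict[str, float]:
    """Per-gene maximum expression across all samples of one tissue."""
    return {gene: max(values) for gene, values in expr.items()}


def on_off_labels(expr: Dict[str, List[float]],
                  threshold: float = ON_THRESHOLD) -> Dict[str, int]:
    """Binary target: 1 if the gene's maximum expression exceeds the threshold, else 0."""
    return {gene: int(v > threshold) for gene, v in max_expression(expr).items()}


# Usage with made-up gene IDs and values:
expr = {"geneA": [0.0, 0.5, 0.2], "geneB": [3.0, 10.0, 7.5]}
print(on_off_labels(expr))  # {'geneA': 0, 'geneB': 1}
```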
Provider:
maize-genetics
Summary of original information
Dataset card: Maize and Arabidopsis gene expression
Dataset description
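The included species are maize (Zea mays) and Arabidopsis thaliana. The dataset contains gene expression values for leaf and root tissues. In the tasks folder, the data are subdivided by species-task-tissue. Each genome in the genomes folder comes with its own annotation and GFF files. Every task is split into 80% training, 10% validation, and 10% test.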
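The card states an 80/10/10 split, and the summary mentions that it is orthogroup-guided. A minimal sketch of one way to do this follows, not the authors' implementation. It assumes a hypothetical gene-to-orthogroup mapping and hashes each orthogroup ID into a stable bucket, so that all genes in one orthogroup land in the same split and homologous sequences cannot leak between training and evaluation. Because whole orthogroups are assigned, the 80/10/10 ratio holds only approximately when counted in genes.

```python
import hashlib
from typing import Dict, List


def split_for_group(group_id: str) -> str:
    """Map an orthogroup ID to a split via a stable hash bucket in [0, 100)."""
    bucket = int(hashlib.sha256(group_id.encode("utf-8")).hexdigest(), 16) % 100
    if bucket < 80:
        return "train"
    if bucket < 90:
        return "validate"
    return "test"


def orthogroup_split(gene_to_group: Dict[str, str]) -> Dict[str, List[str]]:
    """Assign every gene to the split chosen for its orthogroup."""
    splits: Dict[str, List[str]] = {"train": [], "validate": [], "test": []}
    for gene, group in sorted(gene_to_group.items()):
        splits[split_for_group(group)].append(gene)
    return splits
```

Hashing, rather than random shuffling, keeps the assignment reproducible without storing a seed, and the same orthogroup always maps to the same split across species.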
Dataset structure
- `dataset/`
  - `genomes/`
    - `Arabidopsis_thaliana/`
      - `annotation.fa`
      - `ath.gff`
    - `Zea_mays/`
      - `annotation.fa`
      - `ath.gff`
  - `tasks/`
    - `species-task-tissue/`
      - `train.tsv`
      - `validate.tsv`
      - `test.tsv`
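A minimal sketch for loading one task from a local copy of this layout. The `tasks/<species>-<task>-<tissue>/{train,validate,test}.tsv` pattern follows the structure above, but `species-task-tissue` is only a placeholder there, so the concrete folder names and the example arguments are hypothetical. The TSV columns are not documented on the card, so the loader assumes a header row and returns each row as a dict without naming any columns.

```python
import csv
from pathlib import Path
from typing import Dict, List

SPLITS = ("train", "validate", "test")


def task_dir(root: Path, species: str, task: str, tissue: str) -> Path:
    """Build the task folder path, assuming the "<species>-<task>-<tissue>" naming."""
    return root / "tasks" / f"{species}-{task}-{tissue}"


def load_split(path: Path) -> List[Dict[str, str]]:
    """Read one split file; assumes tab-separated values with a header row."""
    with path.open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))


def load_task(root: Path, species: str, task: str,
              tissue: str) -> Dict[str, List[Dict[str, str]]]:
    """Load train/validate/test rows for one species-task-tissue folder."""
    d = task_dir(root, species, task, tissue)
    return {split: load_split(d / f"{split}.tsv") for split in SPLITS}
```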
Dataset sources
| sample_name | species | genotype | library_layout | library_selection | reads_location | organ | age | condition | replicate | batch | reference |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SRR505743 | Arabidopsis_thaliana | Col-0 | single-read | random | sra | root | seedling | controlled | 1 | 1 | SRP013631 |
| SRR505744 | Arabidopsis_thaliana | Col-0 | single-read | random | sra | leaf | seedling | controlled | 1 | 1 | SRP013631 |
| SRR953400 | Arabidopsis_thaliana | Col-0 | single-read | random | sra | leaf | seedling | controlled | 1 | 1 | PRJNA215448 |
| SRR1005386 | Arabidopsis_thaliana | Col-0 | single-read | random | sra | leaf | seedling | controlled | 1 | 1 | PRJNA222364 |
| SRR578947 | Arabidopsis_thaliana | Col-0 | single-read | random | sra | root | seedling | controlled | 1 | 1 | SRP013631 |
| SRR578948 | Arabidopsis_thaliana | Col-0 | single-read | random | sra | root | seedling | controlled | 1 | 1 | SRP013631 |
| ERR2096663 | Zea_mays | B73 | paired-end | polyA | sra | leaf | seedling | controlled | 1 | 1 | PRJEB22166 |
| ERR2096664 | Zea_mays | B73 | paired-end | polyA | sra | leaf | seedling | controlled | 1 | 1 | PRJEB22166 |
| ERR2096665 | Zea_mays | B73 | paired-end | polyA | sra | leaf | seedling | controlled | 1 | 1 | PRJEB22166 |
| ERR2096666 | Zea_mays | B73 | paired-end | polyA | sra | leaf | seedling | controlled | 1 | 1 | PRJEB22166 |
| ERR2096667 | Zea_mays | B73 | paired-end | polyA | sra | leaf | seedling | controlled | 1 | 1 | PRJEB22166 |
| ERR3773807 | Zea_mays | B73 | paired-end | polyA | sra | root | seedling | controlled | 1 | 1 | PRJEB35943 |
| ERR3773808 | Zea_mays | B73 | paired-end | polyA | sra | root | seedling | controlled | 1 | 1 | PRJEB35943 |
| ERR986091 | Zea_mays | B73 | paired-end | random | sra | root | seedling | controlled | 1 | 1 | PRJEB10406 |
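To check how samples are distributed across species and tissues, the sources table can be parsed and tallied. The sketch below is a generic pipe-table parser plus a counter, written for illustration. It is not part of the dataset's tooling, and it assumes simple cells that contain no `|` characters.

```python
from collections import Counter
from typing import Dict, List


def parse_markdown_table(text: str) -> List[Dict[str, str]]:
    """Parse a pipe-delimited markdown table into a list of row dicts."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[2:]:  # lines[1] is the |---| separator row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def count_by(rows: List[Dict[str, str]], *keys: str) -> Counter:
    """Count rows by the combination of values in the given columns."""
    return Counter(tuple(row[k] for k in keys) for row in rows)
```

Applied to the sources table above with `count_by(rows, "species", "organ")`, this gives 3 root and 3 leaf libraries for Arabidopsis_thaliana and 5 leaf and 3 root libraries for Zea_mays.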
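Aggregator summary
Dataset introduction

Background and challenges
Background overview
This is a plant gene expression dataset for benchmarking machine learning models that predict gene expression from sequence. It contains leaf and root gene expression values for two species, maize and Arabidopsis. The data are organized by species-task-tissue, with genome annotation files and per-task TSV files, and follow a standard 80% training / 10% validation / 10% test split, making the dataset suitable for training and evaluating machine learning models.
The above content was collected and summarized by 遇见数据集 (Yujian Datasets).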



