
maize-genetics/plexbench-base

Source: Hugging Face. Updated: 2023-11-13. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/maize-genetics/plexbench-base
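Resource summary:
This dataset contains gene expression data for maize and Arabidopsis, specifically expression values from leaf and root tissue, for benchmarking machine learning models that predict gene expression from sequence. The dataset consists of a genomes folder and a tasks folder: the genomes folder holds annotation and GFF files, and the tasks folder is organized by species-task-tissue, with each task split into training, validation, and test sets at 80%, 10%, and 10%. The data come from multiple experimental samples covering different species, genotypes, tissues, ages, and conditions. Collection and processing involved creating maximum gene expression, absolute expression, and on/off expression datasets, and the train-test-validation split was made with an orthogroup-guided splitting method.
Provided by: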
maize-genetics
Summary of original information

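Dataset card: Maize and Arabidopsis gene expression

Dataset description

The species included are maize and Arabidopsis (Arabidopsis thaliana). The dataset provides gene expression values for leaf and root tissue. In the tasks folder, the data are subdivided by species-task-tissue. The genomes folder holds, for each genome, its annotation and GFF files. Every task is split into 80% training, 10% validation, and 10% test.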
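The resource summary mentions an orthogroup-guided split, but the page does not describe the procedure. The following is only a minimal sketch of the general idea, under the assumption that each gene maps to one orthogroup and that whole orthogroups are assigned to a single split so that related genes never straddle train and test. The function name, the greedy assignment rule, and the gene and orthogroup identifiers are all hypothetical and are not taken from the dataset.

```python
import random
from collections import defaultdict


def orthogroup_split(gene_to_group, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign genes to train/validate/test so no orthogroup spans two splits.

    gene_to_group: dict mapping a gene ID to its orthogroup ID (hypothetical IDs).
    fractions: target share of genes for train, validate, and test.
    """
    # Collect the genes that belong to each orthogroup.
    groups = defaultdict(list)
    for gene, group in gene_to_group.items():
        groups[group].append(gene)

    # Shuffle orthogroups reproducibly so the split does not depend on ID order.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)

    names = ("train", "validate", "test")
    total = len(gene_to_group)
    targets = [fraction * total for fraction in fractions]
    splits = {name: [] for name in names}

    for group_id in group_ids:
        # Greedy rule (an assumption of this sketch): put the whole orthogroup
        # into the split that is currently furthest below its target size.
        deficits = [targets[i] - len(splits[names[i]]) for i in range(len(names))]
        best = max(range(len(names)), key=lambda i: deficits[i])
        splits[names[best]].extend(groups[group_id])

    return splits
```

Calling `orthogroup_split({"geneA": "OG1", "geneB": "OG1", "geneC": "OG2"})` returns a dictionary with the keys `train`, `validate`, and `test`. Because orthogroups are kept intact, the realized proportions only approximate 80/10/10 when orthogroup sizes are uneven.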

Dataset structure

dataset
  genomes/
    Arabidopsis_thaliana/
      annotation.fa
      ath.gff
    Zea_mays/
      annotation.fa
      ath.gff
  tasks/
    species-task-tissue/
      train.tsv
      validate.tsv
      test.tsv
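Given the layout above, one task folder could be read as in the sketch below. It assumes a local copy of the dataset and that each TSV file starts with a header row; the page does not list the column names, so none are hard-coded here. `load_task` is a hypothetical helper, not part of any library.

```python
import csv
from pathlib import Path

# Split file names as listed in the dataset structure.
SPLITS = ("train", "validate", "test")


def load_task(task_dir):
    """Read train.tsv, validate.tsv, and test.tsv from one task folder.

    Returns a dict mapping split name to a list of row dicts. Column names
    come from each file's header row (assumed to exist).
    """
    task_dir = Path(task_dir)
    data = {}
    for split in SPLITS:
        with open(task_dir / f"{split}.tsv", newline="") as handle:
            data[split] = list(csv.DictReader(handle, delimiter="\t"))
    return data
```

For example, `load_task("dataset/tasks/species-task-tissue")` would return the three splits for that folder, where `species-task-tissue` stands for an actual folder name. All values are read as strings, so expression values need to be converted before use.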

Dataset sources

sample_name species genotype library_layout library_selection reads_location organ age condition replicate batch reference
SRR505743 Arabidopsis_thaliana Col-0 single-read random sra root seedling controlled 1 1 SRP013631
SRR505744 Arabidopsis_thaliana Col-0 single-read random sra leaf seedling controlled 1 1 SRP013631
SRR953400 Arabidopsis_thaliana Col-0 single-read random sra leaf seedling controlled 1 1 PRJNA215448
SRR1005386 Arabidopsis_thaliana Col-0 single-read random sra leaf seedling controlled 1 1 PRJNA222364
SRR578947 Arabidopsis_thaliana Col-0 single-read random sra root seedling controlled 1 1 SRP013631
SRR578948 Arabidopsis_thaliana Col-0 single-read random sra root seedling controlled 1 1 SRP013631
ERR2096663 Zea_mays B73 paired-end polyA sra leaf seedling controlled 1 1 PRJEB22166
ERR2096664 Zea_mays B73 paired-end polyA sra leaf seedling controlled 1 1 PRJEB22166
ERR2096665 Zea_mays B73 paired-end polyA sra leaf seedling controlled 1 1 PRJEB22166
ERR2096666 Zea_mays B73 paired-end polyA sra leaf seedling controlled 1 1 PRJEB22166
ERR2096667 Zea_mays B73 paired-end polyA sra leaf seedling controlled 1 1 PRJEB22166
ERR3773807 Zea_mays B73 paired-end polyA sra root seedling controlled 1 1 PRJEB35943
ERR3773808 Zea_mays B73 paired-end polyA sra root seedling controlled 1 1 PRJEB35943
ERR986091 Zea_mays B73 paired-end random sra root seedling controlled 1 1 PRJEB10406
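To see how the samples are distributed, the table above can be parsed and tallied by species and organ. This sketch copies the rows verbatim and relies on the fact that no field contains a space; `parse_samples` is a hypothetical helper written for this illustration.

```python
from collections import Counter

# Sample metadata copied from the table above (whitespace-separated fields).
SAMPLE_TABLE = """\
sample_name species genotype library_layout library_selection reads_location organ age condition replicate batch reference
SRR505743 Arabidopsis_thaliana Col-0 single-read random sra root seedling controlled 1 1 SRP013631
SRR505744 Arabidopsis_thaliana Col-0 single-read random sra leaf seedling controlled 1 1 SRP013631
SRR953400 Arabidopsis_thaliana Col-0 single-read random sra leaf seedling controlled 1 1 PRJNA215448
SRR1005386 Arabidopsis_thaliana Col-0 single-read random sra leaf seedling controlled 1 1 PRJNA222364
SRR578947 Arabidopsis_thaliana Col-0 single-read random sra root seedling controlled 1 1 SRP013631
SRR578948 Arabidopsis_thaliana Col-0 single-read random sra root seedling controlled 1 1 SRP013631
ERR2096663 Zea_mays B73 paired-end polyA sra leaf seedling controlled 1 1 PRJEB22166
ERR2096664 Zea_mays B73 paired-end polyA sra leaf seedling controlled 1 1 PRJEB22166
ERR2096665 Zea_mays B73 paired-end polyA sra leaf seedling controlled 1 1 PRJEB22166
ERR2096666 Zea_mays B73 paired-end polyA sra leaf seedling controlled 1 1 PRJEB22166
ERR2096667 Zea_mays B73 paired-end polyA sra leaf seedling controlled 1 1 PRJEB22166
ERR3773807 Zea_mays B73 paired-end polyA sra root seedling controlled 1 1 PRJEB35943
ERR3773808 Zea_mays B73 paired-end polyA sra root seedling controlled 1 1 PRJEB35943
ERR986091 Zea_mays B73 paired-end random sra root seedling controlled 1 1 PRJEB10406
"""


def parse_samples(text):
    """Turn the whitespace-separated table into a list of row dicts."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


samples = parse_samples(SAMPLE_TABLE)

# Count samples per (species, organ) pair.
counts = Counter((sample["species"], sample["organ"]) for sample in samples)
for (species, organ), n in sorted(counts.items()):
    print(species, organ, n)
```

The tally shows 14 samples in total: three leaf and three root samples for Arabidopsis, and five leaf and three root samples for maize.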

Collected summary
Dataset introduction
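Background and challenges
Background overview
This is a plant gene expression dataset for benchmarking machine learning models that predict gene expression from sequence. It contains expression values for leaf and root tissue from two species, maize and Arabidopsis. The data are organized by species-task-tissue, come with genome annotation files and task-specific TSV files, and use a standard split of 80% training, 10% validation, and 10% test, which makes them suitable for training and evaluating machine learning models.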
The content above was collected and summarized by 遇见数据集.