GUE
Source: ModelScope community (魔搭社区). Updated 2025-11-11; indexed 2025-07-05.
Download link:
https://modelscope.cn/datasets/lgq12697/GUE
Description:
---
configs:
- config_name: emp_H3
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: emp_H3K14ac
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: emp_H3K36me3
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: emp_H3K4me1
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: emp_H3K4me2
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: emp_H3K4me3
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: emp_H3K79me3
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: emp_H3K9ac
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: emp_H4
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: emp_H4ac
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: human_tf_0
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: human_tf_1
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: human_tf_2
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: human_tf_3
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: human_tf_4
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: mouse_0
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: mouse_1
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: mouse_2
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: mouse_3
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: mouse_4
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: prom_300_all
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: prom_300_notata
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: prom_300_tata
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: prom_core_all
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
  default: true
- config_name: prom_core_notata
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: prom_core_tata
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: splice_reconstructed
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: virus_covid
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: virus_species_40
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: fungi_species_20
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: EPI_K562
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: EPI_HeLa-S3
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: EPI_NHEK
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: EPI_IMR90
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: EPI_HUVEC
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: EPI_GM12878
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
- config_name: phage_fragments
  data_files:
  - split: train
    path: "train.csv"
  - split: test
    path: "test.csv"
  - split: dev
    path: "dev.csv"
---
This is a copy of the Genome Understanding Evaluation (GUE) benchmark, which was presented in
DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome
by Zhihan Zhou, Yanrong Ji, Weijian Li, Pratik Dutta, Ramana Davuluri, and Han Liu.
The original benchmark is available to download directly from
https://github.com/MAGICS-LAB/DNABERT_2
If you use this dataset, please cite:
@misc{zhou2023dnabert2,
title={DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome},
author={Zhihan Zhou and Yanrong Ji and Weijian Li and Pratik Dutta and Ramana Davuluri and Han Liu},
year={2023},
eprint={2306.15006},
archivePrefix={arXiv},
primaryClass={q-bio.GN}
}
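
Every config above lists the same three split files (train.csv, test.csv, dev.csv). The sketch below shows one way to read those splits from a local copy using only the Python standard library. It assumes that each config lives in its own directory containing the three files and that the CSVs have a header row with a `label` column; neither the directory layout nor the column name is stated in this card, and the helper names `load_splits` and `label_counts` are hypothetical.

```python
import csv
from pathlib import Path

# Split names taken from the configs listed in this card.
SPLITS = ("train", "test", "dev")


def load_splits(config_dir):
    """Read <split>.csv for each split in one config directory.

    Assumption: config_dir holds train.csv, test.csv and dev.csv,
    each with a header row. Returns {split_name: [row_dict, ...]}.
    """
    config_dir = Path(config_dir)
    data = {}
    for split in SPLITS:
        path = config_dir / f"{split}.csv"
        with path.open(newline="", encoding="utf-8") as handle:
            data[split] = list(csv.DictReader(handle))
    return data


def label_counts(rows, label_column="label"):
    """Count rows per label value.

    Assumption: the label column is named "label"; pass another
    name if the files use a different header.
    """
    counts = {}
    for row in rows:
        value = row[label_column]
        counts[value] = counts.get(value, 0) + 1
    return counts
```

Calling `load_splits` on the directory of a single config (for example, a local folder holding the emp_H3 files) returns the rows of all three splits, and `label_counts` applied to each split gives a quick check of class balance before training.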
Provider:
maas
Created:
2025-08-12



