GUE

ModelScope Community · Updated 2025-11-11 · Added 2025-07-05
Download link:
https://modelscope.cn/datasets/lgq12697/GUE
Description:
Configurations: 37 in total. Each one declares the same three splits, with these data files:
- train: "train.csv"
- test: "test.csv"
- dev: "dev.csv"
The default configuration (default: true) is prom_core_all.

Configuration names, in the order listed in the metadata:
- emp_H3, emp_H3K14ac, emp_H3K36me3, emp_H3K4me1, emp_H3K4me2, emp_H3K4me3, emp_H3K79me3, emp_H3K9ac, emp_H4, emp_H4ac
- human_tf_0, human_tf_1, human_tf_2, human_tf_3, human_tf_4
- mouse_0, mouse_1, mouse_2, mouse_3, mouse_4
- prom_300_all, prom_300_notata, prom_300_tata
- prom_core_all (default), prom_core_notata, prom_core_tata
- splice_reconstructed
- virus_covid, virus_species_40, fungi_species_20
- EPI_K562, EPI_HeLa-S3, EPI_NHEK, EPI_IMR90, EPI_HUVEC, EPI_GM12878
- phage_fragments

This is a copy of the Genome Understanding Evaluation (GUE) benchmark presented in "DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome" by Zhihan Zhou, Yanrong Ji, Weijian Li, Pratik Dutta, Ramana Davuluri, and Han Liu. The benchmark is also available for direct download from https://github.com/MAGICS-LAB/DNABERT_2.

If you use this dataset, please cite:

  @misc{zhou2023dnabert2,
    title={DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome},
    author={Zhihan Zhou and Yanrong Ji and Weijian Li and Pratik Dutta and Ramana Davuluri and Han Liu},
    year={2023},
    eprint={2306.15006},
    archivePrefix={arXiv},
    primaryClass={q-bio.GN}
  }
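The configuration metadata appears to follow the Hugging Face dataset-card "configs" convention, where each named config maps split names to data files. As a minimal sketch, the Python below mirrors that metadata and reads one split with the standard csv module. It assumes a local copy laid out as <root>/<config_name>/<split>.csv; that layout is hypothetical, since the card lists only bare file names. The helper names split_path, read_split, and label_counts, and the "label" column name, are illustrative assumptions, not part of the dataset card.

```python
import csv
from collections import Counter
from pathlib import Path

# Config names as listed in the dataset card metadata, in the same order.
CONFIG_NAMES = [
    *[f"emp_{mark}" for mark in ("H3", "H3K14ac", "H3K36me3", "H3K4me1", "H3K4me2",
                                 "H3K4me3", "H3K79me3", "H3K9ac", "H4", "H4ac")],
    *[f"human_tf_{i}" for i in range(5)],
    *[f"mouse_{i}" for i in range(5)],
    "prom_300_all", "prom_300_notata", "prom_300_tata",
    "prom_core_all", "prom_core_notata", "prom_core_tata",
    "splice_reconstructed", "virus_covid", "virus_species_40", "fungi_species_20",
    *[f"EPI_{cell}" for cell in ("K562", "HeLa-S3", "NHEK", "IMR90", "HUVEC", "GM12878")],
    "phage_fragments",
]

# The card marks prom_core_all with "default: true".
DEFAULT_CONFIG = "prom_core_all"

# Every config declares the same split -> file mapping.
SPLIT_FILES = {"train": "train.csv", "test": "test.csv", "dev": "dev.csv"}


def split_path(root, config, split):
    """Build <root>/<config>/<file>; this directory layout is a HYPOTHETICAL assumption."""
    if config not in CONFIG_NAMES:
        raise KeyError(f"unknown config: {config}")
    # An unknown split name also raises KeyError here.
    return Path(root) / config / SPLIT_FILES[split]


def read_split(root, config=DEFAULT_CONFIG, split="train"):
    """Read one split CSV into a list of dicts keyed by its header row."""
    with open(split_path(root, config, split), newline="") as f:
        return list(csv.DictReader(f))


def label_counts(rows, label_column="label"):
    """Count rows per label value; the 'label' column name is an ASSUMPTION."""
    return Counter(row[label_column] for row in rows)
```

For example, read_split("GUE", "emp_H3", "dev") would open GUE/emp_H3/dev.csv under the assumed layout, and read_split("GUE") falls back to the default config (prom_core_all) and the train split. Rows are returned as plain dicts so that configs whose CSV columns differ can still be read; only label_counts depends on the assumed column name.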

Provided by: maas
Created: 2025-08-12