R1-Onevision-Bench
Source: ModelScope community · Updated 2025-11-27 · Indexed 2025-03-15
Download link:
https://modelscope.cn/datasets/Fancy-MLLM/R1-Onevision-Bench
Resource description:
# R1-Onevision-Bench
[\[📂 GitHub\]](https://github.com/Fancy-MLLM/R1-Onevision) [\[📝 Paper\]](https://arxiv.org/pdf/2503.10615)
[\[🤗 HF Dataset\]](https://huggingface.co/datasets/Fancy-MLLM/R1-onevision) [\[🤗 HF Model\]](https://huggingface.co/Fancy-MLLM/R1-Onevision-7B) [\[🤗 HF Demo\]](https://huggingface.co/spaces/Fancy-MLLM/R1-OneVision)
## Dataset Overview
R1-Onevision-Bench comprises 38 subcategories organized into 5 major domains: Math, Biology, Chemistry, Physics, and Deduction. Tasks are also categorized into five difficulty levels, ranging from 'Junior High School' to 'Social Test', so that models can be evaluated across varying levels of complexity.
## Data Format
Reasoning problems are stored in TSV format, with each row containing the following fields:
- `index`: data id
- `question`: visual reasoning question
- `answer`: ground truth answer
- `category`: question category
- `image`: base64-encoded image
- `choices`: available answer choices
- `level`: question difficulty level
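
The field list above can be illustrated with a minimal loading sketch. It uses only the Python standard library and a tiny synthetic row, not real benchmark data. The presence of a header row, the column order, and the string format of `choices` are assumptions, and the helper names `read_rows` and `decode_image` are hypothetical.

```python
import base64
import csv
import io

# Field names as listed above; the column order in the actual file is an assumption.
FIELDS = ["index", "question", "answer", "category", "image", "choices", "level"]

# Base64-encoded images can exceed the csv module's default field size limit.
csv.field_size_limit(2**31 - 1)


def read_rows(tsv_text):
    """Parse TSV text (assumed to start with a header row) into a list of dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return list(reader)


def decode_image(row):
    """Return the raw image bytes stored in the base64 `image` field."""
    return base64.b64decode(row["image"])


# Tiny synthetic example: one header line and one made-up row.
fake_image = base64.b64encode(b"\x89PNG fake bytes").decode("ascii")
sample = "\t".join(FIELDS) + "\n" + "\t".join(
    ["0", "What is shown?", "A", "Math", fake_image, "['A', 'B']", "High School"]
) + "\n"

rows = read_rows(sample)
image_bytes = decode_image(rows[0])
```

For a real file, the same `read_rows` helper could be fed the text returned by `open(path, encoding="utf-8", newline="").read()`, where `path` points at the downloaded TSV.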
## Benchmark Distribution
<img src="https://cdn-uploads.huggingface.co/production/uploads/65af78bb3e82498d4c65ed2a/PXGfxg9xjMYb5qvXt68le.png" width="50%" />
<img src="https://cdn-uploads.huggingface.co/production/uploads/65af78bb3e82498d4c65ed2a/CiwppKyI4OO2YHcsjboif.png" width="50%" />
## Benchmark Samples
<img src="https://cdn-uploads.huggingface.co/production/uploads/65af78bb3e82498d4c65ed2a/9qfmkt-ZjDzjFb1_gLkoQ.png" width="90%" />
## Institution
- Zhejiang University
## Benchmark Contact
- yang-yi@zju.edu.cn
- xiaoxuanhe@zju.edu.cn
- panhongkun@zju.edu.cn
Provider:
maas
Created:
2025-03-12



