R1-Onevision-Bench

ModelScope Community. Updated 2025-11-27; indexed 2025-03-15.
Download link:
https://modelscope.cn/datasets/Fancy-MLLM/R1-Onevision-Bench
Resource description:
# R1-Onevision-Bench

[\[📂 GitHub\]](https://github.com/Fancy-MLLM/R1-Onevision) [\[📝 Paper\]](https://arxiv.org/pdf/2503.10615) [\[🤗 HF Dataset\]](https://huggingface.co/datasets/Fancy-MLLM/R1-onevision) [\[🤗 HF Model\]](https://huggingface.co/Fancy-MLLM/R1-Onevision-7B) [\[🤗 HF Demo\]](https://huggingface.co/spaces/Fancy-MLLM/R1-OneVision)

## Dataset Overview

R1-Onevision-Bench comprises 38 subcategories organized into 5 major domains: Math, Biology, Chemistry, Physics, and Deduction. The tasks are also graded into five difficulty levels, ranging from ‘Junior High School’ to ‘Social Test’, so that models can be evaluated across problems of varying complexity.

## Data Format

Reasoning problems are stored in TSV format, with each row containing the following fields:

- `index`: data id
- `question`: visual reasoning question
- `answer`: ground truth answer
- `category`: question category
- `image`: base64-encoded image
- `choices`: available answer choices
- `level`: question difficulty level

## Benchmark Distribution

<img src="https://cdn-uploads.huggingface.co/production/uploads/65af78bb3e82498d4c65ed2a/PXGfxg9xjMYb5qvXt68le.png" width="50%" />

<img src="https://cdn-uploads.huggingface.co/production/uploads/65af78bb3e82498d4c65ed2a/CiwppKyI4OO2YHcsjboif.png" width="50%" />

## Benchmark Samples

<img src="https://cdn-uploads.huggingface.co/production/uploads/65af78bb3e82498d4c65ed2a/9qfmkt-ZjDzjFb1_gLkoQ.png" width="90%" />

## Institution

- Zhejiang University

## Benchmark Contact

- yang-yi@zju.edu.cn
- xiaoxuanhe@zju.edu.cn
- panhongkun@zju.edu.cn
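
The TSV layout listed under Data Format could be read along the following lines. This is a minimal sketch, not the authors' loader: it assumes the file starts with a header row naming the seven fields, is UTF-8 encoded, and uses default CSV quoting. The helper names `load_rows`, `decode_image`, and `count_by` are hypothetical and introduced only for illustration.

```python
import base64
import csv
from collections import Counter

# Base64-encoded images can far exceed csv's default 128 KiB field limit.
csv.field_size_limit(2**31 - 1)

# Column names as listed in the Data Format section.
FIELDS = ("index", "question", "answer", "category", "image", "choices", "level")


def load_rows(tsv_path):
    """Read the TSV into a list of dicts keyed by column name.

    Assumes a header row is present (not confirmed by the dataset card).
    """
    with open(tsv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        missing = set(FIELDS) - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        return list(reader)


def decode_image(row):
    """Return the raw image bytes stored as base64 in the `image` field."""
    return base64.b64decode(row["image"])


def count_by(rows, field):
    """Count rows per distinct value of `field`, e.g. `level` or `category`."""
    return Counter(row[field] for row in rows)
```

With a local copy of the file, `rows = load_rows(path)` followed by `count_by(rows, "level")` would give the number of questions per difficulty level, and `decode_image(rows[0])` yields bytes that can be written to disk or opened with an image library.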

Provider: maas
Created: 2025-03-12