Qwen3-VL Evaluation Dataset Collection

Name: Qwen3-VL Evaluation Dataset Collection
Creator: maas
Published: 2026-05-23 16:32:59
License: No description available

ModelScope community. Updated 2026-05-23. Indexed 2025-09-27.

Download link:

https://modelscope.cn/datasets/evalscope/Qwen3-VL-Test-Collection

Resource description:

# Qwen3-VL Evaluation Dataset Collection

This dataset was generated with the EvalScope tool. It evaluates a model's mathematical ability (GSM8K), knowledge (MMLU-Pro), instruction following (IFEval), multimodal knowledge (MMMU-Pro), and multimodal mathematical ability (MathVista). For usage details, see [Qwen3-VL Model Evaluation Best Practices](https://evalscope.readthedocs.io/zh-cn/latest/best_practice/qwen3_vl.html).

Sample evaluation output:

```text
+-------------+---------------------+--------------+---------------+-------+
| task_type   | metric              | dataset_name | average_score | count |
+-------------+---------------------+--------------+---------------+-------+
| exam        | acc                 | mmmu_pro     | 0.5           | 38    |
| math        | acc                 | math_vista   | 0.7917        | 24    |
| exam        | acc                 | mmlu_pro     | 0.3077        | 13    |
| math        | acc                 | gsm8k        | 0.7692        | 13    |
| instruction | prompt_level_strict | ifeval       | 0.6667        | 12    |
+-------------+---------------------+--------------+---------------+-------+
```

#### Download

:modelscope-code[]{type="sdk"}

:modelscope-code[]{type="git"}
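The SDK and git download snippets did not survive on this page, so the following is only a minimal sketch of what the two routes might look like. It assumes the `modelscope` Python package is installed and that `MsDataset.load` can resolve this repository with its default subset and split, which is not confirmed here. The helper names `dataset_git_url` and `load_collection` are hypothetical, and the `.git` clone URL is derived from the download link above rather than taken from the original page.

```python
# Dataset ID taken from the download link on this page.
DATASET_ID = "evalscope/Qwen3-VL-Test-Collection"


def dataset_git_url(dataset_id: str, host: str = "https://modelscope.cn") -> str:
    """Build a git clone URL for a ModelScope dataset repo (assumed URL pattern)."""
    return f"{host}/datasets/{dataset_id}.git"


def load_collection(dataset_id: str = DATASET_ID, subset_name=None, split=None):
    """Load the collection through the ModelScope SDK (requires network access).

    The import is done lazily so this module can be imported without
    `modelscope` installed. Whether a subset or split must be given for
    this particular dataset is an assumption left to the caller.
    """
    from modelscope.msdatasets import MsDataset

    kwargs = {}
    if subset_name is not None:
        kwargs["subset_name"] = subset_name
    if split is not None:
        kwargs["split"] = split
    return MsDataset.load(dataset_id, **kwargs)


# Print the equivalent git command instead of touching the network here.
print(f"git clone {dataset_git_url(DATASET_ID)}")
```

Calling `load_collection()` would download the data on first use. For running the evaluation itself, the best-practice guide linked above remains the reference.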

Provider:

maas

Created:

2025-09-05

Background overview

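The Qwen3-VL Evaluation Dataset Collection was generated with the EvalScope tool. It is used to assess model performance in mathematical ability (GSM8K), knowledge (MMLU-Pro), instruction following (IFEval), multimodal knowledge (MMMU-Pro), and multimodal mathematical ability (MathVista). The dataset page includes a detailed sample of the evaluation output and several download methods.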
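The sample evaluation output in the resource description reports one score per benchmark together with a sample count. As a rough illustration of how such a report could be post-processed, the sketch below parses that table and computes a count-weighted overall score. The function names `parse_report` and `weighted_average` are hypothetical, and the weighted average is an aggregate chosen for this example, not a figure that EvalScope is stated to report.

```python
from typing import Dict, List

# The sample report as shown in the resource description.
SAMPLE_REPORT = """\
+-------------+---------------------+--------------+---------------+-------+
| task_type   | metric              | dataset_name | average_score | count |
+-------------+---------------------+--------------+---------------+-------+
| exam        | acc                 | mmmu_pro     | 0.5           | 38    |
| math        | acc                 | math_vista   | 0.7917        | 24    |
| exam        | acc                 | mmlu_pro     | 0.3077        | 13    |
| math        | acc                 | gsm8k        | 0.7692        | 13    |
| instruction | prompt_level_strict | ifeval       | 0.6667        | 12    |
+-------------+---------------------+--------------+---------------+-------+
"""


def parse_report(text: str) -> List[Dict[str, str]]:
    """Turn an ASCII table into a list of {column: value} dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +---+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first data line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def weighted_average(rows: List[Dict[str, str]]) -> float:
    """Average of average_score weighted by each benchmark's sample count."""
    total = sum(int(r["count"]) for r in rows)
    weighted = sum(float(r["average_score"]) * int(r["count"]) for r in rows)
    return weighted / total


rows = parse_report(SAMPLE_REPORT)
total_samples = sum(int(r["count"]) for r in rows)
overall = weighted_average(rows)
print(total_samples, round(overall, 4))
```

On the sample table the counts sum to 100 samples, so each benchmark's count is also its percentage weight in the overall figure.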

The content above was collected and summarized by 遇见数据集.