
OpenDataArena/MMFineReason-1.8M-Qwen3-VL-235B-Thinking

Source: Hugging Face | Updated: 2026-03-04 | Indexed: 2026-02-07
Download link:
https://hf-mirror.com/datasets/OpenDataArena/MMFineReason-1.8M-Qwen3-VL-235B-Thinking
Description:
MMFineReason is a large-scale multimodal reasoning dataset comprising 1.8M samples and 5.1B solution tokens, with detailed reasoning annotations distilled from Qwen3-VL-235B-A22B-Thinking. The dataset spans four domains: Mathematics (79.4%), Science (13.8%), Puzzle/Game (4.6%), and General/OCR (2.2%). It is characterized by long-form Chain-of-Thought reasoning (average length of 2,910 tokens, which is 2.7× longer than HoneyBee and 4.3× longer than OpenMMReasoner) and 100% caption coverage (average caption length of 609 tokens). The dataset is built through a three-stage pipeline (Data Collection & Standardization, Reasoning Distillation, and Data Selection) intended to ensure quality and diversity. It also provides detailed statistics and benchmark results, which the authors present as evidence of strong performance on multimodal reasoning tasks.
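The headline figures in the description can be cross-checked with simple arithmetic. The sketch below is illustrative only: it uses the rounded numbers quoted above, so the derived values (per-domain counts, tokens per sample, implied baseline lengths) are approximations rather than official statistics, and the helper names are hypothetical.

```python
# Figures as quoted in the dataset description (all rounded as published).
TOTAL_SAMPLES = 1_800_000
TOTAL_SOLUTION_TOKENS = 5_100_000_000
AVG_REASONING_TOKENS = 2_910

DOMAIN_SHARE_PERCENT = {
    "Mathematics": 79.4,
    "Science": 13.8,
    "Puzzle/Game": 4.6,
    "General/OCR": 2.2,
}

# Ratios quoted in the description: MMFineReason's average reasoning length
# relative to each baseline dataset.
LENGTH_RATIO = {"HoneyBee": 2.7, "OpenMMReasoner": 4.3}


def approx_domain_counts(total, shares):
    """Approximate per-domain sample counts from percentage shares."""
    return {name: round(total * pct / 100) for name, pct in shares.items()}


def implied_baseline_length(avg_tokens, ratio):
    """Average length implied for a baseline that is `ratio` times shorter."""
    return avg_tokens / ratio


domain_counts = approx_domain_counts(TOTAL_SAMPLES, DOMAIN_SHARE_PERCENT)
share_total = sum(DOMAIN_SHARE_PERCENT.values())
tokens_per_sample = TOTAL_SOLUTION_TOKENS / TOTAL_SAMPLES
baseline_lengths = {
    name: implied_baseline_length(AVG_REASONING_TOKENS, ratio)
    for name, ratio in LENGTH_RATIO.items()
}

if __name__ == "__main__":
    for name, count in domain_counts.items():
        print(f"{name}: ~{count:,} samples")
    print(f"Domain shares sum to {share_total:.1f}%")
    print(f"Solution tokens per sample: ~{tokens_per_sample:,.0f}")
    for name, length in baseline_lengths.items():
        print(f"Implied average length for {name}: ~{length:,.0f} tokens")
```

Under these rounded inputs the four domain shares sum to 100%, and the total token count divided by the sample count comes out at roughly 2,800 tokens per sample, which is in line with the quoted 2,910-token average once rounding of the 1.8M and 5.1B figures is taken into account.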
Provided by:
OpenDataArena