OpenDataArena/MMFineReason-Full-2.3M-Qwen3-VL-235B-Thinking
Source: Hugging Face. Updated: 2026-02-03. Indexed: 2026-02-07.
Download link:
https://hf-mirror.com/datasets/OpenDataArena/MMFineReason-Full-2.3M-Qwen3-VL-235B-Thinking
Description:
MMFineReason-Full-2.3M is the complete pre-selection dataset. It contains 2.3M samples and 8.8B solution tokens, generated by our reasoning distillation pipeline before the data selection stage. It includes every sample that passed basic template and length validation, but none of these samples has undergone correctness verification filtering. The dataset is released for research purposes, to support the study of data quality, filtering strategies, and consistency patterns. For training, the filtered MMFineReason-1.8M is recommended instead; it removes the ~20% of samples found to be inconsistent.
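As a rough sanity check on the figures quoted in the description, the sketch below derives the share of samples dropped by filtering and the average solution length. It assumes the rounded counts (2.3M, 1.8M, 8.8B) are close to the true values; the function and constant names are illustrative and not part of the dataset or any library.

```python
# Rounded figures taken from the dataset description (assumed approximate).
FULL_SAMPLES = 2_300_000          # MMFineReason-Full-2.3M
FILTERED_SAMPLES = 1_800_000      # MMFineReason-1.8M
SOLUTION_TOKENS = 8_800_000_000   # solution tokens in the full set


def removed_fraction(full: int, kept: int) -> float:
    """Fraction of the full set that the filtered release leaves out."""
    return (full - kept) / full


def mean_tokens_per_sample(tokens: int, samples: int) -> float:
    """Average number of solution tokens per sample."""
    return tokens / samples


removed = removed_fraction(FULL_SAMPLES, FILTERED_SAMPLES)
avg_tokens = mean_tokens_per_sample(SOLUTION_TOKENS, FULL_SAMPLES)

print(f"removed by filtering: {removed:.1%}")
print(f"mean solution tokens per sample: {avg_tokens:.0f}")
```

With these rounded inputs the removed share comes out near 22%, in line with the stated ~20%, and the mean solution length is on the order of a few thousand tokens per sample.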
Provider:
OpenDataArena



