PixelReasoner-SFT-Data
收藏魔搭社区2025-12-05 更新2025-06-07 收录
下载链接:
https://modelscope.cn/datasets/TIGER-Lab/PixelReasoner-SFT-Data
下载链接
链接失效反馈官方服务:
资源简介:
**Overview.**
The SFT data for training [**Pixel Reasoner**: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning](https://arxiv.org/abs/2505.15966),
The queries require fine-grained visual analysis in both images (e.g., infographics, visually-rich scenes, etc) and videos.
**Details.**
The data contains 8,000+ reasoning trajectories, including :
- 2,000+ textual reasoning trajectories, rejection sampled from the base model Qwen2.5-VL-Instruct. These data aims to preserve textual reasoning ability on easier VL queries.
- 6,000+ pixel-space reasoning trajectories synthesized using GPT-4o. These include single-pass trajectories and error-induced self-correction trajectories, on both image and video inputs.
**Update:** The initial data has merged the textual reasoning and pixel-space reasoning to train adaptive use of different reasoning modes. The separate files for each reasoning mode is now uploaded.
Each trajectory is stored as message list, following Qwen's message template.
The data is structured according to the SFT code.
**Note**: Remember to unzip `images.zip` and `videos.zip`, and replace the relative path with the absolute path in `image` and `video` key of the message entries.
**Training Code**: The SFT code can be found at https://github.com/TIGER-AI-Lab/Pixel-Reasoner/tree/main
**Project page**: https://tiger-ai-lab.github.io/Pixel-Reasoner/
**概述。**
本数据集为训练《Pixel Reasoner:基于好奇心驱动强化学习的像素空间推理方法》(Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning)[https://arxiv.org/abs/2505.15966]的监督微调(Supervised Fine-Tuning, SFT)数据,其查询任务要求对图像(如信息图表、视觉元素丰富的场景等)与视频均开展细粒度视觉分析。
**详情。**
本数据集包含8000余条推理轨迹,具体分为两类:
- 2000余条文本推理轨迹:通过对基础模型Qwen2.5-VL-Instruct进行拒绝采样得到,旨在保留模型在简易视觉语言(Vision-Language, VL)查询任务上的文本推理能力。
- 6000余条像素空间推理轨迹:由GPT-4o生成合成,涵盖单轮推理轨迹与带错误诱导的自我修正推理轨迹,支持图像与视频两类输入。
**更新说明。**
初始数据集已合并文本推理与像素空间推理轨迹,以训练模型自适应调用不同推理模式;现已上传针对各推理模式的独立数据文件。
每条推理轨迹均以消息列表形式存储,遵循Qwen系列模型的对话模板规范。
本数据集的组织格式适配SFT训练代码要求。
**注意事项:** 请务必解压`images.zip`与`videos.zip`,并将消息条目中`image`与`video`字段内的相对路径替换为绝对路径。
**训练代码:** 监督微调代码可参见https://github.com/TIGER-AI-Lab/Pixel-Reasoner/tree/main
**项目主页:** https://tiger-ai-lab.github.io/Pixel-Reasoner/
提供机构:
maas
创建时间:
2025-05-23



