zwyang6/Perception_ZwZ-RL-VQA
Source: Hugging Face. Updated 2026-04-23. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/zwyang6/Perception_ZwZ-RL-VQA
Description:
The ZwZ-RL-VQA dataset contains 74K high-quality visual question answering (VQA) pairs generated via Region-to-Image Distillation (R2I) for training multimodal large language models (MLLMs) on fine-grained perception tasks. Key features include high-resolution images (mostly >1000×1000), fine-grained crops (mostly <10% of the full image area), diverse question types (e.g., counting, OCR, color, structure, material, identification), and strict teacher consensus filtering (>6/8 teacher models in agreement). The data is generated with multiple strong teacher models (such as Qwen3-VL-235B and GLM-4.5V), with consensus filtering and quality control applied to ensure data quality. It is intended for reinforcement learning and for research on distilling tool-use capabilities into single-pass inference models.
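The two filters named in the description (teacher consensus and crop size) can be illustrated with a minimal sketch. This is not the dataset authors' pipeline: the function names, the answer normalization, and the reading of ">6/8" as "at least 7 of 8 teachers" are assumptions made for illustration only.

```python
from collections import Counter


def consensus_answer(teacher_answers, min_votes=7):
    """Return the majority answer if at least `min_votes` teachers agree, else None.

    Assumption: ">6/8 agreement" is read as 7 or more matching answers out of 8.
    Assumption: answers are compared after trimming whitespace and lowercasing.
    """
    if not teacher_answers:
        return None
    normalized = [answer.strip().lower() for answer in teacher_answers]
    answer, votes = Counter(normalized).most_common(1)[0]
    return answer if votes >= min_votes else None


def crop_area_ratio(crop_box, image_size):
    """Fraction of the full image covered by a crop.

    `crop_box` is (x0, y0, x1, y1) in pixels; `image_size` is (width, height).
    Both layouts are hypothetical, chosen only for this sketch.
    """
    x0, y0, x1, y1 = crop_box
    width, height = image_size
    return ((x1 - x0) * (y1 - y0)) / (width * height)


def keep_sample(teacher_answers, crop_box, image_size, max_ratio=0.10):
    """Keep a sample only if the teachers agree and the crop is small enough."""
    agreed = consensus_answer(teacher_answers) is not None
    small_crop = crop_area_ratio(crop_box, image_size) < max_ratio
    return agreed and small_crop
```

Under these assumptions, a sample with seven matching teacher answers and a 200×200 crop from a 2000×2000 image would be kept, while a 6-versus-2 split would be discarded regardless of crop size.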
Provider:
zwyang6



