Video-R1-data
Source: ModelScope community. Updated: 2026-01-09. Indexed: 2025-04-05.
Download link:
https://modelscope.cn/datasets/Video-R1/Video-R1-data
Description:
This repository contains the data presented in [Video-R1: Reinforcing Video Reasoning in MLLMs](https://huggingface.co/papers/2503.21776).
Code: https://github.com/tulerfeng/Video-R1
Video data folder: CLEVRER, LLaVA-Video-178K, NeXT-QA, PerceptionTest, STAR
Image data folder: Chart, General, Knowledge, Math, OCR, Spatial
Video-R1-COT-165k.json is for SFT cold start, and Video-R1-260k.json is for RL training.
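As a minimal sketch of how the two annotation files might be read: the code below assumes each file holds a top-level JSON array of record objects, which the source does not state explicitly. The helper names `load_records` and `count_by` are hypothetical and are not part of the Video-R1 code base.

```python
import json
from collections import Counter
from pathlib import Path


def load_records(json_path):
    """Load an annotation file, assuming its top level is a JSON array of records."""
    with Path(json_path).open(encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        # Assumption check: fail loudly if the file is not an array after all.
        raise ValueError(f"expected a JSON array in {json_path}")
    return records


def count_by(records, key):
    """Tally records by one field; records lacking the field are counted as '<missing>'."""
    return Counter(r.get(key, "<missing>") for r in records)
```

For example, `count_by(load_records("Video-R1-260k.json"), "data_type")` would give a quick view of how the RL set splits between video and image samples, provided the array assumption holds.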
Data Format in Video-R1-COT-165k:
```
{
  "problem_id": 2,
  "problem": "What appears on the screen in Russian during the missile's ascent?",
  "data_type": "video",
  "problem_type": "multiple choice",
  "options": [
    "A. A YouTube subscription notification",
    "B. A military command",
    "C. A warning message",
    "D. A weather update"
  ],
  "process": "<think>Let me think about this for a moment. The video shows a missile being launched, and there's a pop-up that appears in Russian during the missile's ascent. Hmm, what could it be? Wait, it looks familiar. It resembles something I've seen before online.\n\nOh, I see! It's a YouTube pop-up, specifically a notification that says \"Вы подписаны,\" which translates to \"You are subscribed.\" This is a common feature on YouTube when someone subscribes to a channel. Let's break it down: it's not a military command, a warning message, or a weather update—it's clearly related to YouTube.\n\nSo, after some self-reflection and verification, I'm confident that the pop-up is indeed a YouTube subscription notification.</think>",
  "solution": "<answer>A</answer>",
  "path": "./LLaVA-Video-178K/liwei_youtube_videos/videos/youtube_video_2024/ytb_7nRmsEw7nsE.mp4",
  "data_source": "LLaVA-Video-178K/30_60_s_youtube_v0_1"
}
```
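The record layout above can be unpacked with a few lines of Python. This is a sketch under stated assumptions: `parse_record` and its `data_root` parameter are hypothetical names, the `<think>` and `<answer>` tags are assumed to appear at most once per field as in the sample, and `path` is assumed to be relative to the directory where the data folders were downloaded.

```python
import re
from pathlib import Path

# Tag patterns taken from the sample record; DOTALL lets the reasoning span newlines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def parse_record(record, data_root="."):
    """Split one record into question, reasoning, answer, and a media path under data_root."""
    think = THINK_RE.search(record.get("process", ""))
    answer = ANSWER_RE.search(record.get("solution", ""))
    return {
        "question": record["problem"],
        "options": record.get("options", []),
        # None signals that the expected tag was absent from the field.
        "reasoning": think.group(1).strip() if think else None,
        "answer": answer.group(1).strip() if answer else None,
        # Assumption: "path" is relative to the local download directory.
        "media_path": Path(data_root) / record["path"],
    }
```

Returning `None` for a missing tag, rather than raising, keeps the helper usable on records that carry no `process` field; whether such records exist in Video-R1-260k.json is not stated in the source.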
Provider:
maas
Created:
2025-03-28
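Collected summary
Dataset introduction

Background and challenges
Background overview
Video-R1-data is a dataset for strengthening the video reasoning ability of multimodal large language models (MLLMs). It contains both video and image data, organized into source and category folders such as CLEVRER, LLaVA-Video-178K, and Chart. The dataset provides structured JSON files for supervised fine-tuning (SFT) and reinforcement learning (RL) training, and its chain-of-thought annotations are intended to improve a model's understanding of and reasoning over video content.
The summary above was collected and generated by 遇见数据集.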



