General_VLM_pretrain_data
Source: ModelScope community. Updated 2025-10-16; indexed 2025-09-13.
Download link:
https://modelscope.cn/datasets/BNUIVCly/General_VLM_pretrain_data
Description:
This repository contains the data presented in [Video-R1: Reinforcing Video Reasoning in MLLMs](https://huggingface.co/papers/2503.21776).
Code: https://github.com/tulerfeng/Video-R1
Video data folder: CLEVRER, LLaVA-Video-178K, NeXT-QA, PerceptionTest, STAR
Image data folder: Chart, General, Knowledge, Math, OCR, Spatial
Video-R1-COT-165k.json is for SFT cold start, and Video-R1-260k.json is for RL training.
Data Format in Video-R1-COT-165k:
```
{
"problem_id": 2,
"problem": "What appears on the screen in Russian during the missile's ascent?",
"data_type": "video",
"problem_type": "multiple choice",
"options": [
"A. A YouTube subscription notification",
"B. A military command",
"C. A warning message",
"D. A weather update"
],
"process": "<think>Let me think about this for a moment. The video shows a missile being launched, and there's a pop-up that appears in Russian during the missile's ascent. Hmm, what could it be? Wait, it looks familiar. It resembles something I've seen before online.\n\nOh, I see! It's a YouTube pop-up, specifically a notification that says \"Вы подписаны,\" which translates to \"You are subscribed.\" This is a common feature on YouTube when someone subscribes to a channel. Let's break it down: it's not a military command, a warning message, or a weather update—it's clearly related to YouTube.\n\nSo, after some self-reflection and verification, I'm confident that the pop-up is indeed a YouTube subscription notification.</think>",
"solution": "<answer>A</answer>",
"path": "./LLaVA-Video-178K/liwei_youtube_videos/videos/youtube_video_2024/ytb_7nRmsEw7nsE.mp4",
"data_source": "LLaVA-Video-178K/30_60_s_youtube_v0_1"
}
```
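As a minimal sketch of how a record in this format might be consumed, the Python below separates the reasoning trace from the final answer and resolves the media path. The helper names `load_records` and `parse_record`, the `data_root` argument, and the `/data/Video-R1` location are hypothetical and not part of the repository. The sketch also assumes that each annotation file is a single JSON array of such records, which the example above does not confirm. The `process` string is shortened here for illustration.

```python
import json
import re
from pathlib import Path

# Patterns for the tags seen in the "process" and "solution" fields.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def load_records(json_path):
    """Load records from an annotation file.

    Assumption: the file holds one JSON array of record objects.
    """
    with open(json_path, "r", encoding="utf-8") as f:
        return json.load(f)


def parse_record(record, data_root="."):
    """Split one record into reasoning, answer, and a resolved media path."""
    think = THINK_RE.search(record.get("process", ""))
    answer = ANSWER_RE.search(record.get("solution", ""))
    return {
        "problem_id": record["problem_id"],
        "reasoning": think.group(1).strip() if think else None,
        "answer": answer.group(1).strip() if answer else None,
        # "path" is relative (it starts with "./"), so join it to the data root.
        "media_path": Path(data_root) / record["path"],
        "is_video": record["data_type"] == "video",
    }


# The record shown above, with "process" shortened for illustration.
sample = {
    "problem_id": 2,
    "problem": "What appears on the screen in Russian during the missile's ascent?",
    "data_type": "video",
    "problem_type": "multiple choice",
    "options": [
        "A. A YouTube subscription notification",
        "B. A military command",
        "C. A warning message",
        "D. A weather update",
    ],
    "process": "<think>It's a YouTube subscription notification.</think>",
    "solution": "<answer>A</answer>",
    "path": "./LLaVA-Video-178K/liwei_youtube_videos/videos/youtube_video_2024/ytb_7nRmsEw7nsE.mp4",
    "data_source": "LLaVA-Video-178K/30_60_s_youtube_v0_1",
}

# "/data/Video-R1" is a hypothetical download location.
parsed = parse_record(sample, data_root="/data/Video-R1")
print(parsed["answer"], parsed["media_path"].name)
```

Keeping the tag parsing in one place means the same helper could serve both annotation files, provided Video-R1-260k.json uses the same `<answer>` convention; `record.get("process", "")` tolerates records without a reasoning trace.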
Provider:
maas
Created:
2025-09-11



