General_VLM_pretrain_data

ModelScope community, updated 2025-10-16, indexed 2025-09-13
Download link:
https://modelscope.cn/datasets/BNUIVCly/General_VLM_pretrain_data
Description:
This repository contains the data presented in [Video-R1: Reinforcing Video Reasoning in MLLMs](https://huggingface.co/papers/2503.21776).

Code: https://github.com/tulerfeng/Video-R1

Video data folders: CLEVRER, LLaVA-Video-178K, NeXT-QA, PerceptionTest, STAR

Image data folders: Chart, General, Knowledge, Math, OCR, Spatial

Video-R1-COT-165k.json is used for the supervised fine-tuning (SFT) cold start, and Video-R1-260k.json is used for reinforcement learning (RL) training.

Data format in Video-R1-COT-165k:

```
{
  "problem_id": 2,
  "problem": "What appears on the screen in Russian during the missile's ascent?",
  "data_type": "video",
  "problem_type": "multiple choice",
  "options": [
    "A. A YouTube subscription notification",
    "B. A military command",
    "C. A warning message",
    "D. A weather update"
  ],
  "process": "<think>Let me think about this for a moment. The video shows a missile being launched, and there's a pop-up that appears in Russian during the missile's ascent. Hmm, what could it be? Wait, it looks familiar. It resembles something I've seen before online.\n\nOh, I see! It's a YouTube pop-up, specifically a notification that says \"Вы подписаны,\" which translates to \"You are subscribed.\" This is a common feature on YouTube when someone subscribes to a channel. Let's break it down: it's not a military command, a warning message, or a weather update—it's clearly related to YouTube.\n\nSo, after some self-reflection and verification, I'm confident that the pop-up is indeed a YouTube subscription notification.</think>",
  "solution": "<answer>A</answer>",
  "path": "./LLaVA-Video-178K/liwei_youtube_videos/videos/youtube_video_2024/ytb_7nRmsEw7nsE.mp4",
  "data_source": "LLaVA-Video-178K/30_60_s_youtube_v0_1"
}
```
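As a rough illustration of how a record in this format can be consumed, the Python sketch below parses one Video-R1-COT-165k entry. It extracts the reasoning from the `<think>` tags in `process` and the answer letter from the `<answer>` tags in `solution`, maps that letter to its option text, and resolves the relative `path` against a local download directory. The helper names (`extract_tag`, `load_records`, `parse_record`), the `data_root` directory, and the assumption that each annotation file is a single top-level JSON array are illustrative assumptions. They are not part of the dataset documentation or the Video-R1 code.

```python
import json
import re
from pathlib import Path

# Regexes for the tag-wrapped fields in Video-R1-COT-165k records.
# Non-greedy match plus DOTALL so multi-paragraph reasoning is captured.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_tag(pattern: re.Pattern, text: str):
    """Return the stripped text inside the first matching tag pair, or None."""
    match = pattern.search(text or "")
    return match.group(1).strip() if match else None


def load_records(json_path: str) -> list:
    """Hypothetical loader; ASSUMES the file is one top-level JSON array."""
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)


def parse_record(record: dict, data_root: str = ".") -> dict:
    """Split one record into reasoning, answer letter, option text, media path."""
    reasoning = extract_tag(THINK_RE, record.get("process", ""))
    answer = extract_tag(ANSWER_RE, record.get("solution", ""))

    # For multiple-choice items, map the answer letter ("A") to its option text.
    option_text = None
    if record.get("problem_type") == "multiple choice" and answer:
        for option in record.get("options", []):
            letter, _, text = option.partition(".")
            if letter.strip() == answer:
                option_text = text.strip()
                break

    # "path" is relative ("./LLaVA-Video-178K/..."); data_root is a hypothetical
    # local directory holding the downloaded video/image folders.
    media_path = Path(data_root) / record["path"]
    return {
        "id": record["problem_id"],
        "data_type": record["data_type"],
        "reasoning": reasoning,
        "answer": answer,
        "option_text": option_text,
        "media_path": media_path,
    }


# Sample record in the documented format ("process" trimmed to its last sentence).
sample = {
    "problem_id": 2,
    "problem": "What appears on the screen in Russian during the missile's ascent?",
    "data_type": "video",
    "problem_type": "multiple choice",
    "options": [
        "A. A YouTube subscription notification",
        "B. A military command",
        "C. A warning message",
        "D. A weather update",
    ],
    "process": "<think>So, after some self-reflection and verification, I'm confident "
               "that the pop-up is indeed a YouTube subscription notification.</think>",
    "solution": "<answer>A</answer>",
    "path": "./LLaVA-Video-178K/liwei_youtube_videos/videos/youtube_video_2024/ytb_7nRmsEw7nsE.mp4",
    "data_source": "LLaVA-Video-178K/30_60_s_youtube_v0_1",
}

parsed = parse_record(sample, data_root="video_r1_data")
print(parsed["answer"], "->", parsed["option_text"])
# A -> A YouTube subscription notification
print(parsed["media_path"].as_posix())
```

The field layout of Video-R1-260k.json is not shown on this page, so it is worth inspecting a few of its records before reusing this parser for the RL data.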

Provider: maas
Created: 2025-09-11