LongGroundedThoughts-video-datagen

Source: Hugging Face. Updated 2026-03-23; indexed 2026-03-24.
Download link:
https://huggingface.co/datasets/nvidia/LongGroundedThoughts-video-datagen
Description:
LongGroundedThoughts is a multimodal dataset for video understanding that focuses on generating temporally grounded multiple-choice questions (MCQs) together with their chain-of-thought reasoning. It builds on five video datasets (LLaVA-Video-178K, NExT-QA, CLEVRER, PE-Video, and Ego4D) and contains 196,192 MCQs extracted from LLaVA-Video-178K.

The data is produced by a multi-stage pipeline: first, video events are extracted (speech transcripts, scene boundaries, and motion analysis); next, MCQs are generated from those events; finally, simple and extended chain-of-thought reasoning is synthesized. Supported question types include temporal ordering, speech-visual alignment, scene transition, causal relationship, state change, and audio-visual questions.

The dataset targets video understanding, MCQ generation, and temporal reasoning, and provides training data in Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) formats compatible with LLaMA-Factory. GPU requirements vary by pipeline stage, from 10GB for the Whisper model to 65GB for the DeepSeek-R1-32B model. In total the dataset comprises 196,192 MCQs and 49,265 videos; so far, 176,769 MCQs have been matched with video files and 54,840 chain-of-thought reasoning samples have been generated.
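To put the reported counts in proportion, the short Python sketch below derives a few ratios from the four figures in the description. The function and key names are illustrative and not part of the dataset. The description does not say whether chain-of-thought samples map one-to-one to MCQs, so the last two ratios are rough indicators only.

```python
# Counts quoted in the dataset description above.
TOTAL_MCQS = 196_192
TOTAL_VIDEOS = 49_265
MATCHED_MCQS = 176_769
COT_SAMPLES = 54_840


def coverage_summary(total_mcqs, total_videos, matched_mcqs, cot_samples):
    """Derive simple ratios from the reported counts (hypothetical helper)."""
    return {
        # MCQs that do not yet have a matching video file.
        "unmatched_mcqs": total_mcqs - matched_mcqs,
        # Share of all MCQs already matched to a video file.
        "matched_fraction": matched_mcqs / total_mcqs,
        # Assumption: treats each CoT sample as belonging to one matched MCQ.
        "cot_per_matched_mcq": cot_samples / matched_mcqs,
        # Average number of MCQs per video across the whole dataset.
        "mcqs_per_video": total_mcqs / total_videos,
    }


summary = coverage_summary(TOTAL_MCQS, TOTAL_VIDEOS, MATCHED_MCQS, COT_SAMPLES)
for name, value in summary.items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

Under these figures, about 90% of the MCQs are matched to video files, and there are roughly four MCQs per video on average.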
Provider:
NVIDIA
Created:
2026-03-23