InternVideo2_Vid_Text

Name: InternVideo2_Vid_Text
Creator: maas
Published: 2025-12-04 16:19:32
License: No description available

Source: ModelScope community; updated 2025-12-04; indexed 2024-12-28

Download link:

https://modelscope.cn/datasets/OpenGVLab/InternVideo2_Vid_Text

Resource overview:

# InternVideo2-stage2-vid-text Dataset

## Dataset Description

- **Homepage:** [InternVideo2](https://github.com/OpenGVLab/InternVideo2)
- **Repository:** [OpenGVLab](https://github.com/OpenGVLab/InternVideo/tree/main/InternVideo2)
- **Paper:** [2403.15377](https://arxiv.org/pdf/2403.15377)
- **Point of Contact:** [InternVideo](mailto:gvx-sh@pjlab.org.cn)

## About InternVideo2-avs dataset

This dataset contains the 61M audio-visual-speech (AVS) annotated samples collected for the second training stage of InternVideo2. The videos come primarily from [YT-Temporal-180M](https://rowanzellers.com/merlot/#data). We provide the same YouTube IDs as the source, along with markers for the start and end frames of each clip. The videos cover a wide range of topics and scenarios to keep the data diverse and representative. The dataset is intended to support research and development in video-text understanding and interaction.

The samples are provided as jsonlines files. Columns include:

- the video ID
- the start and end frames
- the speech
- the generated audio caption
- the generated visual caption
- the summarized audio-visual-speech caption

## How to Use

```
from datasets import load_dataset

dataset = load_dataset("OpenGVLab/InternVideo2_Vid_Text")
```

## Citation

If you find this work useful for your research, please consider citing InternVideo2. Your acknowledgement would greatly help us continue contributing resources to the research community.

```
@article{wang2024internvideo2,
  title={Internvideo2: Scaling video foundation models for multimodal video understanding},
  author={Wang, Yi and Li, Kunchang and Li, Xinhao and Yu, Jiashuo and He, Yinan and Chen, Guo and Pei, Baoqi and Zheng, Rongkun and Xu, Jilan and Wang, Zun and others},
  journal={arXiv preprint arXiv:2403.15377},
  year={2024}
}
```
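The dataset description says each jsonlines record carries a video ID, start and end frames, and several captions. The sketch below shows one way such records might be parsed into clip descriptors once a file has been downloaded. It is not taken from the InternVideo2 code: the key names (`video_id`, `start_frame`, `end_frame`, `avs_caption`) and the `Clip` and `iter_clips` helpers are hypothetical placeholders, so check the real keys in the files and adjust them. The frame rate is also not stated in the description, so the caller has to supply it.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple

# ASSUMPTION: these key names are hypothetical placeholders. Inspect the
# downloaded jsonlines files and replace them with the actual column names.
ID_KEY = "video_id"
START_KEY = "start_frame"
END_KEY = "end_frame"
CAPTION_KEY = "avs_caption"


@dataclass
class Clip:
    """One annotated clip: a YouTube ID plus a frame range and a caption."""

    youtube_id: str
    start_frame: int
    end_frame: int
    caption: str

    @property
    def url(self) -> str:
        # Standard YouTube watch URL for the source video.
        return f"https://www.youtube.com/watch?v={self.youtube_id}"

    def span_seconds(self, fps: float) -> Tuple[float, float]:
        # Convert the frame range to seconds for a caller-supplied frame rate.
        return self.start_frame / fps, self.end_frame / fps


def iter_clips(lines: Iterable[str]) -> Iterator[Clip]:
    """Parse jsonlines text (one JSON object per line) into Clip objects."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        yield Clip(
            youtube_id=record[ID_KEY],
            start_frame=int(record[START_KEY]),
            end_frame=int(record[END_KEY]),
            caption=record.get(CAPTION_KEY, ""),
        )


if __name__ == "__main__":
    # Made-up record for illustration only; it is not real dataset content.
    demo = ['{"video_id": "abc123XYZ_0", "start_frame": 50, '
            '"end_frame": 150, "avs_caption": "demo caption"}']
    for clip in iter_clips(demo):
        print(clip.url, clip.span_seconds(fps=25.0), clip.caption)
```

To run this over a real file, pass an open file handle, for example `iter_clips(open(path, encoding="utf-8"))`, where `path` points to one of the downloaded jsonlines files.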


Provider:

maas

Created:

2024-12-26
