byminji/VideoChat2-IT-clean

Name: byminji/VideoChat2-IT-clean
Creator: byminji
Published: 2026-03-03 05:47:01
License: not specified

Source: Hugging Face. Updated 2026-03-03; indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/byminji/VideoChat2-IT-clean


Description:

---
task_categories:
- video-text-to-text
tags:
- video
- instruction-tuning
- video-question-answering
language:
- en
---

<h3 align="center"><a href="https://arxiv.org/abs/2510.13251">[ICLR 2026] Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs</a></h3>

<div align="center">
  <img width="1000" alt="teaser" src="https://cdn-uploads.huggingface.co/production/uploads/66e345c9596fcff3e4b22e5a/z8qfSvZXfIHb0IdSWCLNA.jpeg">
</div>

<h5 align="center"> If you like our project, please give us a star ⭐ on <a href="https://github.com/byminji/map-the-flow">Github</a> for the latest update. </h5>

## Introduction

This is **VideoChat2-IT-clean**, a cleaned version of the [VideoChat2-IT](https://huggingface.co/datasets/OpenGVLab/VideoChat2-IT) video instruction tuning dataset, released alongside our ICLR 2026 paper [Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs](https://arxiv.org/abs/2510.13251).

The original VideoChat2-IT dataset contains annotation files pointing to videos that are no longer available. We filtered out samples with missing videos and provide the cleaned annotation JSONs here. Cleaning was performed using [scripts/data_preprocess/clean_data_anno.py](https://github.com/byminji/map-the-flow/blob/main/scripts/data_preprocess/clean_data_anno.py).
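
The filtering idea described above can be illustrated with a minimal sketch. This is not the code in `clean_data_anno.py`. It assumes each annotation file is a JSON list of objects whose video path sits under a `video` key, relative to a local video root. The function name `split_by_video_presence` and the `video_key` parameter are hypothetical, introduced only for illustration.

```python
import os
import tempfile


def split_by_video_presence(samples, video_root, video_key="video"):
    """Split annotation samples into (valid, invalid) lists.

    Assumption: each sample is a dict and sample[video_key] is a path
    relative to video_root. The real annotation schema may differ.
    """
    valid, invalid = [], []
    for sample in samples:
        rel_path = sample.get(video_key)
        # A sample is invalid if it has no path or the file is not on disk.
        if rel_path and os.path.isfile(os.path.join(video_root, rel_path)):
            valid.append(sample)
        else:
            invalid.append(sample)
    return valid, invalid


def demo():
    """Run the split on a throwaway directory holding one of two videos."""
    with tempfile.TemporaryDirectory() as root:
        # Create an empty placeholder file standing in for a real video.
        open(os.path.join(root, "a.mp4"), "wb").close()
        samples = [
            {"video": "a.mp4", "QA": []},
            {"video": "missing.mp4", "QA": []},
        ]
        valid, invalid = split_by_video_presence(samples, root)
        return len(valid), len(invalid)
```

Checking for the file on disk is the simplest possible validity test; a stricter variant could also try to decode the first frame, at a much higher cost per sample.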

## Data Spec

Total valid samples: **874,869**

<details>
<summary>Per-dataset breakdown</summary>

| Video source | Task | Dataset | Total | Valid | Invalid |
|:---:|:---:|:---:|---:|---:|---:|
| TextVR | caption | textvr | 39,648 | 39,648 | 0 |
| YouCook2 | caption | youcook2 | 8,700 | 8,700 | 0 |
| Kinetics | classification | k710 | 40,000 | 38,977 | 1,023 |
| SSv2 | classification | ssv2 | 40,000 | 40,000 | 0 |
| InternVid | conversation | videochat2 | 9,584 | 9,584 | 0 |
| ActivityNet | conversation | videochatgpt | 13,303 | 13,303 | 0 |
| NExT-QA | reasoning | next_qa | 34,132 | 34,132 | 0 |
| CLEVRER | reasoning | clevrer_qa | 40,000 | 40,000 | 0 |
| CLEVRER | reasoning | clevrer_mc | 40,000 | 40,000 | 0 |
| EgoQA | vqa | ego_qa | 7,813 | 7,797 | 16 |
| TGIF | vqa | tgif_frame_qa | 39,149 | 39,149 | 0 |
| TGIF | vqa | tgif_transition_qa | 52,696 | 52,696 | 0 |
| WebVid | caption | webvid | 400,000 | 399,740 | 260 |
| WebVid | caption | videochat | 6,889 | 6,889 | 0 |
| WebVid | conversation | videochat1 | 4,300 | 4,300 | 0 |
| WebVid | vqa | webvid_qa | 100,000 | 99,954 | 46 |

</details>

## Usage

Download the annotation JSONs from this repository and set the paths in your training config. For raw video download instructions, refer to [DATA.md](https://github.com/byminji/map-the-flow/blob/main/DATA.md).

We use this annotation to train our models: [byminji/LLaVA-NeXT-7B-Video-FT](https://huggingface.co/byminji/LLaVA-NeXT-7B-Video-FT), [byminji/LLaVA-NeXT-13B-Video-FT](https://huggingface.co/byminji/LLaVA-NeXT-13B-Video-FT), and [byminji/Mini-InternVL-4B-Video-FT](https://huggingface.co/byminji/Mini-InternVL-4B-Video-FT).
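
After downloading, it can be useful to check the local annotation files against the valid-sample counts listed in the Data Spec section before wiring paths into a training config. The sketch below is one way to do that with the standard library only. It assumes each annotation file is a JSON list with one element per sample; the directory layout, the helper name `count_samples`, and the constant name are hypothetical, and only the value 874,869 comes from this card.

```python
import json
import tempfile
from pathlib import Path

# Total valid samples as stated in the Data Spec section of this card.
EXPECTED_TOTAL_VALID = 874_869


def count_samples(anno_root):
    """Map each annotation JSON under anno_root to its sample count.

    Assumption: an annotation file is a JSON list, one element per sample.
    Files holding anything other than a list are skipped.
    """
    counts = {}
    for path in sorted(Path(anno_root).rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        if isinstance(data, list):
            counts[path.relative_to(anno_root).as_posix()] = len(data)
    return counts
```

With a local copy in a directory of your choosing, `sum(count_samples(anno_dir).values())` can be compared with `EXPECTED_TOTAL_VALID`, and the individual entries with the per-dataset rows. A mismatch may only mean the repository layout differs from the assumption made here.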

## Citation

If you find our paper useful in your research, please consider citing:

```bibtex
@inproceedings{kim2026map,
  author    = {Kim, Minji and Kim, Taekyung and Han, Bohyung},
  title     = {Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2026},
}

@article{kim2025map,
  author  = {Kim, Minji and Kim, Taekyung and Han, Bohyung},
  title   = {Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs},
  journal = {arXiv preprint arXiv:2510.13251},
  year    = {2025},
}
```
