PolyU-ChenLab/ET-Instruct-164K

Name: PolyU-ChenLab/ET-Instruct-164K
Creator: PolyU-ChenLab
Published: 2024-09-27 19:21:18
License: cc-by-nc-sa-4.0 (per the dataset card)

Hugging Face | Updated 2024-09-27 | Indexed 2025-11-01

Download link:

https://hf-mirror.com/datasets/PolyU-ChenLab/ET-Instruct-164K


Resource description:

---
license: cc-by-nc-sa-4.0
---

# E.T. Instruct 164K

[arXiv](https://arxiv.org/abs/2409.18111) | [Project Page](https://polyu-chenlab.github.io/etbench) | [GitHub](https://github.com/PolyU-ChenLab/ETBench)

E.T. Instruct 164K is a large-scale instruction-tuning dataset tailored for fine-grained event-level and time-sensitive video understanding. It contains 101K meticulously collected videos under diverse domains and 9 event-level understanding tasks with well-designed instruction-response pairs. The average video length is around 146 seconds.

## 📦 Download Dataset

You may download the dataset using the following command.

```
git lfs install
git clone git@hf.co:datasets/PolyU-ChenLab/ET-Instruct-164K
```

Then, enter the directory and extract the files in `videos` folder by running:

```
cd ET-Instruct-164K
for path in videos/*.tar.gz; do tar -xvf $path -C videos; done
```

All the videos have been processed to `3 FPS` and `224 pixels shortest side`. The audio has been removed as well.

## 🚀 Getting Started

We provide two types of annotations (`txt` and `vid`), whose difference is illustrated below.

- `et_instruct_164k_txt.json` - for models representing timestamps in pure text, e.g., '2.5 - 4.8 seconds'
- `et_instruct_164k_vid.json` - for models using special tokens for timestamps, e.g., \<vid\> token in E.T. Chat

Each JSON file contains a list of dicts with the following entries.

```python
{
  "task": "slc",                           # task
  "source": "how_to_step",                 # source dataset
  "video": "how_to_step/PJi8ZEHAFcI.mp4",  # path to video
  "duration": 200.767,                     # video duration (seconds)
  "src": [12, 18],                         # [optional] timestamps (seconds) in model inputs
  "tgt": [36, 44, 49, 57],                 # [optional] timestamps (seconds) in model outputs
  "conversations": [                       # conversations
    {
      "from": "human",
      "value": "<image>\n..."
    },
    {
      "from": "gpt",
      "value": "..."
    }
  ]
}
```

In `vid` style annotations, all the timestamps in `conversations` have been replaced with `<vid>` and their original values can be found in `src` (for human messages) and `tgt` (for gpt messages).

## 📖 Citation

Please kindly cite our paper if you find this project helpful.

```
@inproceedings{liu2024etbench,
  title={E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding},
  author={Liu, Ye and Ma, Zongyang and Qi, Zhongang and Wu, Yang and Chen, Chang Wen and Shan, Ying},
  booktitle={Neural Information Processing Systems (NeurIPS)},
  year={2024}
}
```
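
## Example: restoring timestamps in `vid` annotations

The `<vid>` placeholder convention described under Getting Started can be sketched in Python as below. This is a minimal illustration, not part of the official tooling. It assumes that placeholders are filled in order of appearance, with human turns drawing from `src` and gpt turns drawing from `tgt`. The helper names `load_annotations` and `restore_timestamps`, and the wording of the toy sample, are hypothetical.

```python
import json


def load_annotations(path):
    """Load an annotation file such as et_instruct_164k_vid.json (a list of dicts)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def restore_timestamps(sample):
    """Return a copy of the conversation with each <vid> replaced by its value.

    Assumption: placeholders are consumed left to right, human turns from
    `src` and gpt turns from `tgt`. Both keys are optional in the dataset.
    """
    pools = {
        "human": iter(sample.get("src", [])),
        "gpt": iter(sample.get("tgt", [])),
    }
    restored = []
    for turn in sample["conversations"]:
        stamps = pools.get(turn["from"], iter(()))
        parts = turn["value"].split("<vid>")
        text = parts[0]
        for part in parts[1:]:
            stamp = next(stamps, None)
            if stamp is None:
                raise ValueError("more <vid> placeholders than timestamp values")
            text += f"{stamp}{part}"
        restored.append({"from": turn["from"], "value": text})
    return restored


# Toy sample: field names follow the dataset card, the message text is made up.
toy_sample = {
    "task": "slc",
    "src": [12, 18],
    "tgt": [36, 44, 49, 57],
    "conversations": [
        {"from": "human", "value": "<image>\nFrom <vid> to <vid>, what happens?"},
        {"from": "gpt", "value": "<vid> - <vid>, step one. <vid> - <vid>, step two."},
    ],
}

if __name__ == "__main__":
    for turn in restore_timestamps(toy_sample):
        print(turn["from"], ":", turn["value"])
```

To apply this to the real data, call `load_annotations("et_instruct_164k_vid.json")` from inside the cloned directory and pass each entry to `restore_timestamps`. Raising on a placeholder/value mismatch is a design choice of this sketch, so that misaligned samples surface early instead of being silently truncated.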

Provided by:

PolyU-ChenLab
