
PolyU-ChenLab/ET-Instruct-164K

Source: Hugging Face | Updated: 2024-09-27 | Indexed: 2025-11-01
Download link:
https://hf-mirror.com/datasets/PolyU-ChenLab/ET-Instruct-164K
Resource description:
---
license: cc-by-nc-sa-4.0
---

# E.T. Instruct 164K

[arXiv](https://arxiv.org/abs/2409.18111) | [Project Page](https://polyu-chenlab.github.io/etbench) | [GitHub](https://github.com/PolyU-ChenLab/ETBench)

E.T. Instruct 164K is a large-scale instruction-tuning dataset tailored for fine-grained event-level and time-sensitive video understanding. It contains 101K meticulously collected videos across diverse domains and covers 9 event-level understanding tasks with well-designed instruction-response pairs. The average video length is around 146 seconds.

## 📦 Download Dataset

You may download the dataset using the following commands.

```
git lfs install
git clone git@hf.co:datasets/PolyU-ChenLab/ET-Instruct-164K
```

Then, enter the directory and extract the files in the `videos` folder by running:

```
cd ET-Instruct-164K
for path in videos/*.tar.gz; do tar -xvf "$path" -C videos; done
```

All the videos have been processed to `3 FPS` and `224 pixels shortest side`. The audio has been removed as well.

## 🚀 Getting Started

We provide two types of annotations (`txt` and `vid`), which differ as follows.

- `et_instruct_164k_txt.json` - for models representing timestamps in pure text, e.g., '2.5 - 4.8 seconds'
- `et_instruct_164k_vid.json` - for models using special tokens for timestamps, e.g., \<vid\> token in E.T. Chat

Each JSON file contains a list of dicts with the following entries.

```python
{
  "task": "slc",                           # task
  "source": "how_to_step",                 # source dataset
  "video": "how_to_step/PJi8ZEHAFcI.mp4",  # path to video
  "duration": 200.767,                     # video duration (seconds)
  "src": [12, 18],                         # [optional] timestamps (seconds) in model inputs
  "tgt": [36, 44, 49, 57],                 # [optional] timestamps (seconds) in model outputs
  "conversations": [                       # conversations
    {
      "from": "human",
      "value": "<image>\n..."
    },
    {
      "from": "gpt",
      "value": "..."
    }
  ]
}
```

In `vid` style annotations, all the timestamps in `conversations` have been replaced with `<vid>`, and their original values can be found in `src` (for human messages) and `tgt` (for gpt messages).

## 📖 Citation

Please kindly cite our paper if you find this project helpful.

```
@inproceedings{liu2024etbench,
  title={E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding},
  author={Liu, Ye and Ma, Zongyang and Qi, Zhongang and Wu, Yang and Chen, Chang Wen and Shan, Ying},
  booktitle={Neural Information Processing Systems (NeurIPS)},
  year={2024}
}
```
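To illustrate how the `vid` style annotations relate to `src` and `tgt`, here is a minimal sketch that puts the original timestamp values back in place of the `<vid>` placeholders. It is not part of the dataset's tooling: the helper name `restore_timestamps` and the sample conversation text are hypothetical, and the sketch assumes that placeholders are filled in order of appearance, with `src` consumed by human messages and `tgt` by gpt messages. It also assumes plain number formatting, which may not match the `txt` style file exactly.

```python
def restore_timestamps(sample):
    """Replace each <vid> placeholder with its original timestamp value.

    Assumption (not confirmed by the dataset card): placeholders are
    filled in order of appearance, using `src` for human turns and
    `tgt` for gpt turns.
    """
    queues = {
        "human": list(sample.get("src") or []),
        "gpt": list(sample.get("tgt") or []),
    }
    restored = []
    for turn in sample["conversations"]:
        values = queues.get(turn["from"], [])
        parts = turn["value"].split("<vid>")
        text = parts[0]
        for part in parts[1:]:
            if not values:
                raise ValueError("more <vid> placeholders than timestamps")
            # Pop the next unused timestamp for this speaker.
            text += str(values.pop(0)) + part
        restored.append({"from": turn["from"], "value": text})
    return restored


# Hypothetical sample following the entry layout described above.
sample = {
    "task": "slc",
    "src": [12, 18],
    "tgt": [36, 44],
    "conversations": [
        {"from": "human", "value": "<image>\nWhat follows <vid> - <vid>?"},
        {"from": "gpt", "value": "<vid> - <vid>, the next step."},
    ],
}

restored = restore_timestamps(sample)
```

To apply this to the real annotations, one would load `et_instruct_164k_vid.json` with `json.load` and call the helper on each entry. Raising an error on a count mismatch is a deliberate choice here, so that any entry violating the ordering assumption is surfaced rather than silently mangled.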
Provided by:
PolyU-ChenLab