THUDM/LVBench

Name: THUDM/LVBench
Creator: THUDM
Published: 2024-06-13 01:24:08
License: CC-BY-NC-SA-4.0 (as stated in the dataset card)

Source: Hugging Face. Updated 2024-06-13; indexed 2024-06-15.

Download link:

https://hf-mirror.com/datasets/THUDM/LVBench


Resource description:

---
license: cc-by-nc-sa-4.0
task_categories:
- visual-question-answering
- multiple-choice
language:
- en
tags:
- video
pretty_name: LVBench
size_categories:
- 100K<n<1M
---

# LVBench: An Extreme Long Video Understanding Benchmark

<div align='center' >

[[🍎 Project Page](https://lvbench.github.io/)] [[📖 arXiv Paper](https://arxiv.org/abs/2406.08035)] [[📊 Dataset](https://huggingface.co/datasets/THUDM/LVBench)] [[🏆 Leaderboard](https://lvbench.github.io/#leaderboard)]

</div>

<img src="./docs/images/cover.png" width="96%" height="50%">

LVBench is a benchmark designed to evaluate and enhance the capabilities of multimodal models in understanding and extracting information from long videos up to two hours in duration.

---

## 🔥 News

* **`2024.06.11`** 🌟 We released LVBench, a new benchmark for long video understanding!

## 👀 Introduction to LVBench

LVBench is a benchmark designed to evaluate the capabilities of models in understanding long videos. We collected extensive long video data from public sources and annotated it through a mix of manual effort and model assistance. The benchmark provides a robust foundation for testing models on extended temporal contexts, with quality ensured by meticulous human annotation and multi-stage quality control.

### Features

1. **Core Capabilities**: Six core capabilities for long video understanding, enabling the creation of complex and challenging questions for comprehensive model evaluation.
2. **Diverse Data**: A diverse range of long video data, averaging five times longer than the longest existing datasets, covering various categories.
3. **High-Quality Annotations**: A reliable benchmark built with meticulous human annotation and multi-stage quality control processes.

<img src="./docs/images/example.jpg" width="100%" height="50%">

## Dataset

### License

Our dataset is released under the CC-BY-NC-SA-4.0 license.

LVBench is intended for academic research only. Commercial use in any form is prohibited. We do not own the copyright of any raw video files.

If there is any infringement in LVBench, please contact shiyu.huang@aminer.cn or raise an issue directly, and we will remove the content immediately.

### Download

Install video2dataset first:

```shell
pip install video2dataset
pip uninstall transformer-engine
```

Then download `video_info.meta.jsonl` from [Hugging Face](https://huggingface.co/datasets/THUDM/LVBench) and put it in the `data` directory.

Each entry in the `video_info.meta.jsonl` file has a key field corresponding to a YouTube video's ID. Users can download the corresponding video using this ID. Alternatively, users can run the download script we provide, `download.sh`:

```shell
cd scripts
bash download.sh
```

After execution, the video files are stored in the `scripts/videos` directory.

## Install LVBench

```shell
pip install -e .
```

## Get Evaluation Results

(Note: to try the evaluation quickly, you can use `scripts/construct_random_answers.py` to prepare a random answer file.)

```shell
cd scripts
python test_acc.py
```

## 📈 Results

- **Model Comparison:**

<img src="./docs/images/leaderboard.png" width="96%" height="50%">

- **Benchmark Comparison:**

<img src="./docs/images/compare.png" width="96%" height="50%">

- **Model vs Human:**

<img src="./docs/images/human.png" width="96%" height="50%">

- **Answer Distribution:**

<img src="./docs/images/distribution.png" width="96%" height="50%">

## License

The use of the dataset and the original videos is governed by the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license, as detailed in the [LICENSE](./LICENSE).

If you believe that any content in this dataset infringes on your rights, please contact us at **_shiyu.huang@aminer.cn_** to request its removal.

## Citation

If you find our work helpful for your research, please consider citing it.

```bibtex
@misc{wang2024lvbench,
      title={LVBench: An Extreme Long Video Understanding Benchmark},
      author={Weihan Wang and Zehai He and Wenyi Hong and Yean Cheng and Xiaohan Zhang and Ji Qi and Shiyu Huang and Bin Xu and Yuxiao Dong and Ming Ding and Jie Tang},
      year={2024},
      eprint={2406.08035},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```


Provider:

THUDM

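Summary of Original Information

LVBench: An Extreme Long Video Understanding Benchmark

Dataset Overview

LVBench is a benchmark for evaluating and enhancing the ability of multimodal models to understand and extract information from videos up to two hours long. Its long videos were collected from public sources and annotated through a combination of manual effort and model assistance, with meticulous human annotation and multi-stage quality control.

Features

Core capabilities: six core capabilities, used to create complex and challenging questions that evaluate models comprehensively.
Diverse data: long videos from a variety of categories, on average five times longer than those in the longest existing datasets.
High-quality annotations: meticulous human annotation and multi-stage quality control keep the benchmark reliable.

Dataset Information

License

The dataset is released under the CC-BY-NC-SA-4.0 license. It is intended for academic research only; commercial use in any form is prohibited.

Download

Install video2dataset: run `pip install video2dataset`, then `pip uninstall transformer-engine`.
Download the `video_info.meta.jsonl` file from Hugging Face and place it in the `data` directory.
Download the video files with the provided script `download.sh`: run `cd scripts`, then `bash download.sh`.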
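For readers who prefer to inspect the metadata before fetching anything, the following is a minimal sketch of how the video IDs might be turned into YouTube URLs. It assumes the ID is stored under a field literally named `key` (the README only says each entry has a key field) and that the file sits at `data/video_info.meta.jsonl`; the function name `iter_video_urls` is hypothetical and is not part of LVBench or of `download.sh`.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_video_urls(meta_path: Path, id_field: str = "key") -> Iterator[str]:
    """Yield one YouTube watch URL per non-empty line of a JSONL metadata file.

    `id_field` is an assumption: the name of the entry field that holds
    the YouTube video ID.
    """
    with meta_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between entries
            entry = json.loads(line)
            yield f"https://www.youtube.com/watch?v={entry[id_field]}"


if __name__ == "__main__":
    # Assumed location, following the "put it in the data directory" step.
    meta = Path("data/video_info.meta.jsonl")
    if meta.exists():
        for url in iter_video_urls(meta):
            print(url)
```

The printed URLs can be handed to any downloader; the provided `download.sh` remains the documented route.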

Installation

Run `pip install -e .`

Evaluation Results

Use `scripts/test_acc.py` to obtain the evaluation results.
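To make the evaluation step concrete, here is a rough sketch of multiple-choice accuracy scoring together with a random-answer baseline. It is not the code of `test_acc.py` or `construct_random_answers.py`: the question-ID-to-letter dictionaries, the option letters `ABCD`, and both function names are assumptions made for illustration.

```python
import random
from typing import Dict, Sequence


def random_answers(
    question_ids: Sequence[str],
    options: Sequence[str] = "ABCD",  # assumed option letters
    seed: int = 0,
) -> Dict[str, str]:
    """Pick one option uniformly at random for every question ID."""
    rng = random.Random(seed)  # seeded so the baseline is reproducible
    return {qid: rng.choice(options) for qid in question_ids}


def accuracy(predictions: Dict[str, str], ground_truth: Dict[str, str]) -> float:
    """Fraction of ground-truth questions answered correctly.

    A question with no prediction counts as wrong.
    """
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    correct = sum(
        1 for qid, answer in ground_truth.items() if predictions.get(qid) == answer
    )
    return correct / len(ground_truth)
```

With four options per question, a random baseline should land near 25% accuracy on a large question set, which gives a useful sanity check before scoring a real model.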

Citation

@misc{wang2024lvbench,
  title={LVBench: An Extreme Long Video Understanding Benchmark},
  author={Weihan Wang and Zehai He and Wenyi Hong and Yean Cheng and Xiaohan Zhang and Ji Qi and Shiyu Huang and Bin Xu and Yuxiao Dong and Ming Ding and Jie Tang},
  year={2024},
  eprint={2406.08035},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

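Collected Summary

Dataset Introduction

Construction

LVBench draws its videos from public sources and annotates them through a hybrid of manual annotation and model assistance. Multi-stage quality control during construction keeps the annotations reliable, providing a solid basis for evaluating models on temporal contexts of up to two hours.

Features

The dataset is defined by its extreme temporal length and its multi-dimensional evaluation scheme. Its videos are on average five times longer than those of the longest existing datasets and span diverse content categories. Six core understanding capabilities are defined, from which complex and challenging questions are built to evaluate multimodal models comprehensively.

Usage

To use the dataset, first install the video2dataset tool. The dataset itself is distributed as a metadata file whose entries include a key field corresponding to a YouTube video ID. The original videos can be fetched with the official download script, and the accompanying evaluation code then scores model performance, giving long video understanding research a standardized evaluation workflow.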

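Background and Challenges

Background

With the rapid development of multimodal AI, video understanding has become a frontier topic at the intersection of computer vision and natural language processing. Existing video datasets, however, mostly focus on short clips and cannot support deep semantic analysis of long-form video. Against this background, the Knowledge Engineering Group (KEG) at Tsinghua University released LVBench in June 2024 to systematically evaluate how well models understand videos up to two hours long. By combining public long-video resources with manual and model-assisted annotation, the dataset forms an evaluation benchmark covering six core capabilities and gives long video understanding research both a data foundation and an evaluation framework.

Current Challenges

LVBench targets the central challenge of long video understanding: how a model extracts and integrates key semantics from a large volume of temporal information to carry out complex reasoning and question answering. The field faces large variation in video duration, strong temporal dependencies, and difficulty in aligning multimodal information. Building the dataset also required strict control of annotation quality, including keeping long-video annotations coherent and accurate and maintaining reliability through a multi-stage quality inspection process. Copyright and compliance add further complexity, since the licensing and permitted use of the original videos must be handled carefully.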

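Common Scenarios

Typical Use Cases

LVBench is typically used to evaluate how deeply multimodal models can analyze video content up to two hours long. It covers six core capabilities: temporal grounding, summarization, reasoning, entity recognition, event understanding, and key information retrieval. Together these give models a standardized test of extracting semantic information from complex visual sequences, and the benchmark is especially suited to measuring how well a model integrates information and reasons over an extended temporal context.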
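Because the benchmark is organized around six core capabilities, results are usually easier to interpret when accuracy is reported per capability rather than as a single number. The sketch below shows one way to group results; the `(capability, is_correct)` tuple layout and the function name `accuracy_by_capability` are hypothetical and do not reflect LVBench's actual result format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_capability(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Group (capability_label, is_correct) pairs and return accuracy per label."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for capability, is_correct in results:
        totals[capability] += 1
        hits[capability] += int(is_correct)
    # Every label in `totals` has at least one result, so no division by zero.
    return {name: hits[name] / totals[name] for name in totals}


if __name__ == "__main__":
    # Placeholder labels; substitute the benchmark's own capability names.
    sample = [("type_a", True), ("type_a", False), ("type_b", True)]
    for name, score in accuracy_by_capability(sample).items():
        print(f"{name}: {score:.2%}")
```

A per-capability view makes it visible when a model's overall score hides a weakness in one kind of question.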

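Academic Problems Addressed

LVBench was built to address long-standing problems in long video understanding research: insufficient modeling of long-range temporal dependencies, efficiency bottlenecks in cross-modal fusion, and the lack of a standardized evaluation scheme. With high-quality human annotation and multi-stage quality control, it gives the community a reliable yardstick for performance, supports a shift from clip-level analysis toward understanding a video as a whole, and makes results in the field easier to compare.

Derived and Related Work

Work related to LVBench is described as following two directions. One explores hierarchical attention mechanisms and memory-augmented networks to improve global modeling of long videos. The other revisits multimodal pre-training, for example through spatio-temporal fusion Transformer architectures. These directions are associated with video-language systems such as VideoLLaMA and VideoChat and inform subsequent long video understanding research.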

The content above was collected and summarized by 遇见数据集.
