---
license: cc-by-nc-sa-4.0
task_categories:
- visual-question-answering
- multiple-choice
language:
- en
tags:
- video
pretty_name: LVBench
size_categories:
- 100K<n<1M
---
# LVBench: An Extreme Long Video Understanding Benchmark
<font size=4><div align='center' > [[🍎 Project Page](https://lvbench.github.io/)] [[📖 arXiv Paper](https://arxiv.org/abs/2406.08035)] [[📊 Dataset](https://huggingface.co/datasets/THUDM/LVBench)][[🏆 Leaderboard](https://lvbench.github.io/#leaderboard)] </div></font>
<p align="center">
<img src="./docs/images/cover.png" width="96%" height="50%">
</p>
LVBench is a benchmark designed to evaluate and enhance the capabilities of multimodal models in understanding and
extracting information from long videos up to two hours in duration.
---
## 🔥 News
* **`2024.06.11`** 🌟 We released LVBench, a new benchmark for long video understanding!
## 👀 Introduction to LVBench
LVBench is a benchmark designed to evaluate the capabilities of models in understanding long videos. We collected
a large set of long videos from public sources and annotated them through a mix of manual effort and model assistance.
The benchmark provides a robust foundation for testing models on extended temporal contexts, with assessment quality
ensured through meticulous human annotation and multi-stage quality control.
### Features
1. **Core Capabilities**: Six core capabilities for long video understanding, enabling the creation of complex and
challenging questions for comprehensive model evaluation.
2. **Diverse Data**: A diverse range of long video data, averaging five times longer than the longest existing datasets,
covering various categories.
3. **High-Quality Annotations**: Reliable benchmark with meticulous human annotation and multi-stage quality control
processes.
<img src="./docs/images/example.jpg" width="100%" height="50%">
## Dataset
### License
Our dataset is released under the CC-BY-NC-SA-4.0 license.
LVBench is intended for academic research only. Commercial use in any form is prohibited. We do not own the copyright of any raw video files.
If any content in LVBench infringes your rights, please contact shiyu.huang@aminer.cn or raise an issue directly, and we will remove it immediately.
### Download
Install video2dataset first:
```shell
pip install video2dataset
pip uninstall transformer-engine
```
Then download `video_info.meta.jsonl` from [Hugging Face](https://huggingface.co/datasets/THUDM/LVBench) and
put it in the `data` directory.
Each entry in `video_info.meta.jsonl` has a `key` field holding a YouTube video ID, which can be used to download
the corresponding video. Alternatively, use the provided download script, `download.sh`, in the `scripts` directory:
```shell
cd scripts
bash download.sh
```
After the script finishes, the video files are stored in the `scripts/videos` directory.
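For readers who would rather fetch videos by ID themselves, the following is a minimal sketch of reading the metadata file and turning each `key` value into a YouTube watch URL. The function names `load_video_ids` and `youtube_url` are hypothetical helpers, not part of LVBench, and the sketch assumes only what is stated above: one JSON object per line, each with a `key` field, stored at `data/video_info.meta.jsonl`.

```python
import json
from pathlib import Path


def load_video_ids(meta_path):
    """Return the `key` value (YouTube video ID) of every entry in a JSONL file."""
    ids = []
    with Path(meta_path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            entry = json.loads(line)
            ids.append(entry["key"])
    return ids


def youtube_url(video_id):
    """Build the standard watch URL for a YouTube video ID."""
    return f"https://www.youtube.com/watch?v={video_id}"


if __name__ == "__main__":
    # Assumed location: the file is placed in the `data` directory.
    meta_file = Path("data/video_info.meta.jsonl")
    if meta_file.exists():
        for vid in load_video_ids(meta_file):
            print(youtube_url(vid))
```

The printed URLs can be passed to any downloader of your choice; the provided `download.sh` remains the supported route.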
## Install LVBench
```shell
pip install -e .
```
## Get Evaluation Results
(Note: to try the evaluation quickly, you can use `scripts/construct_random_answers.py` to prepare a
random answer file.)
```shell
cd scripts
python test_acc.py
```
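To make the idea of a random-answer baseline concrete, here is a small illustrative sketch of multiple-choice accuracy scoring. It is not the logic of `test_acc.py` or `construct_random_answers.py`, whose file formats are not described here. The dictionary layout (question ID mapped to an answer letter), the four options `A` to `D`, and the names `random_answers` and `accuracy` are all assumptions made for illustration.

```python
import random


def random_answers(question_ids, choices=("A", "B", "C", "D"), seed=0):
    """Assign a uniformly random choice to each question (hypothetical format)."""
    rng = random.Random(seed)
    return {qid: rng.choice(choices) for qid in question_ids}


def accuracy(predictions, ground_truth):
    """Fraction of ground-truth questions whose predicted letter matches.

    Questions missing from `predictions` count as wrong.
    """
    if not ground_truth:
        raise ValueError("ground_truth is empty")
    correct = sum(
        1 for qid, answer in ground_truth.items()
        if predictions.get(qid) == answer
    )
    return correct / len(ground_truth)


if __name__ == "__main__":
    # Toy ground truth, invented for this example only.
    truth = {"q1": "A", "q2": "C", "q3": "B", "q4": "D"}
    preds = random_answers(truth.keys())
    print(f"random-baseline accuracy: {accuracy(preds, truth):.2%}")
```

With four equally likely options, a random baseline scores 25% in expectation, which gives a floor against which model results can be read.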
## 📈 Results
- **Model Comparison:**
<p align="center">
<img src="./docs/images/leaderboard.png" width="96%" height="50%">
</p>
- **Benchmark Comparison:**
<p align="center">
<img src="./docs/images/compare.png" width="96%" height="50%">
</p>
- **Model vs Human:**
<p align="center">
<img src="./docs/images/human.png" width="96%" height="50%">
</p>
- **Answer Distribution:**
<p align="center">
<img src="./docs/images/distribution.png" width="96%" height="50%">
</p>
## License
The use of the dataset and the original videos is governed by the Creative Commons Attribution-NonCommercial-ShareAlike
4.0 International (CC BY-NC-SA 4.0) license, as detailed in the [LICENSE](./LICENSE) file.
If you believe that any content in this dataset infringes on your rights, please contact us at **_shiyu.huang@aminer.cn_** to request its
removal.
## Citation
If you find our work helpful for your research, please consider citing our work.
```bibtex
@misc{wang2024lvbench,
title={LVBench: An Extreme Long Video Understanding Benchmark},
author={Weihan Wang and Zehai He and Wenyi Hong and Yean Cheng and Xiaohan Zhang and Ji Qi and Shiyu Huang and Bin Xu and Yuxiao Dong and Ming Ding and Jie Tang},
year={2024},
eprint={2406.08035},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```