LVBench

Name: LVBench
Creator: maas
Published: 2026-05-16 23:15:06
License: No description provided

ModelScope Community. Updated 2026-05-16; indexed 2024-08-31.

Download link:

https://modelscope.cn/datasets/AI-ModelScope/LVBench

Resource description:

# LVBench: An Extreme Long Video Understanding Benchmark

<div align='center' >

[[🍎 Project Page](https://lvbench.github.io/)] [[📖 arXiv Paper](https://arxiv.org/abs/2406.08035)] [[📊 Dataset](https://huggingface.co/datasets/THUDM/LVBench)] [[🏆 Leaderboard](https://lvbench.github.io/#leaderboard)]

</div>

<img src="./docs/images/cover.png" width="96%" height="50%">

LVBench is a benchmark designed to evaluate and enhance the capabilities of multimodal models in understanding and extracting information from long videos up to two hours in duration.

ModelScope provides the videos in the `videos/` folder. Note that some videos are not available because of copyright claims or removal from YouTube.

> [!NOTE]
> - 28CIeC8cZks: Video unavailable. This video is no longer available because the YouTube account associated with this video has been terminated.
> - idZkam9zqAs: Video unavailable. This video is no longer available due to a copyright claim by SME.
> - gXnhqF0TqqI: Video unavailable. This video is no longer available due to a copyright claim by Fabulous Films Ltd.
> - QgWRyDV9Ozs: Video unavailable. This video has been removed by the uploader.

---

## 🔥 News

* **`2024.06.11`** 🌟 We released LVBench, a new benchmark for long video understanding!

## 👀 Introduction to LVBench

LVBench is a benchmark designed to evaluate the capabilities of models in understanding long videos. We collected extensive long video data from public sources and annotated it through a mix of manual effort and model assistance. The benchmark provides a robust foundation for testing models on extended temporal contexts, with assessment quality ensured by meticulous human annotation and multi-stage quality control.

### Features

1. **Core Capabilities**: Six core capabilities for long video understanding, enabling the creation of complex and challenging questions for comprehensive model evaluation.
2. **Diverse Data**: A diverse range of long video data, averaging five times longer than the longest existing datasets, covering various categories.
3. **High-Quality Annotations**: A reliable benchmark built with meticulous human annotation and multi-stage quality control processes.

<img src="./docs/images/example.jpg" width="100%" height="50%">

## Dataset

### License

Our dataset is under the CC-BY-NC-SA-4.0 license.

LVBench may be used only for academic research. Commercial use in any form is prohibited. We do not own the copyright of any raw video files.

If there is any infringement in LVBench, please contact shiyu.huang@aminer.cn or directly raise an issue, and we will remove it immediately.

### Download

Install video2dataset first:

```shell
pip install video2dataset
pip uninstall transformer-engine
```

Then download `video_info.meta.jsonl` from [Huggingface](https://huggingface.co/datasets/THUDM/LVBench) and put it in the `data` directory.

Each entry in the `video_info.meta.jsonl` file has a key field corresponding to a YouTube video ID. Users can download the corresponding video using this ID. Alternatively, users can run the download script we provide, `download.sh`:

```shell
cd scripts
bash download.sh
```

After the script finishes, the video files will be stored in the `script/videos` directory.

## Install LVBench

```shell
pip install -e .
```

## Get Evaluation Results

(Note: to try the evaluation quickly, you can use `scripts/construct_random_answers.py` to prepare a random answer file.)

```shell
cd scripts
python test_acc.py
```

## 📈 Results

- **Model Comparison:**

<img src="./docs/images/leaderboard.png" width="96%" height="50%">

- **Benchmark Comparison:**

<img src="./docs/images/compare.png" width="96%" height="50%">

- **Model vs Human:**

<img src="./docs/images/human.png" width="96%" height="50%">

- **Answer Distribution:**

<img src="./docs/images/distribution.png" width="96%" height="50%">

## License

The use of the dataset and the original videos is governed by the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license, as detailed in the [LICENSE](./LICENSE).

If you believe that any content in this dataset infringes on your rights, please contact us at **_shiyu.huang@aminer.cn_** to request its removal.

## Citation

If you find our work helpful for your research, please consider citing it.

```bibtex
@misc{wang2024lvbench,
      title={LVBench: An Extreme Long Video Understanding Benchmark},
      author={Weihan Wang and Zehai He and Wenyi Hong and Yean Cheng and Xiaohan Zhang and Ji Qi and Shiyu Huang and Bin Xu and Yuxiao Dong and Ming Ding and Jie Tang},
      year={2024},
      eprint={2406.08035},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
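As a companion to the Download section, here is a minimal sketch of turning the metadata file into YouTube URLs by hand, for readers who do not use `download.sh`. It is not the project's own download code. The field name `key`, the helper `youtube_urls`, and the `https://www.youtube.com/watch?v=<id>` URL pattern are assumptions made for illustration; the README only states that each entry has a key field corresponding to a YouTube video ID. The four IDs listed as unavailable in the note above are skipped.

```python
import json
from pathlib import Path

# IDs that the README's note lists as no longer available on YouTube.
UNAVAILABLE_IDS = {"28CIeC8cZks", "idZkam9zqAs", "gXnhqF0TqqI", "QgWRyDV9Ozs"}


def youtube_urls(meta_path, id_field="key"):
    """Yield (video_id, url) pairs for each entry of a JSONL metadata file.

    `id_field` is an assumption: adjust it if the ID is stored under a
    different name in your copy of video_info.meta.jsonl.
    """
    with Path(meta_path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the file
            video_id = json.loads(line)[id_field]
            if video_id in UNAVAILABLE_IDS:
                continue  # skip videos known to be unavailable
            yield video_id, f"https://www.youtube.com/watch?v={video_id}"


if __name__ == "__main__":
    meta = Path("data/video_info.meta.jsonl")
    if meta.exists():
        for video_id, url in youtube_urls(meta):
            print(video_id, url)
```

The generator reads one JSON object per line, so the whole metadata file never has to be held in memory, and the resulting URLs can be passed to whichever downloader you prefer.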


Provider: maas

Created: 2024-11-19
