VideoEval-Pro

Name: VideoEval-Pro
Creator: maas
Published: 2026-01-06 16:32:24
License: No description provided

ModelScope community. Updated: 2026-01-06. Indexed: 2025-05-17.

Download link:

https://modelscope.cn/datasets/TIGER-Lab/VideoEval-Pro

Description:

# VideoEval-Pro

VideoEval-Pro is a robust and realistic long video understanding benchmark containing open-ended, short-answer QA problems. The dataset is constructed by reformatting questions from four existing long video understanding MCQ benchmarks (Video-MME, MLVU, LVBench, and LongVideoBench) into free-form questions.

The paper can be found [here](https://huggingface.co/papers/2505.14640). The evaluation code and scripts are available at: [TIGER-AI-Lab/VideoEval-Pro](https://github.com/TIGER-AI-Lab/VideoEval-Pro)

## Dataset Structure

Each example in the dataset contains:

- `video`: Name (path) of the video file
- `question`: The question about the video content
- `options`: Original options from the source benchmark
- `answer`: The correct MCQ answer
- `answer_text`: The correct free-form answer
- `meta`: Additional metadata from the source benchmark
- `source`: Source benchmark
- `qa_subtype`: Question task subtype
- `qa_type`: Question task type

## Evaluation Steps

1. **Download and Prepare Videos**

   ```bash
   # Navigate to videos directory
   cd videos

   # Merge all split tar.gz files into a single archive
   cat videos_part_*.tar.gz > videos_merged.tar.gz

   # Extract the merged archive
   tar -xzf videos_merged.tar.gz

   # [Optional] Clean up the split files and merged archive
   rm videos_part_*.tar.gz videos_merged.tar.gz

   # After extraction, you will get a directory containing all videos
   # The path to this directory will be used as --video_root in evaluation
   # For example: 'VideoEval-Pro/videos'
   ```

2. **[Optional] Pre-extract Frames**

   To improve efficiency, you can pre-extract frames from videos. The extracted frames should be organized as follows:

   ```
   frames_root/
   ├── video_name_1/    # Directory name is the video name
   │   ├── 000001.jpg   # Frame images
   │   ├── 000002.jpg
   │   └── ...
   ├── video_name_2/
   │   ├── 000001.jpg
   │   ├── 000002.jpg
   │   └── ...
   └── ...
   ```

   After frame extraction, the path to the frames will be used as `--frames_root`. Set `--using_frames True` when running the evaluation script.

3. **Setup Evaluation Environment**

   ```bash
   # Clone the repository from GitHub
   git clone https://github.com/TIGER-AI-Lab/VideoEval-Pro
   cd VideoEval-Pro

   # Create conda environment from requirements.txt (there are different requirements files for different models)
   conda create -n videoevalpro --file requirements.txt
   conda activate videoevalpro
   ```

4. **Run Evaluation**

   ```bash
   cd VideoEval-Pro

   # Set PYTHONPATH
   export PYTHONPATH=.

   # Run evaluation script with the following parameters:
   # --video_root: Path to video files folder
   # --frames_root: Path to video frames folder [only needed with --using_frames True]
   # --output_path: Path to save output results
   # --using_frames: Whether to use pre-extracted frames
   # --model_path: Path to model
   # --device: Device to run inference on
   # --num_frames: Number of frames to sample from video
   # --max_retries: Maximum number of retries for failed inference
   # --num_threads: Number of threads for parallel processing
   # (replace *_chat.py with the script for your model, e.g. qwen_chat.py)
   python tools/*_chat.py \
       --video_root <path_to_videos> \
       --frames_root <path_to_frames> \
       --output_path <path_to_save_results> \
       --using_frames <True/False> \
       --model_path <model_name_or_path> \
       --device <device> \
       --num_frames <number_of_frames> \
       --max_retries <max_retries> \
       --num_threads <num_threads>

   # E.g.:
   python tools/qwen_chat.py \
       --video_root ./videos \
       --frames_root ./frames \
       --output_path ./results/qwen_results.jsonl \
       --using_frames False \
       --model_path Qwen/Qwen2-VL-7B-Instruct \
       --device cuda \
       --num_frames 32 \
       --max_retries 10 \
       --num_threads 1
   ```

5. **Judge the results**

   ```bash
   cd VideoEval-Pro

   # Set PYTHONPATH
   export PYTHONPATH=.

   # Run judge script gpt4o_judge.py with the following parameters:
   # --input_path: Path to the saved output results from step 4
   # --output_path: Path to save the judged results
   # --model_name: Version of the judge model
   # --num_threads: Number of threads for parallel processing
   python tools/gpt4o_judge.py \
       --input_path <path_to_saved_results> \
       --output_path <path_to_judged_results> \
       --model_name <model_version> \
       --num_threads <num_threads>

   # E.g.:
   python tools/gpt4o_judge.py \
       --input_path ./results/qwen_results.jsonl \
       --output_path ./results/qwen_results_judged.jsonl \
       --model_name gpt-4o-2024-08-06 \
       --num_threads 1
   ```

**Note:** the released results are judged by *gpt-4o-2024-08-06*.
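The optional frame pre-extraction step specifies the target layout (`frames_root/<video name>/000001.jpg`, ...) but not how to produce it. The following is a minimal sketch of one way to do it, not the repository's own tooling. It builds an `ffmpeg` command per video. The helper names `frame_dir` and `ffmpeg_extract_cmd`, the choice of 1 frame per second, and the use of the file name without its extension as the directory name are all assumptions for illustration. Check them against the evaluation scripts before relying on the output.

```python
from pathlib import Path


def frame_dir(frames_root: str, video_path: str) -> Path:
    """Directory holding the frames of one video.

    Assumption: the directory is named after the video file without its extension.
    """
    return Path(frames_root) / Path(video_path).stem


def ffmpeg_extract_cmd(video_path: str, frames_root: str, fps: float = 1.0) -> list[str]:
    """Build (but do not run) an ffmpeg command for one video.

    The %06d pattern makes ffmpeg write 000001.jpg, 000002.jpg, ...
    (its image sequence numbering starts at 1 by default).
    The fps value is an assumed sampling rate, not one stated by the benchmark.
    """
    out_pattern = frame_dir(frames_root, video_path) / "%06d.jpg"
    return [
        "ffmpeg",
        "-i", str(video_path),
        "-vf", f"fps={fps}",
        str(out_pattern),
    ]


if __name__ == "__main__":
    # Hypothetical file name, used only to show the resulting command.
    cmd = ffmpeg_extract_cmd("videos/example.mp4", "frames", fps=1.0)
    print(" ".join(cmd))
```

To run the command, create the output directory first, for example with `frame_dir(...).mkdir(parents=True, exist_ok=True)`, because `ffmpeg` does not create missing directories. Then pass the list to `subprocess.run(cmd, check=True)`.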


Provider:

maas

Created:

2025-05-16
