ViMUL-Bench

Name: ViMUL-Bench
Creator: maas
Published: 2025-12-05 16:38:01
License: No description available

ModelScope Community: updated 2025-12-05, indexed 2025-06-14

Download link:

https://modelscope.cn/datasets/MBZUAI/ViMUL-Bench


Resource description:

# ViMUL-Bench: A Culturally-diverse Multilingual Multimodal Video Benchmark

[![🤗 Hugging Face](https://img.shields.io/badge/🤗%20Hugging%20Face-Dataset-blue)](https://huggingface.co/datasets/MBZUAI/ViMUL-Bench) [![📄 Paper](https://img.shields.io/badge/📄-Paper-red)](https://huggingface.co/papers/2506.07032) [![🌐 Project Page](https://img.shields.io/badge/🌐-Project%20Page-green)](https://mbzuai-oryx.github.io/ViMUL/)

# Overview

ViMUL-Bench is evaluated with [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval), a toolkit for evaluating models across multiple tasks and languages.

## Key Features

- **🌍 14 Languages:** English, Chinese, Spanish, French, German, Hindi, Arabic, Russian, Bengali, Urdu, Sinhala, Tamil, Swedish, Japanese
- **🎭 15 Categories:** Including 8 culturally diverse categories, such as lifestyles, festivals, foods, rituals, local landmarks, and cultural personalities
- **📝 Question Types:** Open-ended (short and long-form) and multiple-choice questions
- **⏱️ Video Durations:** Short, medium, and long videos
- **✅ Quality:** 8,000 samples manually verified by native speakers
- **🎯 Purpose:** A benchmark for culturally and linguistically inclusive multilingual video LMMs

## Dataset Structure

- **Test Data:** Organized by language, with separate files for multiple-choice (MCQ) and open-ended (OE) questions
  - Format: `test/{language}/{language}_{mcq|oe}.parquet`
  - Example: `test/english/english_mcq.parquet`, `test/arabic/arabic_oe.parquet`
- **Configs:** Each language-task combination is available as a separate configuration

# Installation

To install `lmms-eval`, run the following commands:

```bash
git clone https://github.com/EvolvingLMMs-Lab/lmms-eval
cd lmms-eval
pip install -e .
```

For additional model-specific dependencies, refer to the [lmms-eval repository](https://github.com/EvolvingLMMs-Lab/lmms-eval).
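The `test/{language}/{language}_{mcq|oe}.parquet` layout under Dataset Structure also makes it easy to fetch a single test file instead of the whole dataset. The sketch below is a hypothetical helper, not part of ViMUL-Bench or lmms-eval: `vimul_path` and `vimul_fetch` are illustrative names, and it assumes every language folder uses the lowercase English language name, which the documented examples confirm only for `english` and `arabic`. It relies on `huggingface-cli download` accepting explicit file names after the repository ID.

```bash
# Hypothetical helpers, not shipped with ViMUL-Bench or lmms-eval.
# Assumption: language folders are lowercase English names (only
# "english" and "arabic" are confirmed by the documented examples).

# Print the repo-relative path of one test file, following the layout
#   test/{language}/{language}_{mcq|oe}.parquet
vimul_path() {
  local lang="$1" task="$2"
  case "$task" in
    mcq|oe) ;;  # the two task types used in the file names
    *) echo "task must be 'mcq' or 'oe', got '$task'" >&2; return 1 ;;
  esac
  printf 'test/%s/%s_%s.parquet\n' "$lang" "$lang" "$task"
}

# Download only that file into ./ViMUL-Bench (keeps the test/... subfolders).
vimul_fetch() {
  local path
  path="$(vimul_path "$1" "$2")" || return 1
  huggingface-cli download MBZUAI/ViMUL-Bench "$path" \
    --repo-type dataset --local-dir ./ViMUL-Bench
}
```

For example, after sourcing these functions, `vimul_fetch arabic oe` would download `test/arabic/arabic_oe.parquet` into `./ViMUL-Bench/test/arabic/`. The path is validated before any network call, so a mistyped task type fails immediately.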
# Preparing the ViMUL-Bench Task Files

Copy the required ViMUL-Bench task files into the `lmms-eval` tasks directory. Run these commands from the root of your `lmms-eval` clone so that `--local-dir ./` places the files under `lmms_eval/tasks/`:

```bash
# For MCQ
huggingface-cli download MBZUAI/ViMUL-Bench --repo-type dataset --include lmms_eval/tasks/vimul_bench_mcq/ --local-dir ./

# For OE
huggingface-cli download MBZUAI/ViMUL-Bench --repo-type dataset --include lmms_eval/tasks/vimul_bench_oe/ --local-dir ./
```

# Running Evaluations

## Tasks to Evaluate

To evaluate both the MCQ and open-ended tasks, pass the following option:

```bash
--tasks vimulmcq_test,vimuloe_test
```

# Example: Evaluating `llavaonevision`

## Clone the Repository

Clone the LLaVA-NeXT repository, which contains the `llavaonevision` model code:

```bash
git clone https://github.com/LLaVA-VL/LLaVA-NeXT
```

## Download the Dataset

Use `huggingface-cli` to download the dataset (files are fetched in parallel):

```bash
huggingface-cli download MBZUAI/ViMUL-Bench --repo-type dataset
```

## Run the Evaluation

Export the necessary environment variables:

```bash
export HF_HOME=<path to hf>
export PYTHONPATH=<path to LLaVA-NeXT>
```

Run the evaluation command. `--num_processes 8` starts one process per GPU on an 8-GPU machine; adjust it to match your hardware.

```bash
accelerate launch --num_processes 8 -m lmms_eval \
    --model llava_onevision \
    --model_args pretrained="lmms-lab/llava-onevision-qwen2-7b-ov-chat" \
    --tasks vimulmcq_test,vimuloe_test \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs/ \
    --verbosity INFO
```

## Output

After the evaluation finishes, the model responses are saved in the `logs` directory set by `--output_path`.

## Citation

```
@misc{shafique2025culturallydiversemultilingualmultimodalvideo,
  title={A Culturally-diverse Multilingual Multimodal Video Benchmark \& Model},
  author={Bhuiyan Sanjid Shafique and Ashmal Vayani and Muhammad Maaz and Hanoona Abdul Rasheed and Dinura Dissanayake and Mohammed Irfan Kurpath and Yahya Hmaiti and Go Inoue and Jean Lahoud and Md. Safirur Rashid and Shadid Intisar Quasem and Maheen Fatima and Franco Vidal and Mykola Maslych and Ketan Pravin More and Sanoojan Baliah and Hasindri Watawana and Yuhao Li and Fabian Farestam and Leon Schaller and Roman Tymtsiv and Simon Weber and Hisham Cholakkal and Ivan Laptev and Shin'ichi Satoh and Michael Felsberg and Mubarak Shah and Salman Khan and Fahad Shahbaz Khan},
  year={2025},
  eprint={2506.07032},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2506.07032},
}
```


Provider:

maas

Created:

2025-06-10
