MMBench-ru

Name: MMBench-ru
Creator: maas
Published: 2025-12-05 16:44:12
License: no description available

ModelScope Community, updated 2025-12-05, indexed 2025-08-02

Download link:

https://modelscope.cn/datasets/deepvk/MMBench-ru

Description:

# MMBench-ru

This is a Russian translation of the original [MMBench](https://github.com/open-compass/mmbench/) dataset, stored in a format supported by the [`lmms-eval`](https://github.com/EvolvingLMMs-Lab/lmms-eval) pipeline.

To build this dataset, we:

1. Translated the original dataset with `gpt-4o`.
2. Filtered out unsuccessful translations, i.e. those where the model's protection was triggered.
3. Manually validated the most common errors.

## Dataset Structure

The dataset includes only a `dev` split, translated from the `dev` split of [`lmms-lab/MMBench_EN`](https://huggingface.co/datasets/lmms-lab/MMBench_EN).

It contains 3910 samples in the same format as [`lmms-lab/MMBench_EN`](https://huggingface.co/datasets/lmms-lab/MMBench_EN):

* `index`: ID of a sample
* `question`: text of a question
* `image`: image for that question
* `hint`: if specified, a short description of the image that can be useful
* `A`, `B`, `C`, `D`: options with possible answers
* `answer`: correct answer

## Usage

The easiest way to evaluate a model on `MMBench-ru` is through [`lmms-eval`](https://github.com/EvolvingLMMs-Lab/lmms-eval).

For example, to evaluate [`deepvk/llava-saiga-8b`](https://huggingface.co/deepvk/llava-saiga-8b):

```bash
accelerate launch -m lmms_eval --model llava_hf \
    --model_args pretrained="deepvk/llava-saiga-8b" \
    --tasks mmbench_ru_dev --batch_size 1 \
    --log_samples --log_samples_suffix llava-saiga-8b --output_path ./logs/
```

This prints a table with the results. The main metric for this task is `GPTEvalScore`:

1. The model must generate text containing the letter of the correct answer.
2. If the generated string matches `answer`, the example is counted as correct.
3. If the generated string differs from `answer`, a request is sent to OpenAI GPT asking whether the model answered correctly. This covers cases where the model generated a detailed answer rather than a single letter.

If no OpenAI API key is specified when starting the evaluation, the metric is similar to classic ExactMatch.
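The three scoring steps above can be illustrated with a minimal Python sketch. This is not the `lmms-eval` implementation: the `Sample` class, the `score` and `accuracy` functions, the `judge` callback, and the normalization (strip whitespace, uppercase) are all hypothetical names and assumptions chosen for illustration. The `judge` callback stands in for the OpenAI GPT request; leaving it as `None` models running without an API key.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Sample:
    """Text fields of one record (hypothetical container; image omitted)."""
    index: int
    question: str
    answer: str  # letter of the correct option, e.g. "A"


# Hypothetical judge signature: (sample, model_output) -> True if correct.
Judge = Callable[[Sample, str], bool]


def score(sample: Sample, output: str, judge: Optional[Judge] = None) -> bool:
    """Return True if the model output counts as correct for this sample."""
    # Assumed normalization: ignore surrounding whitespace and letter case.
    if output.strip().upper() == sample.answer.strip().upper():
        return True  # the output matches `answer` directly
    if judge is None:
        return False  # no judge available: behaves like exact match
    return judge(sample, output)  # otherwise defer to the judge


def accuracy(
    samples: Sequence[Sample],
    outputs: Sequence[str],
    judge: Optional[Judge] = None,
) -> float:
    """Fraction of samples scored as correct (0.0 for an empty input)."""
    if not samples:
        return 0.0
    correct = sum(score(s, o, judge) for s, o in zip(samples, outputs))
    return correct / len(samples)
```

Without a judge, a verbose reply such as "The answer is B" is scored as incorrect even when `answer` is `B`, which is why the score without an API key can be lower than the judged score.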
## Citation

```
@article{MMBench,
    author  = {Yuan Liu and Haodong Duan and Yuanhan Zhang and Bo Li and Songyang Zhang and Wangbo Zhao and Yike Yuan and Jiaqi Wang and Conghui He and Ziwei Liu and Kai Chen and Dahua Lin},
    journal = {arXiv:2307.06281},
    title   = {MMBench: Is Your Multi-modal Model an All-around Player?},
    year    = {2023},
}
```

```
@misc{deepvk2024mmbench_ru,
    title={MMBench-ru},
    author={Belopolskih, Daniil and Spirin, Egor},
    url={https://huggingface.co/datasets/deepvk/MMBench-ru},
    publisher={Hugging Face},
    year={2024},
}
```


Provider: maas

Created: 2025-08-01
