MMBench-ru
Source: ModelScope community. Updated 2025-12-05; indexed 2025-08-02.
Download link:
https://modelscope.cn/datasets/deepvk/MMBench-ru
# MMBench-ru
This is a Russian translation of the original [MMBench](https://github.com/open-compass/mmbench/) dataset,
stored in a format supported by the [`lmms-eval`](https://github.com/EvolvingLMMs-Lab/lmms-eval) pipeline.
To build this dataset, we:
1. Translated the original dataset with `gpt-4o`
2. Filtered out unsuccessful translations, i.e. samples where the model's content protection was triggered
3. Manually validated the most common errors
## Dataset Structure
The dataset includes only a `dev` split, translated from the `dev` split of [`lmms-lab/MMBench_EN`](https://huggingface.co/datasets/lmms-lab/MMBench_EN).
It contains 3910 samples in the same format as [`lmms-lab/MMBench_EN`](https://huggingface.co/datasets/lmms-lab/MMBench_EN):
* `index`: ID of a sample
* `question`: text of a question
* `image`: image for that question
* `hint`: if specified, a short description of the image that can be useful
* `A`, `B`, `C`, `D`: options with possible answers
* `answer`: correct answer
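To make the record layout concrete, here is a minimal sketch of turning one sample into a multiple-choice text prompt. The helpers `is_missing` and `format_prompt` and the hand-written sample are hypothetical illustrations, not part of the dataset tooling or of `lmms-eval`. The sketch also assumes that an unused option or an absent `hint` may appear as `None`, an empty string, or the string `"nan"`; that is an assumption about storage, not something stated in this card.

```python
from typing import Any, Mapping

OPTION_KEYS = ("A", "B", "C", "D")


def is_missing(value: Any) -> bool:
    """Treat None, empty strings and the string "nan" as absent (an assumption)."""
    return value is None or str(value).strip().lower() in ("", "nan")


def format_prompt(sample: Mapping[str, Any]) -> str:
    """Build a text prompt from the `hint`, `question` and option fields."""
    lines = []
    if not is_missing(sample.get("hint")):
        lines.append(str(sample["hint"]))
    lines.append(str(sample["question"]))
    for key in OPTION_KEYS:
        if not is_missing(sample.get(key)):
            lines.append(f"{key}. {sample[key]}")
    return "\n".join(lines)


# Hand-written sample for illustration only; real samples also carry an `image`.
example = {
    "index": 0,
    "question": "Какого цвета небо на изображении?",
    "hint": None,
    "A": "Синее",
    "B": "Красное",
    "C": "nan",
    "D": "nan",
    "answer": "A",
}

print(format_prompt(example))
```

Only the options that are actually present end up in the prompt, so questions with fewer than four choices stay readable.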
## Usage
The easiest way to evaluate a model on `MMBench-ru` is through [`lmms-eval`](https://github.com/EvolvingLMMs-Lab/lmms-eval).
For example, to evaluate [`deepvk/llava-saiga-8b`](https://huggingface.co/deepvk/llava-saiga-8b):
```bash
accelerate launch -m lmms_eval --model llava_hf \
--model_args pretrained="deepvk/llava-saiga-8b" \
--tasks mmbench_ru_dev --batch_size 1 \
--log_samples --log_samples_suffix llava-saiga-8b --output_path ./logs/
```
This prints a table with the results. The main metric for this task is `GPTEvalScore`, which works as follows:
1. The model must generate text containing the letter of the correct answer.
2. If the generated string matches `answer`, the example is counted as correct.
3. If the generated string differs from `answer`, a request is sent to OpenAI GPT asking whether the model answered correctly. This covers cases where the model generates a detailed answer rather than a single letter.
If no OpenAI API key is specified when starting the evaluation, the metric is similar to the classic ExactMatch.
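The scoring flow described above can be sketched roughly as follows. This is an illustration of the described logic, not the actual `lmms-eval` implementation: the function `score_prediction`, the `Judge` type, and `toy_judge` (a local stand-in for the OpenAI GPT request) are all hypothetical, and the comparison is simplified to an exact match after trimming whitespace.

```python
from typing import Callable, Optional

# Hypothetical judge signature: (prediction, answer) -> True if the model was right.
Judge = Callable[[str, str], bool]


def score_prediction(prediction: str, answer: str, judge: Optional[Judge] = None) -> int:
    """Return 1 if the prediction counts as correct, else 0."""
    if prediction.strip() == answer.strip():
        return 1  # direct match with `answer`
    if judge is None:
        return 0  # no judge available: behaves like plain exact match
    return int(judge(prediction, answer))  # defer to the external judge


def toy_judge(prediction: str, answer: str) -> bool:
    """Stand-in for the GPT request: accept answers starting with the right letter."""
    return prediction.strip().upper().startswith(answer.strip().upper() + ".")


print(score_prediction("B", "B"))                      # direct match
print(score_prediction("B. Красное", "B"))             # no judge configured
print(score_prediction("B. Красное", "B", toy_judge))  # judge consulted
```

Without a judge, a detailed answer such as `B. Красное` is scored as incorrect even though it names the right option, which is why the result without an API key resembles an exact-match score.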
## Citation
```
@article{MMBench,
author = {Yuan Liu and Haodong Duan and Yuanhan Zhang and Bo Li and Songyang Zhang and Wangbo Zhao and Yike Yuan and Jiaqi Wang and Conghui He and Ziwei Liu and Kai Chen and Dahua Lin},
journal = {arXiv:2307.06281},
title = {MMBench: Is Your Multi-modal Model an All-around Player?},
year = {2023},
}
```
```
@misc{deepvk2024mmbench_ru,
title={MMBench-ru},
author={Belopolskih, Daniil and Spirin, Egor},
url={https://huggingface.co/datasets/deepvk/MMBench-ru},
publisher={Hugging Face},
year={2024},
}
```
Provider: maas
Created: 2025-08-01



