
MMBench-ru

ModelScope community. Updated 2025-12-05. Indexed 2025-08-02.
Download link:
https://modelscope.cn/datasets/deepvk/MMBench-ru
Description:
# MMBench-ru

This is a Russian translation of the original [MMBench](https://github.com/open-compass/mmbench/) dataset, stored in a format supported by the [`lmms-eval`](https://github.com/EvolvingLMMs-Lab/lmms-eval) pipeline.

For this dataset, we:

1. Translated the original dataset with `gpt-4o`
2. Filtered out unsuccessful translations, i.e. those where the model's protection was triggered
3. Manually validated the most common errors

## Dataset Structure

The dataset includes only a dev split, translated from the `dev` split of [`lmms-lab/MMBench_EN`](https://huggingface.co/datasets/lmms-lab/MMBench_EN).

It contains 3910 samples in the same format as [`lmms-lab/MMBench_EN`](https://huggingface.co/datasets/lmms-lab/MMBench_EN):

* `index`: ID of a sample
* `question`: text of the question
* `image`: image for that question
* `hint`: if specified, a short description of the image that can be useful
* `A`, `B`, `C`, `D`: answer options
* `answer`: correct answer

## Usage

The easiest way to evaluate a model on `MMBench-ru` is through [`lmms-eval`](https://github.com/EvolvingLMMs-Lab/lmms-eval).

For example, to evaluate [`deepvk/llava-saiga-8b`](https://huggingface.co/deepvk/llava-saiga-8b):

```bash
accelerate launch -m lmms_eval --model llava_hf \
  --model_args pretrained="deepvk/llava-saiga-8b" \
  --tasks mmbench_ru_dev --batch_size 1 \
  --log_samples --log_samples_suffix llava-saiga-8b --output_path ./logs/
```

This prints a table with the results. The main metric for this task is `GPTEvalScore`:

1. The model must generate text containing the letter of the correct answer.
2. If the generated text matches `answer`, the sample is counted as correct.
3. If the generated text differs from `answer`, a request is sent to OpenAI GPT asking whether the model answered correctly. This covers cases where the model generated a detailed answer rather than a single letter.

If no OpenAI API key is specified when starting the evaluation, the metric behaves like classic ExactMatch.
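The scoring rule described above can be pictured with a small Python sketch. This is not the `lmms-eval` implementation. The function names, the normalization step, and the `judge` callback (a hypothetical stand-in for the OpenAI GPT request) are assumptions made for illustration only.

```python
from typing import Callable, Optional


def normalize_prediction(text: str) -> str:
    """Trim whitespace and surrounding punctuation, then upper-case.

    Assumption: the real pipeline may normalize differently.
    """
    return text.strip().strip(".):(").strip().upper()


def score_sample(
    prediction: str,
    answer: str,
    judge: Optional[Callable[[str, str], bool]] = None,
) -> bool:
    """Return True if the prediction counts as correct.

    `judge` is a hypothetical callback standing in for the GPT request;
    when it is None (no API key), scoring reduces to exact match.
    """
    # Step 2: a direct match with the reference letter is correct.
    if normalize_prediction(prediction) == answer.strip().upper():
        return True
    # No judge available: behave like plain ExactMatch.
    if judge is None:
        return False
    # Step 3: otherwise ask the judge whether the free-form answer is right.
    return judge(prediction, answer)
```

Without a `judge`, a detailed answer such as `"The answer is B"` is scored as incorrect even when the letter is right, which is why running without an API key gives an ExactMatch-like score.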
## Citation

```
@article{MMBench,
    author  = {Yuan Liu and Haodong Duan and Yuanhan Zhang and Bo Li and Songyang Zhang and Wangbo Zhao and Yike Yuan and Jiaqi Wang and Conghui He and Ziwei Liu and Kai Chen and Dahua Lin},
    journal = {arXiv:2307.06281},
    title   = {MMBench: Is Your Multi-modal Model an All-around Player?},
    year    = {2023},
}
```

```
@misc{deepvk2024mmbench_ru,
    title={MMBench-ru},
    author={Belopolskih, Daniil and Spirin, Egor},
    url={https://huggingface.co/datasets/deepvk/MMBench-ru},
    publisher={Hugging Face},
    year={2024},
}
```

Provider:
maas
Created:
2025-08-01