Download link: https://modelscope.cn/datasets/NCSOFT/K-MMStar
# K-MMStar
We introduce **K-MMStar**, a Korean adaptation of the [MMStar](https://arxiv.org/abs/2403.20330) benchmark [1], designed for evaluating vision-language models.
By translating the `val` subset of MMStar into Korean and having human reviewers check the translations for naturalness, we built a robust evaluation benchmark for the Korean language.
The original MMStar dataset contains some unanswerable cases *(e.g., questions that require multiple images when only a single image is provided, or vague questions or options)*. We therefore modified or re-created these questions so that each one can be answered from a single image.
K-MMStar consists of questions across 6 evaluation dimensions, such as coarse perception, fine-grained perception, and instance reasoning, allowing a thorough evaluation of model performance in Korean.
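Because the questions are grouped into evaluation dimensions, scores can be broken down per dimension as well as overall. The following is a minimal sketch of that bookkeeping, not the official scoring code. The field names `category`, `answer`, and `prediction` and the toy records are assumptions made for illustration, and the overall figure here is a plain per-question average; this card does not state how the reported score is aggregated.

```python
from collections import defaultdict


def accuracy_by_dimension(records):
    """Return (per-dimension accuracy, overall accuracy), both in percent."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        dim = rec["category"]  # assumed field name for the evaluation dimension
        total[dim] += 1
        if rec["prediction"] == rec["answer"]:
            correct[dim] += 1
    per_dim = {dim: 100.0 * correct[dim] / total[dim] for dim in total}
    overall = 100.0 * sum(correct.values()) / sum(total.values())
    return per_dim, overall


# Toy records invented for illustration; they are not taken from the dataset.
records = [
    {"category": "coarse perception", "answer": "A", "prediction": "A"},
    {"category": "coarse perception", "answer": "B", "prediction": "C"},
    {"category": "fine-grained perception", "answer": "D", "prediction": "D"},
    {"category": "instance reasoning", "answer": "C", "prediction": "C"},
]

per_dim, overall = accuracy_by_dimension(records)
for dim, acc in per_dim.items():
    print(f"{dim}: {acc:.2f}")
print(f"overall: {overall:.2f}")
```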
For more details, please refer to the VARCO-VISION technical report.
- **Technical Report:** [VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models](https://arxiv.org/pdf/2411.19103)
- **Blog(Korean):** [VARCO-VISION Technical Report Summary](https://ncsoft.github.io/ncresearch/95ad8712e60063e9ac97538504ac3eea0ac530af)
- **Hugging Face Model:** [NCSOFT/VARCO-VISION-14B-HF](https://huggingface.co/NCSOFT/VARCO-VISION-14B-HF)
- **Evaluation Repository:** [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval)
<table>
<tr>
<th>Image</th>
<th>MMStar</th>
<th>K-MMStar</th>
</tr>
<tr>
<td width=200><img src="https://cdn-uploads.huggingface.co/production/uploads/624ceaa38746b2f5773c2d1c/4N3YLHmLMlxXvdRFssxPz.jpeg"></td>
<td>
<strong>question:</strong> Which option describe the object relationship in the image correctly? Options: A: The suitcase is on the book., B: The suitcase is beneath the cat., C: The suitcase is beneath the bed., D: The suitcase is beneath the book.
</td>
<td>
<strong>question:</strong> 이미지에서 물체들의 관계를 올바르게 설명하는 옵션은 무엇인가요? Options: A: 가방이 책 위에 있다., B: 가방이 고양이 아래에 있다., C: 가방이 침대 아래에 있다., D: 가방이 책 아래에 있다.
</td>
</tr>
</table>
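In the example above, the answer options are embedded in the question string after an `Options:` marker. If the options are needed separately (for instance, to map a predicted letter back to its text), they can be split out with a small helper. This is a sketch that assumes every question follows the `Options: A: ..., B: ...` layout shown in the example; `split_question` is a hypothetical helper, not part of the dataset or of lmms-eval.

```python
import re


def split_question(text):
    """Split 'stem Options: A: ..., B: ...' into (stem, {letter: option text})."""
    stem, marker, tail = text.partition("Options:")
    if not marker:
        # No options marker found: return the text unchanged with no options.
        return text.strip(), {}
    # Split only at commas that are followed by the next option label.
    parts = re.split(r",\s*(?=[A-D]:\s)", tail.strip())
    options = {}
    for part in parts:
        letter, _, body = part.partition(":")
        options[letter.strip()] = body.strip()
    return stem.strip(), options


# The K-MMStar question from the example table above.
sample = (
    "이미지에서 물체들의 관계를 올바르게 설명하는 옵션은 무엇인가요? "
    "Options: A: 가방이 책 위에 있다., B: 가방이 고양이 아래에 있다., "
    "C: 가방이 침대 아래에 있다., D: 가방이 책 아래에 있다."
)

stem, options = split_question(sample)
print(stem)
for letter, body in options.items():
    print(letter, body)
```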
<br>
## Inference Prompt
```
{question}
```
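The template above passes the question text to the model unchanged, with no extra instruction. A minimal sketch of applying it to one record, together with a simple way to pull an option letter out of a free-form reply, might look as follows. The record layout, `build_prompt`, and `extract_choice` are assumptions for illustration; the answer-matching rules actually used in lmms-eval may differ.

```python
import re

# The inference prompt from this card: the question is used as-is.
PROMPT_TEMPLATE = "{question}"


def build_prompt(record):
    """Fill the prompt template with the record's question text."""
    return PROMPT_TEMPLATE.format(question=record["question"])


def extract_choice(reply):
    """Return the first standalone option letter A-D in the reply, or None."""
    # Reject letters that are part of a longer ASCII word; Korean text
    # directly after the letter (e.g. "B입니다") is still accepted.
    match = re.search(r"(?<![A-Za-z])([A-D])(?![A-Za-z])", reply)
    return match.group(1) if match else None


# Hypothetical record using the question from the example table.
record = {
    "question": (
        "이미지에서 물체들의 관계를 올바르게 설명하는 옵션은 무엇인가요? "
        "Options: A: 가방이 책 위에 있다., B: 가방이 고양이 아래에 있다., "
        "C: 가방이 침대 아래에 있다., D: 가방이 책 아래에 있다."
    )
}

prompt = build_prompt(record)
print(prompt)
print(extract_choice("정답은 B입니다."))
```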
<br>
## Results
Below are the evaluation results of various vision-language models, including [VARCO-VISION-14B](https://huggingface.co/NCSOFT/VARCO-VISION-14B), on K-MMStar.
| | VARCO-VISION-14B | Pangea-7B | Pixtral-12B | Molmo-7B-D | Qwen2-VL-7B-Instruct | LLaVA-One-Vision-7B |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| K-MMStar | **57.33** | 35.00 | 23.93 | 47.40 | 50.67 | 54.00 |
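
For readers who want to work with these numbers programmatically, the short sketch below copies the scores from the table into a dictionary, ranks the models, and computes the gap between the top two. The scores are taken verbatim from the table; the variable names are illustrative only.

```python
# K-MMStar scores copied from the results table above.
scores = {
    "VARCO-VISION-14B": 57.33,
    "Pangea-7B": 35.00,
    "Pixtral-12B": 23.93,
    "Molmo-7B-D": 47.40,
    "Qwen2-VL-7B-Instruct": 50.67,
    "LLaVA-One-Vision-7B": 54.00,
}

# Sort models from highest to lowest score.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
best_model, best_score = ranked[0]
runner_up, runner_up_score = ranked[1]

# Round to two decimals to avoid floating-point noise in the difference.
margin = round(best_score - runner_up_score, 2)

for name, score in ranked:
    print(f"{name}: {score:.2f}")
print(f"{best_model} leads {runner_up} by {margin:.2f} points")
```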
<br>
## References
[1] Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Jiaqi Wang, Yu Qiao, Dahua Lin, and Feng Zhao. Are we on the right way for evaluating large vision-language models? In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=evP9mxNNxJ.
<br>
## Citation
If you use K-MMStar in your research, please cite the following:
```bibtex
@misc{ju2024varcovisionexpandingfrontierskorean,
title={VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models},
author={Jeongho Ju and Daeyoung Kim and SunYoung Park and Youngjune Kim},
year={2024},
eprint={2411.19103},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.19103},
}
```
Provider: maas
Created: 2025-07-24



