K-SEED

Name: K-SEED
Creator: maas
Published: 2025-12-05 16:43:08
License: No description available

ModelScope community · Updated 2025-12-05 · Indexed 2025-07-26

Download link:

https://modelscope.cn/datasets/NCSOFT/K-SEED

Description:

# K-SEED

We introduce **K-SEED**, a Korean adaptation of [SEED-Bench](https://arxiv.org/abs/2307.16125) [1] designed for evaluating vision-language models. By translating the first 20 percent of the `test` subset of SEED-Bench into Korean and carefully reviewing its naturalness through human inspection, we developed a robust evaluation benchmark specifically for the Korean language.

K-SEED consists of questions across 12 evaluation dimensions, such as scene understanding, instance identity, and instance attribute, allowing a thorough evaluation of model performance in Korean. For more details, please refer to the VARCO-VISION technical report.

- **Technical Report:** [VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models](https://arxiv.org/pdf/2411.19103)
- **Blog (Korean):** [VARCO-VISION Technical Report Summary](https://ncsoft.github.io/ncresearch/95ad8712e60063e9ac97538504ac3eea0ac530af)
- **Hugging Face Version Model:** [NCSOFT/VARCO-VISION-14B-HF](https://huggingface.co/NCSOFT/VARCO-VISION-14B-HF)
- **Evaluation Repository:** [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval)

<table>
<tr>
  <th>Image</th>
  <th>SEED-Bench</th>
  <th>K-SEED</th>
</tr>
<tr>
  <td width=200><img src="https://cdn-uploads.huggingface.co/production/uploads/624ceaa38746b2f5773c2d1c/1ijfEkTCI7mPQo2OfCQCc.jpeg"></td>
  <td>
    question: How many towels are in the image?
    choice_a: One
    choice_b: Two
    choice_c: Three
    choice_d: Four
  </td>
  <td>
    question: 이미지에 수건이 몇 개 있나요?
    choice_a: 한 개
    choice_b: 두 개
    choice_c: 세 개
    choice_d: 네 개
  </td>
</tr>
</table>

## Inference Prompt

```
<image>
{question}
A. {choice_a}
B. {choice_b}
C. {choice_c}
D. {choice_d}
주어진 선택지 중 해당 옵션의 문자로 바로 답하세요.
```

## Results

Below are the evaluation results of various vision-language models, including [VARCO-VISION-14B](https://huggingface.co/NCSOFT/VARCO-VISION-14B), on K-SEED.

| | VARCO-VISION-14B | Pangea-7B | Pixtral-12B | Molmo-7B-D | Qwen2-VL-7B-Instruct | LLaVA-One-Vision-7B |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| K-SEED | **75.39** | 73.34 | 46.44 | 69.53 | 74.08 | 73.21 |

## References

[1] Bohao Li, Yuying Ge, Yixiao Ge, Guangzhi Wang, Rui Wang, Ruimao Zhang, and Ying Shan. SEED-Bench: Benchmarking multimodal large language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13299–13308, 2024.

## Citation

If you use K-SEED in your research, please cite the following:

```bibtex
@misc{ju2024varcovisionexpandingfrontierskorean,
  title={VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models},
  author={Jeongho Ju and Daeyoung Kim and SunYoung Park and Youngjune Kim},
  year={2024},
  eprint={2411.19103},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2411.19103},
}
```
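## Example: Building the Inference Prompt

As a rough illustration of the inference prompt template given in this README, the sketch below fills the template from a single record. It assumes each record is a dict with the keys `question`, `choice_a`, `choice_b`, `choice_c`, and `choice_d` (the field names shown in the example table) and that the template puts each element on its own line. The function name `build_prompt` is hypothetical, and how the `<image>` placeholder is consumed depends on the model or evaluation harness in use.

```python
# Minimal sketch: fill the K-SEED inference prompt from one record.
# Assumptions (not confirmed by the source): records are plain dicts with the
# keys below, and the template uses one line per element.

PROMPT_TEMPLATE = (
    "<image>\n"
    "{question}\n"
    "A. {choice_a}\n"
    "B. {choice_b}\n"
    "C. {choice_c}\n"
    "D. {choice_d}\n"
    "주어진 선택지 중 해당 옵션의 문자로 바로 답하세요."
)


def build_prompt(record: dict) -> str:
    """Return the prompt text for one multiple-choice record (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(
        question=record["question"],
        choice_a=record["choice_a"],
        choice_b=record["choice_b"],
        choice_c=record["choice_c"],
        choice_d=record["choice_d"],
    )


# The example record from the comparison table in this README.
example = {
    "question": "이미지에 수건이 몇 개 있나요?",
    "choice_a": "한 개",
    "choice_b": "두 개",
    "choice_c": "세 개",
    "choice_d": "네 개",
}

prompt = build_prompt(example)
print(prompt)
```

The final line of the template instructs the model, in Korean, to answer directly with the letter of the chosen option, so a downstream scorer would only need to compare a single letter against the ground truth.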


Provider: maas

Created: 2025-07-24
