K-SEED

ModelScope Community. Updated: 2025-12-05. Indexed: 2025-07-26.
Download link:
https://modelscope.cn/datasets/NCSOFT/K-SEED
Resource description:
# K-SEED

We introduce **K-SEED**, a Korean adaptation of [SEED-Bench](https://arxiv.org/abs/2307.16125) [1] designed for evaluating vision-language models. We translated the first 20 percent of the `test` subset of SEED-Bench into Korean and had human reviewers check the translations for naturalness, producing a robust evaluation benchmark for the Korean language. K-SEED consists of questions across 12 evaluation dimensions, such as scene understanding, instance identity, and instance attribute, allowing a thorough evaluation of model performance in Korean.

For more details, please refer to the VARCO-VISION technical report.

- **Technical Report:** [VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models](https://arxiv.org/pdf/2411.19103)
- **Blog (Korean):** [VARCO-VISION Technical Report Summary](https://ncsoft.github.io/ncresearch/95ad8712e60063e9ac97538504ac3eea0ac530af)
- **Hugging Face Version Model:** [NCSOFT/VARCO-VISION-14B-HF](https://huggingface.co/NCSOFT/VARCO-VISION-14B-HF)
- **Evaluation Repository:** [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval)

<table>
<tr>
  <th>Image</th>
  <th>SEED-Bench</th>
  <th>K-SEED</th>
</tr>
<tr>
  <td width=200><img src="https://cdn-uploads.huggingface.co/production/uploads/624ceaa38746b2f5773c2d1c/1ijfEkTCI7mPQo2OfCQCc.jpeg"></td>
  <td>
    <strong>question:</strong> How many towels are in the image?
    <br>
    <strong>choice_a:</strong> One
    <br>
    <strong>choice_b:</strong> Two
    <br>
    <strong>choice_c:</strong> Three
    <br>
    <strong>choice_d:</strong> Four
  </td>
  <td>
    <strong>question:</strong> 이미지에 수건이 몇 개 있나요?
    <br>
    <strong>choice_a:</strong> 한 개
    <br>
    <strong>choice_b:</strong> 두 개
    <br>
    <strong>choice_c:</strong> 세 개
    <br>
    <strong>choice_d:</strong> 네 개
  </td>
</tr>
</table>

<br>

## Inference Prompt

```
<image>
{question}
A. {choice_a}
B. {choice_b}
C. {choice_c}
D. {choice_d}
주어진 선택지 중 해당 옵션의 문자로 바로 답하세요.
```

<br>

## Results

Below are the evaluation results of various vision-language models, including [VARCO-VISION-14B](https://huggingface.co/NCSOFT/VARCO-VISION-14B), on K-SEED.

| | VARCO-VISION-14B | Pangea-7B | Pixtral-12B | Molmo-7B-D | Qwen2-VL-7B-Instruct | LLaVA-One-Vision-7B |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| K-SEED | **75.39** | 73.34 | 46.44 | 69.53 | 74.08 | 73.21 |

<br>

## References

[1] Bohao Li, Yuying Ge, Yixiao Ge, Guangzhi Wang, Rui Wang, Ruimao Zhang, and Ying Shan. SEED-Bench: Benchmarking multimodal large language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13299–13308, 2024.

<br>

## Citation

If you use K-SEED in your research, please cite the following:

```bibtex
@misc{ju2024varcovisionexpandingfrontierskorean,
  title={VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models},
  author={Jeongho Ju and Daeyoung Kim and SunYoung Park and Youngjune Kim},
  year={2024},
  eprint={2411.19103},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2411.19103},
}
```
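As an illustration of the Inference Prompt section, the sketch below fills the template from one record and pulls an option letter out of a model reply. The field names (`question`, `choice_a` to `choice_d`) follow the example table in this card. The line breaks in the template, the helper names `build_prompt` and `extract_choice`, and the letter-matching rule are assumptions made for this example; they are not taken from lmms-eval or the technical report.

```python
import re
from typing import Optional

# Template from the "Inference Prompt" section. The placement of line
# breaks is an assumption; the card only gives the order of the fields.
PROMPT_TEMPLATE = (
    "<image>\n"
    "{question}\n"
    "A. {choice_a}\n"
    "B. {choice_b}\n"
    "C. {choice_c}\n"
    "D. {choice_d}\n"
    "주어진 선택지 중 해당 옵션의 문자로 바로 답하세요."
)

# A standalone A/B/C/D that is not part of a longer Latin-letter word.
# Hypothetical parsing rule, chosen only for this sketch.
_CHOICE_PATTERN = re.compile(r"(?<![A-Za-z])([ABCD])(?![A-Za-z])")


def build_prompt(record: dict) -> str:
    """Fill the prompt template with the fields of one K-SEED record."""
    return PROMPT_TEMPLATE.format(
        question=record["question"],
        choice_a=record["choice_a"],
        choice_b=record["choice_b"],
        choice_c=record["choice_c"],
        choice_d=record["choice_d"],
    )


def extract_choice(response: str) -> Optional[str]:
    """Return the first standalone option letter in a reply, or None."""
    match = _CHOICE_PATTERN.search(response)
    return match.group(1) if match else None


# Example record copied from the comparison table in this card.
record = {
    "question": "이미지에 수건이 몇 개 있나요?",
    "choice_a": "한 개",
    "choice_b": "두 개",
    "choice_c": "세 개",
    "choice_d": "네 개",
}
prompt = build_prompt(record)
```

Calling `extract_choice("B. 두 개")` returns `"B"`, and a reply with no option letter returns `None`, so a caller can count unparseable replies separately from wrong answers.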

Provider: maas
Created: 2025-07-24