K-DTCBench

Name: K-DTCBench
Creator: maas
Published: 2025-12-05 16:43:08
License: not specified

ModelScope community. Updated: 2025-12-05. Indexed: 2025-07-26.

Download link:

https://modelscope.cn/datasets/NCSOFT/K-DTCBench


Resource description:

# K-DTCBench

We introduce **K-DTCBench**, a newly developed Korean benchmark featuring both computer-generated and handwritten documents, tables, and charts. It contains 80 questions for each image type, with two questions per image, for 240 questions in total. The benchmark is designed to evaluate whether vision-language models can process images in different formats and be applied across diverse domains. All images are generated with made-up values and statements, for evaluation purposes only.

To build K-DTCBench, we either scanned handwritten documents, tables, and charts or created digital objects with the matplotlib library. Digital and handwritten images appear in equal proportions, each making up 50% of the benchmark.

For more details, please refer to the VARCO-VISION technical report.

- **Technical Report:** [VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models](https://arxiv.org/pdf/2411.19103)
- **Blog(Korean):** [VARCO-VISION Technical Report Summary](https://ncsoft.github.io/ncresearch/95ad8712e60063e9ac97538504ac3eea0ac530af)
- **Huggingface Version Model:** [NCSOFT/VARCO-VISION-14B-HF](https://huggingface.co/NCSOFT/VARCO-VISION-14B-HF)
- **Evaluation Repository:** [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval)

<table>
<tr>
<th>Category</th>
<th>Image</th>
<th>K-DTCBench</th>
</tr>
<tr>
<td align="center">document</td>
<td width=350><img src="https://cdn-uploads.huggingface.co/production/uploads/624ceaa38746b2f5773c2d1c/Ipi4HR73P-PDC5XcgP3WF.png"></td>
<td>
question: 보고서의 주요 내용이 아닌 것은 무엇인가요?<br>
A: 안전 인프라 확충<br>
B: 재난 및 사고 예방 체계 구축<br>
C: 시민 안전 교육 강화<br>
D: 긴급 대응 시스템 개선
</td>
</tr>
<tr>
<td align="center">table</td>
<td width=350><img src="https://cdn-uploads.huggingface.co/production/uploads/624ceaa38746b2f5773c2d1c/dz_FuPnpZ5P4P3LEB5PZ0.png"></td>
<td>
question: 인프라 구축 항목의 점수는 몇 점인가요?<br>
A: 4<br>
B: 6<br>
C: 8<br>
D: 10
</td>
</tr>
<tr>
<td align="center">chart</td>
<td width=350><img src="https://cdn-uploads.huggingface.co/production/uploads/624ceaa38746b2f5773c2d1c/IbNMPPgd974SbCAsz6zIS.png"></td>
<td>
question: 직장인들이 퇴근 후 두 번째로 선호하는 활동은 무엇인가요?<br>
A: 운동<br>
B: 여가활동<br>
C: 자기개발<br>
D: 휴식
</td>
</tr>
</table>

## Inference Prompt

```
<image>
{question}
Options: A: {A}, B: {B}, C: {C}, D: {D}
주어진 선택지 중 해당 옵션의 문자로 바로 답하세요.
```

## Results

Below are the evaluation results of various vision-language models, including [VARCO-VISION-14B](https://huggingface.co/NCSOFT/VARCO-VISION-14B), on K-DTCBench.

| | VARCO-VISION-14B | Pangea-7B | Pixtral-12B | Molmo-7B-D | Qwen2-VL-7B-Instruct | LLaVA-One-Vision-7B |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| K-DTCBench | **84.58** | 48.33 | 27.50 | 45.83 | 75.00 | 52.91 |

## Citation

If you use K-DTCBench in your research, please cite the following:

```bibtex
@misc{ju2024varcovisionexpandingfrontierskorean,
  title={VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models},
  author={Jeongho Ju and Daeyoung Kim and SunYoung Park and Youngjune Kim},
  year={2024},
  eprint={2411.19103},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2411.19103},
}
```
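
The inference prompt and the accuracy scores reported above can be connected with a short Python sketch. This is an illustration under stated assumptions, not the scoring code used for the reported results. The line breaks in the template, the option layout passed to `build_prompt`, and the prefix-matching rule in `extract_choice` are all assumptions. The helper names are hypothetical and do not come from lmms-eval.

```python
import re

# Prompt template from the "Inference Prompt" section.
# Assumption: each part of the template sits on its own line.
PROMPT_TEMPLATE = (
    "<image>\n"
    "{question}\n"
    "Options: A: {A}, B: {B}, C: {C}, D: {D}\n"
    "주어진 선택지 중 해당 옵션의 문자로 바로 답하세요."
)


def build_prompt(question, options):
    """Fill the template. `options` maps "A".."D" to option text (hypothetical layout)."""
    return PROMPT_TEMPLATE.format(question=question, **options)


def extract_choice(response):
    """Return the option letter a response starts with, or None.

    Assumption: the model follows the instruction and answers with the
    letter first, optionally in parentheses, e.g. "B", "B: 6", "(B)".
    """
    match = re.match(r"\s*\(?([A-D])(?![A-Za-z])", response)
    return match.group(1) if match else None


def accuracy(responses, answers):
    """Percentage of responses whose extracted letter equals the gold letter."""
    if not answers or len(responses) != len(answers):
        raise ValueError("responses and answers must be non-empty and equal in length")
    correct = sum(extract_choice(r) == a for r, a in zip(responses, answers))
    return 100.0 * correct / len(answers)


if __name__ == "__main__":
    # The sample "table" question shown in the benchmark overview.
    prompt = build_prompt(
        "인프라 구축 항목의 점수는 몇 점인가요?",
        {"A": "4", "B": "6", "C": "8", "D": "10"},
    )
    print(prompt)
```

With 240 questions, each correct answer is worth about 0.42 percentage points. For example, 203 correct answers out of 240 would round to 84.58.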


Provider:

maas

Created:

2025-07-24
