ThaiOCRBench
Source: ModelScope community. Updated 2025-12-05; indexed 2025-12-06.
Download link:
https://modelscope.cn/datasets/scb10x/ThaiOCRBench
# ThaiOCRBench: A Task-Diverse Benchmark for Vision-Language Understanding in Thai
**ThaiOCRBench** is the first comprehensive benchmark for evaluating vision-language models (VLMs) on Thai text-rich visual understanding tasks.
Inspired by OCRBench v2, it contains **2,808 human-annotated samples** across **13 diverse tasks**, including table parsing, chart understanding, full-page OCR, key information extraction, and visual question answering.
The benchmark enables **standardized zero-shot evaluation** for both proprietary and open-source models, revealing significant performance gaps and paving the way for document understanding in low-resource languages.
🚀 **Our paper _ThaiOCRBench_ has been accepted to the IJCNLP-AACL 2025 Main Conference!**
👉 **[📄 Read the Paper](https://arxiv.org/abs/2511.04479)**
👉 **[💻 GitHub Repository](https://github.com/scb-10x/ThaiOCRBench)**
## 📊 Dataset Statistics
| Task Type | Number of Samples |
|-------------------------------|-------------------|
| Text Recognition | 333 |
| Table Parsing | 193 |
| Full-page OCR | 197 |
| Chart Parsing | 200 |
| Key Information Extraction | 201 |
| Diagram VQA | 204 |
| Fine-grained Text Recognition | 206 |
| Handwritten Content Extraction | 209 |
| Key Information Mapping | 209 |
| Document Parsing | 211 |
| Infographics VQA | 213 |
| Document Classification | 215 |
| Cognition VQA | 217 |
| **Total** | **2,808** |
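As a quick sanity check on the table above, the following minimal sketch recomputes the total and each task's share of the benchmark. The counts are copied from the table; the names `TASK_COUNTS` and `task_shares` are illustrative and are not part of the ThaiOCRBench repository.

```python
# Per-task sample counts, copied from the Dataset Statistics table.
# TASK_COUNTS and task_shares are hypothetical names used only for this sketch.
TASK_COUNTS = {
    "Text Recognition": 333,
    "Table Parsing": 193,
    "Full-page OCR": 197,
    "Chart Parsing": 200,
    "Key Information Extraction": 201,
    "Diagram VQA": 204,
    "Fine-grained Text Recognition": 206,
    "Handwritten Content Extraction": 209,
    "Key Information Mapping": 209,
    "Document Parsing": 211,
    "Infographics VQA": 213,
    "Document Classification": 215,
    "Cognition VQA": 217,
}


def task_shares(counts):
    """Return each task's share of all samples, as a percentage."""
    total = sum(counts.values())
    return {task: 100.0 * n / total for task, n in counts.items()}


if __name__ == "__main__":
    total = sum(TASK_COUNTS.values())
    print(f"tasks: {len(TASK_COUNTS)}, samples: {total}")
    # List tasks from largest to smallest share.
    ranked = sorted(task_shares(TASK_COUNTS).items(), key=lambda kv: -kv[1])
    for task, pct in ranked:
        print(f"{task:<32}{pct:5.1f}%")
```

The counts sum to 2,808 across 13 tasks, matching the stated total. Text Recognition is the largest task at roughly 11.9% of the samples, while the remaining twelve each account for about 6.9% to 7.7%, so a sample-weighted average score would lean somewhat toward Text Recognition.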
## 🧠 Performance of VLMs on ThaiOCRBench
<p align="center">
<img src="https://raw.githubusercontent.com/scb-10x/ThaiOCRBench/main/pics/thaiocrbench_eval.png" width="70%" height="60%">
</p>
## 📘 Citation
If you use ThaiOCRBench in your research or applications, please cite our work:
```bibtex
@misc{nonesung2025thaiocrbenchtaskdiversebenchmarkvisionlanguage,
title={ThaiOCRBench: A Task-Diverse Benchmark for Vision-Language Understanding in Thai},
author={Surapon Nonesung and Teetouch Jaknamon and Sirinya Chaiophat and Natapong Nitarach and Chanakan Wittayasakpan and Warit Sirichotedumrong and Adisai Na-Thalang and Kunat Pipatanakul},
year={2025},
eprint={2511.04479},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2511.04479},
}
```
Provider: maas
Created: 2025-12-02



