Kratos-AI/Korean-Documents-Dataset
Source: Hugging Face. Updated 2025-10-13; indexed 2025-10-18.
Download link:
https://hf-mirror.com/datasets/Kratos-AI/Korean-Documents-Dataset
Description:
This dataset is a curated collection of scanned and digital Korean documents, including printed materials, forms, letters, notices, and handwritten notes. It has been cleaned and anonymized for use in AI training, OCR benchmarking, and natural language understanding research. It supports tasks such as optical character recognition, document layout and structure analysis, form parsing and entity extraction, language modeling for Korean official or administrative text, multimodal document understanding, and translation benchmarking between Korean and English. The primary language of the dataset is Korean, with limited English content present in bilingual official documents.
Provider:
Kratos-AI



