Timokerr/OCR_baseline
Source: Hugging Face | Updated: 2026-04-22 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/Timokerr/OCR_baseline
Description:
This folder contains the canonical benchmark dataset used by the standalone runner, primarily to evaluate 17 LLMs. It includes documents from three domains: Invoices, Receipts, and Logistics. Each domain folder contains the source PDFs and the corresponding ground-truth labels in JSON format. The JSON files use a field-object schema (value, critical, type, and optional metadata) and allow nested fields.
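To make the field-object schema concrete, here is a minimal Python sketch that walks a ground-truth label and collects the fields flagged as critical. It assumes that any JSON object with a "value" key is a field object, and that nesting is expressed by plain objects or arrays that group such field objects. The helpers iter_fields and critical_values, and every field name in the example label, are hypothetical and are not taken from the dataset itself.

```python
import json
from typing import Any, Iterator


def iter_fields(node: Any, path: str = "") -> Iterator[tuple[str, dict]]:
    """Yield (dotted_path, field_object) for every field object in a label tree.

    Assumption: a field object is any dict with a "value" key; any other dict
    or list is treated as a container of nested fields.
    """
    if isinstance(node, dict) and "value" in node:
        # Leaf field object: its "metadata" (if any) is not descended into.
        yield path, node
    elif isinstance(node, dict):
        for key, child in node.items():
            yield from iter_fields(child, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for i, child in enumerate(node):
            yield from iter_fields(child, f"{path}[{i}]")
    # Scalars outside a field object carry no label and are skipped.


def critical_values(label: dict) -> dict[str, Any]:
    """Map dotted path -> value for every field whose "critical" flag is true."""
    return {
        path: field["value"]
        for path, field in iter_fields(label)
        if field.get("critical", False)  # assumption: missing flag means not critical
    }


# Hypothetical invoice label; field names are illustrative only.
example = json.loads("""
{
  "invoice_number": {"value": "INV-001", "critical": true, "type": "string"},
  "total": {"value": 118.0, "critical": true, "type": "number",
            "metadata": {"currency": "EUR"}},
  "vendor": {
    "name": {"value": "ACME GmbH", "critical": false, "type": "string"}
  },
  "line_items": [
    {"description": {"value": "Widget", "critical": false, "type": "string"},
     "amount": {"value": 100.0, "critical": true, "type": "number"}}
  ]
}
""")

print(critical_values(example))
# {'invoice_number': 'INV-001', 'total': 118.0, 'line_items[0].amount': 100.0}
```

Flattening to dotted paths gives each nested field a stable key, which makes it straightforward to line up a model's extracted values against the ground truth field by field. If the real labels nest sub-fields inside "value" instead, the first branch of iter_fields would need to recurse into that value.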
Provided by:
Timokerr



