upstage/dp-bench
Source: Hugging Face | Updated: 2026-04-22 | Indexed: 2025-04-08
Download link:
https://hf-mirror.com/datasets/upstage/dp-bench
Description:
DP-Bench is a benchmark dataset for evaluating the performance of document parsers. It contains document samples of various types collected from libraries, open educational resources, and internal documents. Document content is categorized into 12 layout element types and stored in JSON format. Three metrics are provided for evaluating parser performance: NID (Normalized Indel Distance), TEDS (Tree Edit Distance-based Similarity), and TEDS-S (Tree Edit Distance-based Similarity-Struct).
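To make the NID metric concrete, here is a minimal sketch in Python. It assumes the common definition NID = 1 - indel_distance / (len(reference) + len(prediction)), where the indel distance counts only insertions and deletions (no substitutions) and can be computed from the longest common subsequence. The function names `lcs_length`, `indel_distance`, and `nid` are illustrative and are not taken from the benchmark's own evaluation scripts, which may differ in text normalization and other details.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions and deletions turning a into b."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def nid(reference: str, prediction: str) -> float:
    """Normalized indel similarity in [0, 1]; 1.0 means the strings are identical.

    Assumption: two empty strings are treated as a perfect match.
    """
    total = len(reference) + len(prediction)
    if total == 0:
        return 1.0
    return 1.0 - indel_distance(reference, prediction) / total


if __name__ == "__main__":
    print(nid("abcd", "abed"))
```

Under this definition a substitution costs one deletion plus one insertion, so a single wrong character is penalized twice as heavily as a single missing character.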
Provider:
upstage



