ajdajd/data-snapshot
Source: Hugging Face. Updated 2026-04-24; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/ajdajd/data-snapshot
Description:
The `data-snapshot` dataset is an annotated corpus for developing and evaluating models that extract *data snapshots* from PDF documents. A **data snapshot** is defined as a figure or table that contains quantitative data derived from statistics, indicators, or structured data sources. The dataset includes annotation files, raw PDFs, document-level metadata, and schema files. The annotation files follow the Data Snapshot Evaluation Format (v1.3) and record, for each data snapshot, its object class (Figure / Table) and its bounding box location (in normalized `[x1, y1, x2, y2]` format, top-left origin). The dataset was created through human labeling using Label Studio.
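
To make the bounding box convention concrete, here is a minimal sketch of converting a normalized `[x1, y1, x2, y2]` box (top-left origin) into absolute coordinates on a rendered page. The function name `denormalize_bbox` and the parameters `page_width` and `page_height` are hypothetical and are not part of the dataset or its schema; the sketch also assumes that normalization means each coordinate is divided by the page width or height, which the description implies but does not spell out.

```python
from typing import Sequence, Tuple


def denormalize_bbox(
    bbox: Sequence[float],
    page_width: float,
    page_height: float,
) -> Tuple[float, float, float, float]:
    """Scale a normalized [x1, y1, x2, y2] box to absolute page units.

    Assumes a top-left origin: x grows to the right, y grows downward,
    and (x1, y1) is the top-left corner of the box.
    """
    x1, y1, x2, y2 = bbox
    # Reject boxes that fall outside the unit square or have swapped corners.
    if not (0.0 <= x1 <= x2 <= 1.0 and 0.0 <= y1 <= y2 <= 1.0):
        raise ValueError(f"not a valid normalized box: {list(bbox)}")
    return (
        x1 * page_width,
        y1 * page_height,
        x2 * page_width,
        y2 * page_height,
    )


if __name__ == "__main__":
    # Hypothetical box on a page rendered at 800 x 1000 pixels.
    print(denormalize_bbox([0.25, 0.5, 0.75, 1.0], 800, 1000))
```

Note that a PDF's default user space places the origin at the bottom-left, so boxes in this top-left convention would likely need a vertical flip before being used with raw PDF coordinates rather than a rendered page image.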
Provider:
ajdajd



