ethanbradley/synfintabs
Source: Hugging Face. Updated: 2024-12-06. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/ethanbradley/synfintabs
Description:
SynFinTabs is a dataset of synthetic financial tables for information extraction and table extraction tasks. Its features include images, IDs, themes, bounding boxes, rows, cells, text, labels, splits, OCR results, question IDs, questions, answers, answer keys, answer spans, and representations. The dataset is divided into training, validation, and test sets of 79,998, 10,002, and 10,000 samples respectively. Table annotations are stored as lists of rows, cells, and words, each with its 2D position within the table image. Each image is the size of an A4 page; the table itself can be obtained by cropping the image to the table's bounding box. When using this dataset, please cite both the associated article and the dataset itself.
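The cropping step mentioned above can be sketched as follows. This is a minimal illustration and not the dataset authors' code: the helper name `crop_box`, its `margin` parameter, and the assumption that a bounding box is given as `[x0, y0, x1, y1]` in pixels with the origin at the top-left corner are all hypothetical, so check the dataset card for the actual layout before relying on it.

```python
import math
from typing import Sequence, Tuple


def crop_box(
    bbox: Sequence[float],
    page_size: Tuple[int, int],
    margin: int = 0,
) -> Tuple[int, int, int, int]:
    """Turn a table bounding box into an integer crop box clamped to the page.

    ASSUMPTION: bbox is [x0, y0, x1, y1] in pixels, origin at top-left.
    page_size is (width, height) in pixels.
    """
    x0, y0, x1, y1 = bbox
    width, height = page_size

    # Round outward so the crop never cuts into the table, then pad by margin.
    left = math.floor(x0) - margin
    upper = math.floor(y0) - margin
    right = math.ceil(x1) + margin
    lower = math.ceil(y1) + margin

    # Clamp to the page so the box stays inside the image.
    left, upper = max(0, left), max(0, upper)
    right, lower = min(width, right), min(height, lower)

    if right <= left or lower <= upper:
        raise ValueError(f"empty crop box for bbox={list(bbox)!r}")
    return (left, upper, right, lower)
```

If the page is available as a Pillow image, the returned tuple has the `(left, upper, right, lower)` shape that `Image.crop` accepts, so something like `image.crop(crop_box(bbox, image.size))` would yield the table region under the assumptions above.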
Provided by:
ethanbradley



