Voxel51/consolidated_receipt_dataset
Source: Hugging Face · Updated: 2025-10-21 · Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/Voxel51/consolidated_receipt_dataset
Description:
CORD (Consolidated Receipt Dataset) is a large-scale dataset of Indonesian receipts designed for post-OCR parsing tasks, specifically receipt understanding. It contains over 11,000 receipt images collected from shops and restaurants, each with OCR annotations (bounding boxes and text) and multi-level semantic labels for parsing. Because it provides both visual and semantic annotations, the dataset bridges OCR and NLP tasks and is suitable for end-to-end document intelligence systems. Each receipt is annotated with 30 semantic categories grouped into 5 superclasses (menu, void menu, subtotal, void total, and total), along with metadata covering line grouping, regions of interest, and key-value pair tagging. This FiftyOne implementation provides an accessible interface for exploring the training split of 800 annotated receipt images.
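Since the 30 categories roll up into 5 superclasses, a common first step when working with the labels is to group them by superclass. The sketch below assumes labels are written as dotted `superclass.subclass` strings (e.g. `menu.nm`); that naming, the exact superclass key spellings, and the sample labels are assumptions for illustration, not confirmed by this page.

```python
from collections import defaultdict

# The five superclasses described above. The key spellings used here
# ("void_menu", "sub_total", ...) are ASSUMED, not confirmed by the source.
SUPERCLASSES = ("menu", "void_menu", "sub_total", "void_total", "total")


def group_by_superclass(labels):
    """Group dotted 'superclass.subclass' labels by their superclass prefix.

    Labels whose prefix is not one of SUPERCLASSES are collected under 'other'.
    """
    groups = defaultdict(list)
    for label in labels:
        # partition() returns the whole string as the prefix if there is no dot
        prefix, _, _ = label.partition(".")
        key = prefix if prefix in SUPERCLASSES else "other"
        groups[key].append(label)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical labels for a single receipt (illustrative only).
    sample = [
        "menu.nm",
        "menu.price",
        "sub_total.subtotal_price",
        "total.total_price",
        "store.name",
    ]
    print(group_by_superclass(sample))
```

Keeping unknown prefixes under `other` rather than dropping them makes it easy to spot labels that do not match the assumed naming scheme.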
Provided by:
Voxel51



