
wkrl/cord

Source: Hugging Face | Updated: 2022-07-09 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/wkrl/cord
Resource description:
---
annotations_creators:
- crowdsourced
language_creators:
- crowdsourced
language:
- en
multilinguality:
- monolingual
license:
- cc-by-4.0
pretty_name: CORD
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- token-classification
task_ids:
- parsing
---

# Dataset Card for CORD (Consolidated Receipt Dataset)

## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Additional Information](#additional-information)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Repository: https://github.com/clovaai/cord**
- **Paper: https://openreview.net/pdf?id=SJl3z659UH**
- **Leaderboard: https://paperswithcode.com/dataset/cord**

### Dataset Summary

[More Information Needed]

### Supported Tasks and Leaderboards

[More Information Needed]

## Dataset Structure

### Data Instances

[More Information Needed]

### Data Fields

```python
{
    "id": datasets.Value("string"),
    "words": datasets.Sequence(datasets.Value("string")),
    "bboxes": datasets.Sequence(datasets.Sequence(datasets.Value("int64"))),
    "labels": datasets.Sequence(datasets.features.ClassLabel(names=_LABELS)),
    "images": datasets.features.Image(),
}
```

### Data Splits

- train (800 rows)
- validation (100 rows)
- test (100 rows)

## Dataset Creation

### Licensing Information

[Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/)

### Citation Information

```
@article{park2019cord,
  title={CORD: A Consolidated Receipt Dataset for Post-OCR Parsing},
  author={Park, Seunghyun and Shin, Seung and Lee, Bado and Lee, Junyeop and Surh, Jaeheung and Seo, Minjoon and Lee, Hwalsuk},
  booktitle={Document Intelligence Workshop at Neural Information Processing Systems},
  year={2019}
}
```

### Contributions

Thanks to [@clovaai](https://github.com/clovaai) for adding this dataset.
Provider:
wkrl
Summary of original information

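Dataset overview

Basic information

  • Dataset name: CORD (Consolidated Receipt Dataset)
  • Language: English (en)
  • Multilinguality: monolingual
  • License: CC-BY-4.0
  • Dataset size: 1K<n<10K
  • Source data: original
  • Task category: token classification
  • Task ID: parsing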

Dataset structure

Data splits

  • Training set: 800 rows
  • Validation set: 100 rows
  • Test set: 100 rows
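
The split sizes listed above add up to 1,000 rows in an 80/10/10 partition. A minimal sketch for checking a downloaded copy against those counts follows. `EXPECTED_SPLITS`, `split_fractions`, and `check_splits` are hypothetical names introduced for illustration, and the loader call is left as a comment because it assumes network access and a version of the Hugging Face `datasets` library that can still load this repository.

```python
# Split sizes as documented in the dataset card.
EXPECTED_SPLITS = {"train": 800, "validation": 100, "test": 100}


def split_fractions(sizes):
    """Return each split's share of the total row count."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def check_splits(actual_sizes, expected=EXPECTED_SPLITS):
    """Return the names of splits whose row count differs from the card."""
    return sorted(
        name for name, count in expected.items()
        if actual_sizes.get(name) != count
    )


fractions = split_fractions(EXPECTED_SPLITS)
print(fractions)

# Assumption: with the `datasets` library installed, the actual sizes
# could be gathered roughly like this:
#   from datasets import load_dataset
#   ds = load_dataset("wkrl/cord")
#   actual = {name: split.num_rows for name, split in ds.items()}
#   print(check_splits(actual))
```

An empty list from `check_splits` would mean the loaded copy matches the documented row counts.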

Data fields

  • id: string
  • words: sequence of strings
  • bboxes: sequence of integer (int64) sequences
  • labels: sequence of class labels
  • images: image
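
The field list above can be mirrored in a small record type with a consistency check. This is only a sketch under stated assumptions: the card does not say that `words`, `bboxes`, and `labels` are aligned per token, nor that each box holds four coordinates, so both are assumptions here. `CordRecord` and `validate` are hypothetical names, the `images` field is omitted to keep the example dependency-free, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CordRecord:
    """One row, mirroring the field names in the dataset card (minus images)."""
    id: str
    words: List[str]
    bboxes: List[List[int]]
    labels: List[int]  # ClassLabel values are stored as integer ids


def validate(record: CordRecord) -> List[str]:
    """Return a list of problems; an empty list means the record looks consistent."""
    problems = []
    n = len(record.words)
    # Assumption: one bounding box and one label per word.
    if len(record.bboxes) != n:
        problems.append("bboxes length differs from words length")
    if len(record.labels) != n:
        problems.append("labels length differs from words length")
    # Assumption: four integer coordinates per box.
    for i, box in enumerate(record.bboxes):
        if len(box) != 4 or not all(isinstance(v, int) for v in box):
            problems.append(f"bbox {i} is not four integers")
    return problems


# Made-up values for illustration only; not taken from the dataset.
sample = CordRecord(
    id="0",
    words=["TOTAL", "12.00"],
    bboxes=[[10, 20, 60, 35], [70, 20, 110, 35]],
    labels=[0, 1],
)
print(validate(sample))
```

If the real data turns out to use a different box layout, only the four-coordinate check would need to change.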

Dataset creation

Licensing information

  • License: Creative Commons Attribution 4.0 International License

Citation information

@article{park2019cord, title={CORD: A Consolidated Receipt Dataset for Post-OCR Parsing}, author={Park, Seunghyun and Shin, Seung and Lee, Bado and Lee, Junyeop and Surh, Jaeheung and Seo, Minjoon and Lee, Hwalsuk}, booktitle={Document Intelligence Workshop at Neural Information Processing Systems}, year={2019} }
