naver-clova-ix/synthdog-ko
Source: Hugging Face. Updated 2024-01-31. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/naver-clova-ix/synthdog-ko
Resource description:
## Donut 🍩 : OCR-Free Document Understanding Transformer (ECCV 2022) -- SynthDoG datasets
For more information, please visit https://github.com/clovaai/donut

The links to the SynthDoG-generated datasets are here:
- [`synthdog-en`](https://huggingface.co/datasets/naver-clova-ix/synthdog-en): English, 0.5M.
- [`synthdog-zh`](https://huggingface.co/datasets/naver-clova-ix/synthdog-zh): Chinese, 0.5M.
- [`synthdog-ja`](https://huggingface.co/datasets/naver-clova-ix/synthdog-ja): Japanese, 0.5M.
- [`synthdog-ko`](https://huggingface.co/datasets/naver-clova-ix/synthdog-ko): Korean, 0.5M.
To generate synthetic datasets with our SynthDoG, please see `./synthdog/README.md` and [our paper](#how-to-cite) for details.
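
The four datasets listed above share one naming pattern on the Hugging Face Hub, so a small helper can resolve a language code to its repository ID and stream the data. This is a minimal sketch, not part of the Donut repository: the helper names (`synthdog_repo_id`, `synthdog_url`, `stream_synthdog`) are hypothetical, and the `train` split name is an assumption that should be checked against the dataset page. Only `datasets.load_dataset` with `streaming=True` is an existing library call.

```python
"""Sketch: resolve and stream a SynthDoG dataset from the Hugging Face Hub."""

HUB_ORG = "naver-clova-ix"

# Language codes taken from the dataset list above.
SYNTHDOG_LANGUAGES = {
    "en": "English",
    "zh": "Chinese",
    "ja": "Japanese",
    "ko": "Korean",
}


def synthdog_repo_id(lang: str) -> str:
    """Return the Hub repository ID for a SynthDoG language code."""
    if lang not in SYNTHDOG_LANGUAGES:
        known = ", ".join(sorted(SYNTHDOG_LANGUAGES))
        raise ValueError(f"unknown SynthDoG language {lang!r}; expected one of: {known}")
    return f"{HUB_ORG}/synthdog-{lang}"


def synthdog_url(lang: str) -> str:
    """Return the dataset page URL for a SynthDoG language code."""
    return f"https://huggingface.co/datasets/{synthdog_repo_id(lang)}"


def stream_synthdog(lang: str = "ko", split: str = "train"):
    """Stream one split without downloading the whole dataset first.

    Assumption: the split is named "train"; verify on the dataset page.
    Requires the `datasets` package (pip install datasets) and network access.
    """
    from datasets import load_dataset  # imported lazily so the helpers above work offline

    return load_dataset(synthdog_repo_id(lang), split=split, streaming=True)
```

Streaming is used here because each dataset holds about 0.5M samples; iterating over `stream_synthdog("ko")` fetches records on demand, and inspecting the keys of the first record is a quick way to confirm the actual column names.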
## How to Cite
If you find this work useful to you, please cite:
```bibtex
@inproceedings{kim2022donut,
title = {OCR-Free Document Understanding Transformer},
author = {Kim, Geewook and Hong, Teakgyu and Yim, Moonbin and Nam, JeongYeon and Park, Jinyoung and Yim, Jinyeong and Hwang, Wonseok and Yun, Sangdoo and Han, Dongyoon and Park, Seunghyun},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2022}
}
```
Provider:
naver-clova-ix
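Summary of original information
Dataset overview
Dataset name
- SynthDoG datasets
Dataset description
- The datasets were generated with SynthDoG for Donut, an OCR-free document understanding Transformer published at ECCV 2022.
Dataset links
- synthdog-en: English, 0.5M.
- synthdog-zh: Chinese, 0.5M.
- synthdog-ja: Japanese, 0.5M.
- synthdog-ko: Korean, 0.5M.
Dataset generation
- See `./synthdog/README.md` and the related paper for how to generate synthetic datasets with SynthDoG.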