sdcfsdsfsdds/synthdog-en

Name: sdcfsdsfsdds/synthdog-en
Creator: sdcfsdsfsdds
Published: 2025-12-06 07:20:26
License: No description available

Source: Hugging Face. Updated: 2025-12-06. Indexed: 2025-12-20.

Download link:

https://hf-mirror.com/datasets/sdcfsdsfsdds/synthdog-en


Resource description:

## Donut 🍩 : OCR-Free Document Understanding Transformer (ECCV 2022) -- SynthDoG datasets

For more information, please visit https://github.com/clovaai/donut

![image](https://github.com/clovaai/donut/blob/master/misc/sample_synthdog.png?raw=true)

The links to the SynthDoG-generated datasets are here:

- [`synthdog-en`](https://huggingface.co/datasets/naver-clova-ix/synthdog-en): English, 0.5M.
- [`synthdog-zh`](https://huggingface.co/datasets/naver-clova-ix/synthdog-zh): Chinese, 0.5M.
- [`synthdog-ja`](https://huggingface.co/datasets/naver-clova-ix/synthdog-ja): Japanese, 0.5M.
- [`synthdog-ko`](https://huggingface.co/datasets/naver-clova-ix/synthdog-ko): Korean, 0.5M.

To generate synthetic datasets with our SynthDoG, please see `./synthdog/README.md` and [our paper](#how-to-cite) for details.

## How to Cite

If you find this work useful to you, please cite:

```bibtex
@inproceedings{kim2022donut,
  title     = {OCR-Free Document Understanding Transformer},
  author    = {Kim, Geewook and Hong, Teakgyu and Yim, Moonbin and Nam, JeongYeon and Park, Jinyoung and Yim, Jinyeong and Hwang, Wonseok and Yun, Sangdoo and Han, Dongyoon and Park, Seunghyun},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2022}
}
```
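The dataset list above maps each language code to a Hugging Face repository, so a small helper can build the repository ID and hand it to a loader. The following is a minimal sketch, not part of the original card. The names `synthdog_repo_id` and `load_synthdog` are hypothetical, and the sketch assumes that the `datasets` package is installed and that each repository exposes a `train` split.

```python
# Language codes taken from the dataset list in the card.
SYNTHDOG_LANGS = ("en", "zh", "ja", "ko")


def synthdog_repo_id(lang: str = "en", namespace: str = "naver-clova-ix") -> str:
    """Build a Hugging Face repository ID such as 'naver-clova-ix/synthdog-en'.

    `namespace` is the account hosting the copy; the card links to
    'naver-clova-ix', while this listing page describes a copy under
    'sdcfsdsfsdds' (English only).
    """
    if lang not in SYNTHDOG_LANGS:
        raise ValueError(f"unknown SynthDoG language {lang!r}; expected one of {SYNTHDOG_LANGS}")
    return f"{namespace}/synthdog-{lang}"


def load_synthdog(lang: str = "en", split: str = "train", streaming: bool = True):
    """Open one SynthDoG dataset with the `datasets` library.

    Assumptions (not stated in the card): `datasets` is installed and the
    repository has a split named `split`. Streaming avoids downloading all
    0.5M samples before the first one can be read.
    """
    from datasets import load_dataset  # imported lazily so the helper above works without it

    return load_dataset(synthdog_repo_id(lang), split=split, streaming=streaming)
```

Calling `load_synthdog("en")` needs network access, and with `streaming=True` it returns an iterable dataset rather than a fully downloaded one. The ID helper works offline.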


Provider:

sdcfsdsfsdds
