naver-clova-ix/synthdog-ko

Name: naver-clova-ix/synthdog-ko
Creator: naver-clova-ix
Published: 2024-01-31 23:55:41
License: No description available

Hugging Face, updated 2024-01-31, indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/naver-clova-ix/synthdog-ko
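
As a minimal sketch of how this dataset might be fetched programmatically, the snippet below uses the Hugging Face `datasets` library, assuming it is installed and that network access to the Hub is available. The helper names `synthdog_repo_id` and `stream_samples` are hypothetical, introduced only for illustration, and the `train` split name is an assumption rather than something this page states. The `synthdog-<lang>` naming follows the sibling repositories listed in the resource description.

```python
from typing import Iterator

# Language codes taken from the SynthDoG dataset list on this page.
SYNTHDOG_LANGS = ("en", "zh", "ja", "ko")


def synthdog_repo_id(lang: str = "ko") -> str:
    """Return the Hub repository ID for one SynthDoG language subset."""
    if lang not in SYNTHDOG_LANGS:
        raise ValueError(f"unknown SynthDoG language: {lang!r}")
    return f"naver-clova-ix/synthdog-{lang}"


def stream_samples(lang: str = "ko", split: str = "train", limit: int = 3) -> Iterator[dict]:
    """Yield up to `limit` samples in streaming mode (hypothetical helper).

    The split name "train" is an assumption; check the dataset page.
    """
    # Imported lazily so synthdog_repo_id works without `datasets` installed.
    from datasets import load_dataset

    # streaming=True avoids downloading the full dataset up front.
    dataset = load_dataset(synthdog_repo_id(lang), split=split, streaming=True)
    for index, sample in enumerate(dataset):
        if index >= limit:
            break
        yield sample
```

Iterating over `stream_samples("ko", limit=1)` contacts the Hub, so it only works online. Streaming is used here because each subset is listed at 0.5M samples, and a quick inspection rarely needs the full download.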

Resource description:

## Donut 🍩 : OCR-Free Document Understanding Transformer (ECCV 2022) -- SynthDoG datasets

For more information, please visit https://github.com/clovaai/donut

![image](https://github.com/clovaai/donut/blob/master/misc/sample_synthdog.png?raw=true)

The links to the SynthDoG-generated datasets are here:

- [`synthdog-en`](https://huggingface.co/datasets/naver-clova-ix/synthdog-en): English, 0.5M.
- [`synthdog-zh`](https://huggingface.co/datasets/naver-clova-ix/synthdog-zh): Chinese, 0.5M.
- [`synthdog-ja`](https://huggingface.co/datasets/naver-clova-ix/synthdog-ja): Japanese, 0.5M.
- [`synthdog-ko`](https://huggingface.co/datasets/naver-clova-ix/synthdog-ko): Korean, 0.5M.

To generate synthetic datasets with our SynthDoG, please see `./synthdog/README.md` and [our paper](#how-to-cite) for details.

## How to Cite

If you find this work useful to you, please cite:

```bibtex
@inproceedings{kim2022donut,
  title     = {OCR-Free Document Understanding Transformer},
  author    = {Kim, Geewook and Hong, Teakgyu and Yim, Moonbin and Nam, JeongYeon and Park, Jinyoung and Yim, Jinyeong and Hwang, Wonseok and Yun, Sangdoo and Han, Dongyoon and Park, Seunghyun},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2022}
}
```

Provider:

naver-clova-ix

Summary of original information