Yesianrohn/UnionST
Source: Hugging Face · Updated 2026-03-06 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/Yesianrohn/UnionST
Resource description:
---
license: mit
task_categories:
- image-to-text
language:
- en
tags:
- OCR
- STR
- scene text
- synthetic data
size_categories:
- 10M<n<100M
---
# UnionST: A Strong Synthetic Engine for Scene Text Recognition
<a href='https://arxiv.org/abs/2602.06450'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>
<a href='https://huggingface.co/Yesianrohn/UnionST-Models'><img src='https://img.shields.io/badge/Ckpt-Huggingface-yellow'></a>
Official dataset release for the paper *"What’s Wrong with Synthetic Data for Scene Text Recognition? A Strong Synthetic Engine with Diverse Simulations and Self-Evolution"*.
## Introduction
Scene Text Recognition (STR) relies critically on large-scale, high-quality training data. Synthetic data is a cost-effective alternative to manually annotated real data, but existing rendering-based synthetic datasets suffer from **insufficient diversity** (in corpus, font, and layout) and a large domain gap with real-world text.
### Key Advantages
- 🎯 **100% Label Correctness**: The rendering-based paradigm guarantees accurate labels, unlike generative models, whose outputs look realistic but are error-prone.
- ⚡ **Cost-Efficiency**: CPU-based generation costs only 1/20 as much as diffusion-based methods and 1/10,000 as much as closed-source alternatives.
- 🚀 **Strong Performance**: UnionST-S (5M samples) outperforms traditional synthetic datasets of 36M samples on challenging STR benchmarks.
## Dataset
This repository hosts the UnionST-S, UnionST-P, and UnionST-R datasets, stored in the LMDB format used by the mainstream STR training protocol. We have also collected the other synthetic STR datasets compared in the paper; they are available [here](https://huggingface.co/datasets/Yesianrohn/STR-Synth).
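As a hedged sketch of how the data can be fetched and addressed, the shell helpers below wrap the Hugging Face CLI and print LMDB key names. Both function names (`download_unionst`, `str_lmdb_keys`) are hypothetical. The key scheme (`num-samples`, `image-%09d`, `label-%09d`, 1-based indices) is the one written by `create_lmdb_dataset.py` in clovaai/deep-text-recognition-benchmark; this sketch assumes, without having inspected the UnionST files, that they follow the same convention.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with UnionST) for fetching the data and
# addressing samples inside its LMDB files.
set -euo pipefail

# Download the whole dataset repository with the Hugging Face CLI
# (pip install -U huggingface_hub). Export HF_ENDPOINT=https://hf-mirror.com
# beforehand to use the mirror instead of huggingface.co.
# Note: the card lists 10M to 100M samples, so the download is large.
download_unionst() {
  local dest="${1:-./UnionST}"
  huggingface-cli download Yesianrohn/UnionST --repo-type dataset --local-dir "$dest"
}

# Print the key names of sample i under the common STR LMDB convention
# (an assumption for UnionST):
#   num-samples  -> total number of samples, as an ASCII integer
#   image-%09d   -> raw encoded image bytes of sample i (1-based)
#   label-%09d   -> UTF-8 text label of sample i (1-based)
str_lmdb_keys() {
  local i="$1"
  # Indices are zero-padded to 9 digits.
  printf 'image-%09d label-%09d\n' "$i" "$i"
}

# Example: keys of the first three samples.
for i in 1 2 3; do
  str_lmdb_keys "$i"
done
```

Running the script only prints keys (the first line is `image-000000001 label-000000001`); the download runs only when `download_unionst` is called explicitly, for example `download_unionst ./UnionST`.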
## Training Model
The configuration and implementation of the SVTRv2-AR model are available in [OpenOCR](https://github.com/Topdu/OpenOCR/blob/main/configs/rec/nrtr/svtrv2_nrtr.yml).
```bash
# Assumes OpenOCR is cloned into ./OpenOCR; trains SVTRv2-AR on UnionST with 8 GPUs.
cd OpenOCR
torchrun --nproc_per_node=8 tools/train_rec.py --c configs/rec/nrtr/svtrv2_nrtr_unionst.yml
```
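The command above hard-codes eight GPUs. The script below is a hedged sketch, not part of UnionST or OpenOCR, that sets `--nproc_per_node` from the GPU count reported by `nvidia-smi -L` (falling back to 1) and, by default, only prints the resulting command. The variable names `NGPU`, `CONFIG`, and `RUN` are illustrative; the config path is the one used above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper, not part of UnionST or OpenOCR: builds the README's
# torchrun command with a configurable GPU count. Run it from the OpenOCR
# repository root.
set -euo pipefail

# Print the number of GPUs listed by `nvidia-smi -L` (one line per GPU),
# or 1 if nvidia-smi is missing, fails, or lists nothing.
count_gpus() {
  local n
  if n=$(nvidia-smi -L 2>/dev/null | wc -l) && (( n > 0 )); then
    echo $(( n ))
  else
    echo 1
  fi
}

# NGPU and CONFIG can be overridden from the environment.
NGPU="${NGPU:-$(count_gpus)}"
CONFIG="${CONFIG:-configs/rec/nrtr/svtrv2_nrtr_unionst.yml}"
train_cmd=(torchrun "--nproc_per_node=${NGPU}" tools/train_rec.py --c "$CONFIG")

if [[ "${RUN:-0}" == "1" ]]; then
  # Launch training.
  "${train_cmd[@]}"
else
  # Default: only print the command so it can be checked first.
  echo "${train_cmd[*]}"
fi
```

Saved as, for example, `train_unionst.sh`, `NGPU=4 bash train_unionst.sh` prints the four-GPU command and `RUN=1 NGPU=4 bash train_unionst.sh` launches it. With one process per GPU, fewer GPUs usually means a smaller global batch, so the batch size or learning rate in the config may need retuning; this sketch does not check OpenOCR's defaults.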
Some of our trained models are available on [Hugging Face](https://huggingface.co/Yesianrohn/UnionST-Models).
## Citation
```bibtex
@inproceedings{ye2026wrong,
title={What's Wrong with Synthetic Data for Scene Text Recognition? A Strong Synthetic Engine with Diverse Simulations and Self-Evolution},
author={Ye, Xingsong and Du, Yongkun and Zhang, JiaXin and Li, Chen and LYU, Jing and Chen, Zhineng},
booktitle={CVPR},
year={2026}
}
```
## License
```text
"""
UnionST
Copyright (c) 2025-present YesianRohn
Based on SynthTIGER
Copyright (c) 2021-present NAVER Corp.
MIT License
"""
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
```
## Acknowledgements
- We thank the authors of [SynthText](https://github.com/ankush-me/SynthText), [SynthTIGER](https://github.com/clovaai/synthtiger), [SVTRv2](https://github.com/Topdu/OpenOCR/blob/main/docs/svtrv2.md), and [Union14M](https://github.com/Mountchicken/Union14M) for open-sourcing their code and datasets.
- Special thanks also go to the [OpenOCR](https://github.com/Topdu/OpenOCR) training framework.
Provided by: Yesianrohn