
The Uber Text dataset

Source: OpenDataLab. Updated: 2026-05-24. Indexed: 2024-05-09.
Download link:
https://opendatalab.org.cn/OpenDataLab/The_Uber_Text_dataset
Description:
In recent years, optical character recognition (OCR) methods have advanced considerably, driven by the resurgence of deep learning. State-of-the-art models are trained mainly on datasets of constrained scenes that involve heavy processing by human annotators, and detecting and recognizing text in real-world images remains a technical challenge. In this paper, we introduce Uber-Text, a large-scale OCR dataset of street-level images collected from vehicle-mounted sensors, with ground truth annotated by a team of image analysts. The dataset features (1) street-side images with text-region polygons and their corresponding transcriptions; (2) nine categories indicating business name text, street name text, street number text, and so on; (3) a collection of over 110,000 images; and (4) an average of 4.84 text instances per image. We evaluate two recently proposed object detection methods to show how challenging the task and dataset are, which demonstrates the dataset's value and motivates future work in this area. In addition, we propose an end-to-end text sequence recognition method that requires neither a lexicon nor a character-level pre-training stage.
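The annotation structure described above (a polygon, a transcription, and a category per text instance) can be pictured with a small in-memory model. This is only a sketch: the description does not state the dataset's file format, so the class names, field names, and category strings below are hypothetical and chosen for illustration. It also shows how a per-image statistic such as the reported average number of text instances would be computed.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class TextInstance:
    """One annotated text region (hypothetical field names)."""
    polygon: List[Point]   # polygon vertices in pixel coordinates
    transcription: str     # text content of the region
    category: str          # e.g. "business_name" (label strings are assumed)


@dataclass
class AnnotatedImage:
    """One street-level image with its text annotations."""
    image_id: str
    instances: List[TextInstance]


def mean_instances_per_image(images: List[AnnotatedImage]) -> float:
    """Average number of text instances per image; 0.0 for an empty list."""
    if not images:
        return 0.0
    return sum(len(img.instances) for img in images) / len(images)


def polygon_area(polygon: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    twice_area = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Toy data, invented for illustration only (not taken from the dataset).
sign = TextInstance(
    polygon=[(0.0, 0.0), (10.0, 0.0), (10.0, 4.0), (0.0, 4.0)],
    transcription="MAIN ST",
    category="street_name",
)
number = TextInstance(
    polygon=[(20.0, 5.0), (26.0, 5.0), (26.0, 8.0), (20.0, 8.0)],
    transcription="221",
    category="street_number",
)
images = [
    AnnotatedImage(image_id="img_0001", instances=[sign, number]),
    AnnotatedImage(image_id="img_0002", instances=[sign]),
]
```

On the toy data, `mean_instances_per_image(images)` averages two and one instances over two images. The same function applied to the full collection is the kind of computation behind the 4.84 figure quoted in the description, assuming annotations are loaded into this shape.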
Provider:
OpenDataLab
Created:
2023-10-20