[SAMPLE] Nexdata | OCR Data | 500,000 Images | Computer Vision Data | Invoice Data | AI Training Data

Source: Databricks (indexed 2024-05-31)
Download link:
https://marketplace.databricks.com/details/95d73eef-64ee-4dc6-930e-8336157d1fab/Nexdata_SAMPLE-Nexdata-OCR-Data-500,000 Images-Computer-Vision-Data-Invoice-Data-AI-Training-Data
Resource description:
1. Specifications
Data size: 500,000 images
Collecting environment: shop plaque, stop board, poster, ticket, road sign, comic, cover picture, prompt/reminder, warning, packing instruction, menu, building sign, etc.
Diversity: 20 languages, multiple natural scenes, multiple photographic angles (looking up, looking down, eye level)
Device: cellphone, camera
Image parameter: the image data format is .jpg, and the annotation file data format is .json
Annotation content: line-level quadrilateral bounding box annotation and transcription of the text
Accuracy: an annotation is qualified when the error of each vertex of the quadrilateral bounding box is within 5 pixels; the accuracy of bounding boxes is not less than 97%; the text transcription accuracy is not less than 97%

2. About Nexdata
Nexdata owns 200,000 hours of off-the-shelf speech recognition data, 800 TB of annotated imagery data, and about 2 billion pieces of natural language processing (NLP) data. This ready-to-go AI and ML training data supports instant delivery and can quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/ocr?source=Datarade
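
The bounding box accuracy criterion in the specifications can be illustrated with a minimal Python sketch. It is not Nexdata's evaluation procedure. The listing does not describe the .json schema or how vertex error is measured, so the sketch assumes a quadrilateral is four (x, y) points in a consistent order, that error is the Euclidean distance to a reference vertex, and that "within 5 pixels" is inclusive. The function names are hypothetical.

```python
from math import hypot

# Assumption: a quadrilateral is a list of four (x, y) vertices in a
# consistent order. The dataset's real .json layout is not documented
# in the listing, so this representation is illustrative only.


def vertex_errors(annotated, reference):
    """Return the per-vertex Euclidean distance between two quadrilaterals."""
    if len(annotated) != 4 or len(reference) != 4:
        raise ValueError("a quadrilateral needs exactly four vertices")
    return [
        hypot(ax - rx, ay - ry)
        for (ax, ay), (rx, ry) in zip(annotated, reference)
    ]


def is_qualified(annotated, reference, tolerance_px=5.0):
    """A box qualifies when every vertex is within the pixel tolerance."""
    return all(e <= tolerance_px for e in vertex_errors(annotated, reference))


def box_accuracy(pairs, tolerance_px=5.0):
    """Fraction of (annotated, reference) pairs that qualify."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no boxes to evaluate")
    qualified = sum(is_qualified(a, r, tolerance_px) for a, r in pairs)
    return qualified / len(pairs)


if __name__ == "__main__":
    reference = [(0, 0), (100, 0), (100, 20), (0, 20)]
    close = [(3, 4), (100, 0), (100, 20), (0, 20)]   # first vertex is 5 px off
    far = [(6, 0), (100, 0), (100, 20), (0, 20)]     # first vertex is 6 px off
    print(is_qualified(close, reference))
    print(is_qualified(far, reference))
    print(box_accuracy([(close, reference), (far, reference)]))
```

Under these assumptions, a sampled set of boxes would meet the stated bound when `box_accuracy` returns at least 0.97.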

Provider:
Nexdata