
MTWI: Multi-Type Web Images Dataset for Text Detection and Recognition

Alibaba Cloud Tianchi · updated 2026-06-09 · indexed 2024-03-07
Download link:
https://tianchi.aliyun.com/dataset/137084
Resource overview:

The Multi-Type Web Images (MTWI) dataset is the industry's first Chinese-dominant OCR dataset built from web images. It was jointly released by Alibaba Group's "Image & Beauty" team and South China University of Technology, and an academic evaluation competition based on it was held in conjunction with ICPR 2018. MTWI provides a large volume of data covering dozens of fonts, font sizes from a few pixels to several hundred pixels, a variety of layouts, and many cluttered backgrounds. Researchers can use it for AI research on tasks such as image content moderation, image search, and information entry.
Provider:
Alibaba Cloud Tianchi
Created:
2022-09-08
Dataset Introduction
Background and Challenges
Background Overview
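MTWI is a Chinese-dominant OCR dataset of 20,000 web images spanning many fonts and layouts, intended for text detection and recognition research. It defines three tasks to support in-depth OCR research in both academia and industry.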
The summary above was compiled and generated by the 遇见数据集 aggregator.