UHaT Dataset: Urdu Handwritten Text Dataset

Indexed from NIAID Data Ecosystem on 2026-03-11
Download link:
https://zenodo.org/record/3670610
Description:
UHaT: Urdu Handwritten Text Dataset

This dataset contains handwritten characters and digits of the Urdu language. The samples were written by more than 900 individuals.

Description and organization

Size of images: All images are stored at a resolution of 28 by 28 pixels.

How many images: The training set contains about 700 images per character on average. For example, there are 811 training images for AYN and 697 training images for ALIF. Similarly, the training set contains about 700 images per digit on average. For example, there are 678 training images for the digit one. The test set contains about 140 images per character on average. For example, there are 145 test images for the character ALIF. The test set likewise contains about 140 images per digit on average. For example, there are 147 test images for the digit nine.

The dataset is organized into four sub-directories: characters training set, characters test set, digits training set, and digits test set. Each sub-directory contains one sub-folder per character or digit. For example, all the training images for the character ALIF are placed in the sub-folder Alif. The folder hierarchy is:

Data > characters train set > alif
Data > characters train set > ayn
And so on…

How to load directly?

You can also load the dataset directly from the uhat_dataset.npz file. See the kernel load_dataset.

Acknowledgements

Thanks to all the volunteers who contributed handwriting samples.

Inspiration

This is an MNIST-style dataset. The machine learning community in general will find it useful for experimenting with and demonstrating machine learning models. The dataset also gives researchers an opportunity to work on Urdu text recognition.
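The folder layout and the uhat_dataset.npz file described above can be inspected before any training code is written. The following is a minimal sketch, not the dataset authors' loader: the function names count_images_per_class and describe_npz, and the example paths at the bottom, are hypothetical. The description does not state the array names stored inside the .npz file, so the sketch lists whatever names are present instead of assuming them. It also assumes that every file inside a class sub-folder is one image.

```python
from pathlib import Path


def count_images_per_class(split_dir):
    """Return {class_folder_name: file_count} for one split directory.

    Assumption: each sub-folder holds the images of exactly one class,
    and every regular file in it is an image.
    """
    split_dir = Path(split_dir)
    counts = {}
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(1 for f in class_dir.iterdir() if f.is_file())
    return counts


def describe_npz(npz_path):
    """Return {array_name: (shape, dtype)} for every array in an .npz archive."""
    import numpy as np  # imported lazily so the folder helper works without NumPy

    with np.load(npz_path) as archive:
        # archive.files lists the stored array names; none are assumed here.
        return {
            name: (archive[name].shape, str(archive[name].dtype))
            for name in archive.files
        }


if __name__ == "__main__":
    # Hypothetical locations; adjust to wherever the download was unpacked.
    split = Path("Data") / "characters train set"
    if split.is_dir():
        for label, n in count_images_per_class(split).items():
            print(f"{label}: {n} images")

    npz_file = Path("uhat_dataset.npz")
    if npz_file.is_file():
        for name, (shape, dtype) in describe_npz(npz_file).items():
            print(f"{name}: shape={shape}, dtype={dtype}")
```

Comparing the per-class counts against the figures quoted in the description (for example 811 training images for AYN) is a quick way to confirm that the download is complete.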
Created:
2020-02-18