Arabic Handwritten Characters Dataset

Source: DataCite Commons
Updated: 2025-06-01
Indexed: 2024-07-28
Download link:
https://figshare.com/articles/Arabic_Handwritten_Characters_Dataset/12236960/1
Description:
Abstract
Handwritten Arabic character recognition systems face several challenges, including the unlimited variation in human handwriting and the need for large public databases. In this work, we model a deep learning architecture that can be applied effectively to recognizing Arabic handwritten characters. A Convolutional Neural Network (CNN) is a special type of feed-forward multilayer network trained in supervised mode. The CNN was trained and tested on our database, which contains 16,800 handwritten Arabic characters. In this paper, optimization methods are implemented to increase the performance of the CNN. Common machine learning methods usually combine a feature extractor with a trainable classifier. The use of a CNN leads to significant improvements over other machine-learning classification algorithms. Our proposed CNN achieves an average misclassification error of 5.1% on the test data.

Context
The motivation of this study is to use cross knowledge learned from multiple works to enhance the performance of Arabic handwritten character recognition. In recent years, Arabic handwritten character recognition has had to cope with many different handwriting styles, making it important to find and work on new and advanced solutions for handwriting recognition. A deep learning system needs a huge amount of data (images) to be able to make good decisions.

Content
The dataset is composed of 16,800 characters written by 60 participants aged 19 to 40 years; 90% of the participants are right-handed. Each participant wrote each character (from 'alef' to 'yeh') ten times on two forms, as shown in Fig. 7(a) & 7(b). The forms were scanned at a resolution of 300 dpi. Each block was segmented automatically using Matlab 2016a, which determined the coordinates of each block. The database is partitioned into two sets: a training set (13,440 characters, 480 images per class) and a test set (3,360 characters, 120 images per class).
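The counts in the Content section are mutually consistent, which a few lines of arithmetic can confirm. This sketch assumes 28 character classes (the letters from 'alef' to 'yeh'; 16,800 / (60 × 10) = 28) and infers how many writers sit on each side of the split from the per-class counts. That inference holds only because writers are exclusive to one set, so all ten samples of a class from one writer land on the same side.

```python
import math

# Figures stated in the dataset description.
WRITERS = 60
REPETITIONS = 10            # each participant wrote each character ten times
TOTAL = 16_800
TRAIN_TOTAL, TEST_TOTAL = 13_440, 3_360
TRAIN_PER_CLASS, TEST_PER_CLASS = 480, 120

# Assumption: one class per letter, 'alef' .. 'yeh'.
num_classes = TOTAL // (WRITERS * REPETITIONS)
assert num_classes * WRITERS * REPETITIONS == TOTAL

# Writer-exclusive split: per-class counts divided by repetitions give writer counts.
train_writers = TRAIN_PER_CLASS // REPETITIONS
test_writers = TEST_PER_CLASS // REPETITIONS
assert train_writers + test_writers == WRITERS
assert TRAIN_PER_CLASS * num_classes == TRAIN_TOTAL
assert TEST_PER_CLASS * num_classes == TEST_TOTAL

# A 5.1% misclassification error corresponds to the reported 94.9% accuracy.
assert math.isclose(100 - 5.1, 94.9)

print(num_classes, train_writers, test_writers)  # prints: 28 48 12
```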
Writers of the training set and the test set are mutually exclusive. The order in which writers were assigned to the test set was randomized so that the test-set writers do not all come from a single institution (to ensure variability in the test set). In the experimental section, we showed that the results were promising, with a 94.9% classification accuracy rate on the test images. In future work, we plan to further improve the performance of handwritten Arabic character recognition.

Acknowledgements
Ahmed El-Sawy, Mohamed Loey, Hazem EL-Bakry, "Arabic Handwritten Characters Recognition using Convolutional Neural Network," WSEAS, 2017.

Inspiration
Creating the proposed database presents additional challenges because it involves many issues, such as writing style, stroke thickness, and the number and position of dots. Some characters take different shapes even when written in the same position; for example, the character teh has different shapes in the isolated position.

Benha University
http://bu.edu.eg/staff/mloey
https://mloey.github.io/
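The writer-exclusive, randomized split described in the Content section can be sketched as follows. The sample records `(writer_id, class_id, repetition)` and the function name `writer_exclusive_split` are hypothetical; the source does not describe the file layout or identifiers. The choice of 12 test writers is inferred from 120 test images per class divided by ten repetitions. Note that this sketch only shuffles writers: it has no institution metadata, so it cannot by itself guarantee that test writers come from more than one institution.

```python
import random
from collections import Counter

WRITERS, CLASSES, REPETITIONS = 60, 28, 10
TEST_WRITERS = 12  # inferred: 120 test images per class / 10 repetitions


def writer_exclusive_split(samples, num_test_writers, seed=0):
    """Split (writer_id, class_id, repetition) records so no writer is in both sets.

    Writers are shuffled with a seeded RNG before the first `num_test_writers`
    are taken for the test set, mimicking a randomized writer ordering.
    """
    writers = sorted({w for w, _, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(writers)
    test_ids = set(writers[:num_test_writers])
    train = [s for s in samples if s[0] not in test_ids]
    test = [s for s in samples if s[0] in test_ids]
    return train, test


# Hypothetical stand-in for the real images: one record per written character.
samples = [
    (w, c, r)
    for w in range(WRITERS)
    for c in range(CLASSES)
    for r in range(REPETITIONS)
]
train, test = writer_exclusive_split(samples, TEST_WRITERS)

# No writer overlap between the two sets.
assert {w for w, _, _ in train}.isdisjoint({w for w, _, _ in test})

print(len(train), len(test))                          # prints: 13440 3360
print(set(Counter(c for _, c, _ in train).values()))  # prints: {480}
```

Splitting by writer rather than by image keeps a writer's personal style out of the test set, so the reported test accuracy reflects generalization to unseen writers.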

Provided by:
figshare
Created:
2020-05-03