Synthetic Dyslexia Handwriting Dataset (YOLO-Format)

Indexed by NIAID Data Ecosystem on 2026-05-02

Download link:

https://zenodo.org/record/14852658

Resource summary:

Description

This synthetic dataset has been generated to facilitate object detection (in YOLO format) for research on dyslexia-related handwriting patterns. It builds upon an original corpus of uppercase and lowercase letters obtained from multiple sources: the NIST Special Database 19 [1], the Kaggle dataset “A-Z Handwritten Alphabets in .csv format” [2], and handwriting samples from dyslexic primary school children of Seberang Jaya, Penang (Malaysia).

In the original dataset, uppercase letters originated from NIST Special Database 19, while lowercase letters came from the Kaggle dataset curated by S. Patel. Additional images (categorized as Normal, Reversal, and Corrected) were collected and labeled based on handwriting samples of dyslexic and non-dyslexic students, resulting in:

78,275 images labeled as Normal
52,196 images labeled as Reversal
8,029 images labeled as Corrected

Building upon this foundation, the Synthetic Dyslexia Handwriting Dataset presented here was programmatically generated to produce labeled examples suitable for training and validating object detection models. Each synthetic image arranges multiple letters of various classes (Normal, Reversal, Corrected) in a “text line” style on a black background, providing YOLO-compatible .txt annotations that specify bounding boxes for each letter.

Key Points of the Synthetic Generation Process

Letter-Level Source Data: Individual characters were sampled from the original image sets.
Randomized Layout: Letters are randomly assembled into words and lines, ensuring a wide variety of visual arrangements.
Bounding Box Labels: Each character is assigned a bounding box with (x, y, width, height) in YOLO format.
Class Annotations: Classes include 0 = Normal, 1 = Reversal, and 2 = Corrected.
Preservation of Visual Characteristics: Letters retain their key dyslexia-relevant features (e.g., reversals).
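As an illustration of how the bounding box and class annotations might be read, here is a minimal Python sketch. It assumes each line of a .txt annotation file follows the common YOLO convention "class_id x_center y_center width height", with coordinates normalized to the range 0 to 1; the dataset description does not spell this out, so it should be checked against the actual files. The names Box, parse_yolo_line, and parse_yolo_annotations are hypothetical helpers and not part of the dataset.

```python
from dataclasses import dataclass

# Class ids as listed in the dataset description.
CLASS_NAMES = {0: "Normal", 1: "Reversal", 2: "Corrected"}


@dataclass
class Box:
    """A labeled bounding box in pixel coordinates."""
    label: str
    left: float
    top: float
    right: float
    bottom: float


def parse_yolo_line(line: str, img_w: int, img_h: int) -> Box:
    """Convert one annotation line to a pixel-space box.

    Assumes the layout "class_id x_center y_center width height"
    with all four coordinates normalized by the image size.
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:])
    return Box(
        label=CLASS_NAMES[class_id],
        left=(xc - w / 2) * img_w,
        top=(yc - h / 2) * img_h,
        right=(xc + w / 2) * img_w,
        bottom=(yc + h / 2) * img_h,
    )


def parse_yolo_annotations(text: str, img_w: int, img_h: int) -> list[Box]:
    """Parse the full contents of an annotation file, skipping blank lines."""
    return [
        parse_yolo_line(line, img_w, img_h)
        for line in text.splitlines()
        if line.strip()
    ]
```

For example, parse_yolo_line("1 0.5 0.5 0.25 0.5", 200, 100) would describe a Reversal letter centered in a 200 x 100 image; the image size here is made up for illustration.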
Historical References & Credits

If you are using this synthetic dataset or the original Dyslexia Handwriting Dataset, please cite the following papers:

M. S. A. B. Rosli, I. S. Isa, S. A. Ramlan, S. N. Sulaiman and M. I. F. Maruzuki, "Development of CNN Transfer Learning for Dyslexia Handwriting Recognition," 2021 11th IEEE International Conference on Control System, Computing and Engineering (ICCSCE), 2021, pp. 194–199, doi: 10.1109/ICCSCE52189.2021.9530971.

N. S. L. Seman, I. S. Isa, S. A. Ramlan, W. Li-Chih and M. I. F. Maruzuki, "Notice of Removal: Classification of Handwriting Impairment Using CNN for Potential Dyslexia Symptom," 2021 11th IEEE International Conference on Control System, Computing and Engineering (ICCSCE), 2021, pp. 188–193, doi: 10.1109/ICCSCE52189.2021.9530989.

Isa, Iza Sazanita. CNN Comparisons Models On Dyslexia Handwriting Classification / Iza Sazanita Isa … [et al.]. Universiti Teknologi MARA Cawangan Pulau Pinang, 2021.

Isa, I. S., Rahimi, W. N. S., Ramlan, S. A., & Sulaiman, S. N. (2019). Automated detection of dyslexia symptom based on handwriting image for primary school children. Procedia Computer Science, 163, 440–449.

References to Original Data Sources

[1] P. J. Grother, “NIST Special Database 19,” NIST, 2016. [Online]. Available: https://www.nist.gov/srd/nist-special-database-19

[2] S. Patel, “A-Z Handwritten Alphabets in .csv format,” Kaggle, 2017. [Online]. Available: https://www.kaggle.com/sachinpatel21/az-handwritten-alphabets-in-csv-format

Usage & Citation

Researchers and practitioners are encouraged to integrate this synthetic dataset into their computer vision pipelines for tasks such as dyslexia pattern analysis, character recognition, and educational technology development. Please cite the original authors and publications if you utilize this synthetic dataset in your work.

Password Note (Original Data)

The original RAR file was password-protected with the password: WanAsy321. This synthetic dataset, however, is provided openly for streamlined usage.

Created:

2025-02-11
