German Character Recognition Dataset

Source: NIAID Data Ecosystem (indexed 2026-05-01)

Download link:

https://zenodo.org/records/8364967

Description:

The dataset contains 282,472 grayscale images of 40 x 40 pixels each, covering 82 distinct classes: German characters, digits, and mathematical symbols. Unlike the MNIST dataset, where image alignment varies, every image in this dataset is aligned the same way: the glyph is centered within the 40 x 40 bounding box so that it touches either the left and right sides or the top and bottom borders. This consistent alignment simplifies the training task considerably and leads to strong performance metrics.

The training and test data are stored in two separate CSV files. In each file, the first column holds the Unicode character (the label), and the following 1,600 values are the grayscale values of the flattened image (40 x 40 = 1,600). If anything is unclear, refer to the attached code, which provides a complete workflow for training a CNN in PyTorch and lets you select the classes to train on. When training exclusively on the digits 0 to 9, we achieved an accuracy and a Matthews Correlation Coefficient (MCC) of roughly 99% on the test data.
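The CSV layout described above (one label column followed by 1,600 pixel values) can be read with a minimal sketch like the one below. It is not the authors' attached code. The function name load_rows, the keep parameter, the absence of a header row, and row-major pixel order are all assumptions made for illustration. The sample rows are synthetic, and the pixel values 0 and 255 are placeholders, since the description does not state the value range.

```python
import csv
import io

IMAGE_SIDE = 40                   # each image is 40 x 40 pixels
PIXELS = IMAGE_SIDE * IMAGE_SIDE  # 1600 flattened grayscale values per row


def load_rows(handle, keep=None):
    """Read `label, p0, ..., p1599` rows into (labels, images).

    `keep` is an optional set of characters; rows with other labels are skipped.
    Assumes no header row and row-major pixel order (both are assumptions).
    """
    labels, images = [], []
    for line_no, row in enumerate(csv.reader(handle), start=1):
        if not row:
            continue  # ignore blank lines
        if len(row) != PIXELS + 1:
            raise ValueError(
                f"line {line_no}: expected {PIXELS + 1} fields, got {len(row)}"
            )
        label = row[0]
        if keep is not None and label not in keep:
            continue  # class not selected for training
        flat = [float(value) for value in row[1:]]
        # Rebuild the 2D image: IMAGE_SIDE rows of IMAGE_SIDE pixels each.
        image = [
            flat[r * IMAGE_SIDE:(r + 1) * IMAGE_SIDE] for r in range(IMAGE_SIDE)
        ]
        labels.append(label)
        images.append(image)
    return labels, images


# Tiny synthetic stand-in for the real CSV files (not actual dataset content).
sample = io.StringIO(
    "7," + ",".join(["0"] * PIXELS) + "\n"
    + "ä," + ",".join(["255"] * PIXELS) + "\n"
)
digits = set("0123456789")  # restrict to the digit classes 0 to 9
labels, images = load_rows(sample, keep=digits)
print(labels, len(images[0]), len(images[0][0]))
```

For the real files, the same function could be called on a handle opened with open(path, newline="", encoding="utf-8"), where path is whatever name the downloaded CSV has. The nested lists can then be converted to a NumPy array or PyTorch tensor before training.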

Created on:

2023-09-21
