Handwritten Name Images Dataset (400,000 Images)

Name: Handwritten Name Images Dataset (400,000 Images)
Creator: 帕依提提
License: Not specified

Indexed by 帕依提提 on 2024-03-04

Download link:

https://www.payititi.com/opendatasets/show-26393.html


Description:

This dataset contains more than 400,000 handwritten names collected through charity projects that support disadvantaged children around the world. Character recognition uses image processing techniques to convert characters on scanned documents into digital form. It generally performs well on machine-printed fonts, but handwritten characters remain a hard problem for machines because individual writing styles vary so widely. In total there are 206,799 first names and 207,024 surnames. The data is split into a training set (331,059 samples), a test set (41,382 samples), and a validation set (41,382 samples). The input data consists of hundreds of thousands of images of handwritten names. In the "data" folder, the transcribed images are organized into training, test, and validation sets. Image labels follow a consistent naming format, so you can extend the dataset with your own data.
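The counts quoted in the description can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check on the published numbers, not part of the dataset's tooling; the names SPLITS, NAME_COUNTS, and split_fractions are hypothetical and introduced here for illustration.

```python
# Sample counts as quoted in the dataset description.
SPLITS = {"train": 331_059, "test": 41_382, "validation": 41_382}

# The two per-name-type counts quoted in the description.
NAME_COUNTS = (206_799, 207_024)


def split_fractions(splits):
    """Return each split's share of the total number of samples."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


total = sum(SPLITS.values())
fractions = split_fractions(SPLITS)

# The split sizes and the name counts should describe the same set of images.
print("total samples:", total)
print("name counts agree with splits:", sum(NAME_COUNTS) == total)
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

The three splits add up to the same total as the two name counts, and the proportions come out to roughly 80% training, 10% test, and 10% validation.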

Provided by:

帕依提提

Collected Summary

Dataset Introduction

Background and Challenges

Background Overview

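This dataset contains more than 400,000 images of handwritten names, collected through charity projects that support disadvantaged children around the world. The data is split into training, test, and validation sets totaling about 414,000 samples, with over 200,000 each of first names and surnames. It is suited to handwritten character recognition and natural language processing tasks, although the diversity of handwriting styles makes machine recognition harder.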

The content above was collected and summarized by 遇见数据集.