Pen-Based Recognition of Handwritten Digits Data Set

Name: Pen-Based Recognition of Handwritten Digits Data Set
Creator: 帕依提提
License: No description available

Indexed by 帕依提提 on 2024-03-04

Download link:

https://www.payititi.com/opendatasets/show-26201.html

Resource description:

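Data Set Information:

We created a digit database by collecting 250 samples from each of 44 writers. The samples written by 30 writers are used for training, cross-validation, and writer-dependent testing; the digits written by the other 14 writers are used for writer-independent testing. The database is also available in the UNIPEN format.

We used a WACOM PL-100V pressure-sensitive tablet with an integrated LCD display and a cordless stylus. The input and display areas are located in the same place. The tablet is attached to the serial port of an Intel 486 based PC, which lets us collect handwriting samples. It sends the $x$ and $y$ tablet coordinates and the pen pressure level at fixed time intervals (the sampling rate) of 100 milliseconds.

The writers were asked to write 250 digits in random order inside boxes of 500 by 500 tablet pixels. Subjects were monitored only during the first entry screens. Each screen contains five boxes, with the digit to be written displayed above each box. Subjects were told to write only inside these boxes. If they made a mistake or were unhappy with their writing, they were instructed to clear the content of the box with an on-screen button. The first ten digits were discarded because most writers were not familiar with this type of input device; the subjects were not aware of this.

In our study we use only the ($x$, $y$) coordinate information. The stylus pressure values are ignored. First, we apply normalization to make the representation invariant to translation and scale distortions. The raw data captured from the tablet consist of integer values between 0 and 500 (the resolution of the tablet input box). The new coordinates are scaled so that the coordinate with the larger range varies between 0 and 100. Usually $x$ stays within this range, since most characters are taller than they are wide.

To train and test the classifiers, we need to represent the digits as constant-length feature vectors. A commonly used technique that gives good results is resampling the $(x_t, y_t)$ points. Either temporal resampling (points regularly spaced in time) or spatial resampling (points regularly spaced in arc length) can be used here. The raw point data are already regularly spaced in time, but the distance between consecutive points varies. Previous research showed that spatial resampling to a fixed number of regularly spaced points on the trajectory yields better performance, because it gives a better alignment between points. Our resampling algorithm uses simple linear interpolation between pairs of points. The resampled digits are represented as a sequence of $T$ points $(x_t, y_t)_{t=1}^{T}$, regularly spaced in arc length, as opposed to the input sequence, which is regularly spaced in time. The input vector size is therefore $2T$, twice the number of resampled points. In our experiments we considered spatial resampling to $T = 8, 12, 16$ points and found that $T = 8$ gave the best trade-off between accuracy and complexity.

Attribute Information:

All input attributes are integers in the range 0..100. The last attribute is the class code 0..9.

Relevant Papers:

F. Alimoglu (1996). Combining Multiple Classifiers for Pen-based Handwritten Digit Recognition. MSc Thesis, Institute of Graduate Studies in Science and Engineering, Bogazici University.
F. Alimoglu, E. Alpaydin. "Methods of Combining Multiple Classifiers based on Different Representations for Pen-based Handwriting Recognition." Proceedings of the Fifth Turkish Artificial Intelligence and Artificial Neural Networks Symposium (TAINN 96), June 1996, Istanbul, Turkey.

Papers That Cite This Data Set:

Ken Tang and Ponnuthurai N. Suganthan and Xi Yao and A. Kai Qin. Linear dimensionality reduction using relevance weighted LDA. School of Electrical and Electronic Engineering, Nanyang Technological University. 2005.
Mikhail Bilenko and Sugato Basu and Raymond J. Mooney. Integrating constraints and metric learning in semi-supervised clustering. ICML. 2004.
Fabian Hoti and Lasse Holmström. A semiparametric density estimation approach to pattern classification. Pattern Recognition, 37. 2004.
Thomas Serafini and Gaetano Zanghirati and Luca Zanni. DIPARTIMENTO DI MATEMATICA. Gradient Projection Methods for. 2003.
Manoranjan Dash and Huan Liu and Peter Scheuermann and Kian-Lee Tan. Fast hierarchical clustering and its validation. Data Knowl. Eng, 44. 2003.
Dennis DeCoste. Anytime Query-Tuned Kernel Machines via Cholesky Factorization. SDM. 2003.
Greg Hamerly and Charles Elkan. Learning the k in k-means. NIPS. 2003.
Marina Meila and Michael I. Jordan. Learning with Mixtures of Trees. Journal of Machine Learning Research, 1. 2000.
Ethem Alpaydin. Combined 5 x 2 cv F Test for Comparing Supervised Classification Learning Algorithms. Neural Computation, 11. 1999.
Georg Thimm and Emile Fiesler. IDIAP Technical report High Order and Multilayer Perceptron Initialization. IEEE Transactions. 1994.
Perry Moerland. Mixtures of latent variable models for density estimation and classification. Research Report, IDIAP, Dalle Molle.

Source:

E. Alpaydin and Fevzi Alimoglu, Department of Computer Engineering, Bogazici University, 80815 Istanbul, Turkey. alpaydin '@' boun.edu.tr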
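The normalization and spatial resampling steps described in the data set information can be illustrated with a short Python sketch. It is not the authors' code. It assumes that a digit is a single polyline of (x, y) tablet points (pen lifts between strokes are not modeled), that both coordinates are scaled by the same factor, and that the final values are rounded to integers. The function names are hypothetical.

```python
import math


def normalize(points):
    """Shift to the origin and scale so the coordinate with the larger
    range spans 0..100 (same factor for x and y: an assumption)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    span = max(max(xs) - min_x, max(ys) - min_y)
    if span == 0:  # degenerate input: every point is identical
        return [(0.0, 0.0) for _ in points]
    return [((x - min_x) * 100.0 / span, (y - min_y) * 100.0 / span)
            for x, y in points]


def resample_by_arc_length(points, T=8):
    """Return T points regularly spaced in arc length along the polyline,
    using linear interpolation between consecutive input points."""
    if T < 2:
        raise ValueError("T must be at least 2")
    # Cumulative arc length at each input point.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    if total == 0:  # single point or zero-length trajectory
        return [tuple(points[0])] * T
    out = []
    seg = 0  # index of the segment that contains the current target
    for k in range(T):
        target = total * k / (T - 1)
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        frac = 0.0 if seg_len == 0 else min((target - cum[seg]) / seg_len, 1.0)
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + frac * (x1 - x0), y0 + frac * (y1 - y0)))
    return out


def to_feature_vector(points, T=8):
    """Normalize, resample to T points, and flatten to 2*T integers."""
    resampled = resample_by_arc_length(normalize(points), T)
    return [round(c) for point in resampled for c in point]


# A vertical stroke sampled at three tablet positions.
stroke = [(0, 0), (0, 250), (0, 500)]
print(to_feature_vector(stroke, T=5))  # [0, 0, 0, 25, 0, 50, 0, 75, 0, 100]
```

With the default T=8 the vector has 16 entries, which matches the 2*T input size given in the description.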

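The attribute layout described above (integer inputs in 0..100 followed by a class code in 0..9) suggests a simple record parser. The sketch below assumes a comma-separated text record and assumes the inputs are interleaved as x1, y1, x2, y2, ...; neither detail is stated in the description. The function name and the sample record are made up for illustration.

```python
def parse_record(line, T=8):
    """Split one record into T (x, y) points and a class label.

    Assumes comma-separated integers with interleaved coordinates
    followed by the class code (both are assumptions).
    """
    values = [int(v) for v in line.strip().split(",")]
    if len(values) != 2 * T + 1:
        raise ValueError(f"expected {2 * T + 1} values, got {len(values)}")
    *coords, label = values
    if not all(0 <= c <= 100 for c in coords):
        raise ValueError("input attributes must lie in 0..100")
    if not 0 <= label <= 9:
        raise ValueError("class code must lie in 0..9")
    points = list(zip(coords[0::2], coords[1::2]))
    return points, label


# Synthetic record for illustration only, not taken from the data set.
sample = "0,100,10,90,20,80,30,70,40,60,50,50,60,40,70,30,7"
points, label = parse_record(sample)
print(len(points), label)  # 8 7
```

Validating the value ranges at load time catches records that were shifted by a missing or extra field.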
Provided by:

帕依提提