Oracle-MNIST: a Realistic Image Dataset for Benchmarking Machine Learning Algorithms

Name: Oracle-MNIST: a Realistic Image Dataset for Benchmarking Machine Learning Algorithms
Creator: figshare
Published: 2025-04-01 08:02:21
License: Not specified

Source: DataCite Commons. Updated 2025-04-01; indexed 2024-08-18.

Download link:

https://figshare.com/articles/dataset/Oracle-MNIST_a_Realistic_Image_Dataset_for_Benchmarking_Machine_Learning_Algorithms/23935689/1


Description:

Oracle-MNIST comprises 30,222 grayscale images (28×28 pixels) of ancient Chinese characters from 10 categories. It is intended for benchmarking pattern classification, with particular challenges posed by image noise and distortion. The training set contains 27,222 images, and the test set contains 300 images per class (3,000 in total).

1. Ease of use. Oracle-MNIST shares the same data format as the original MNIST dataset, so it can be used directly with existing classifiers and systems built for MNIST.
2. Real-world challenge. Oracle-MNIST poses a harder classification task than MNIST. The images of oracle characters suffer from 1) extremely severe and unique noise caused by three thousand years of burial and aging, and 2) dramatically varied writing styles among ancient Chinese writers. Together, these properties make the dataset a realistic benchmark for machine learning research.
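Because Oracle-MNIST is described as sharing MNIST's data format, a reader written for MNIST's IDX files should apply to it. The sketch below is a minimal, standard-library-only Python reader. It assumes the files follow the standard MNIST IDX layout (big-endian 32-bit header fields, magic number 0x00000803 for images and 0x00000801 for labels, followed by raw unsigned bytes) and are gzip-compressed as in the original MNIST distribution. The function names are hypothetical illustrations, not part of any official Oracle-MNIST tooling.

```python
import gzip
import struct
from typing import List, Tuple

# Assumed MNIST IDX magic numbers: 0x0803 = 3-D unsigned-byte array (images),
# 0x0801 = 1-D unsigned-byte array (labels).
IMAGE_MAGIC = 0x00000803
LABEL_MAGIC = 0x00000801


def parse_idx_images(raw: bytes) -> Tuple[int, int, List[bytes]]:
    """Parse IDX3 image bytes; return (rows, cols, list of flat row-major images)."""
    if len(raw) < 16:
        raise ValueError("truncated image header")
    # Header: magic, image count, rows, cols, each a big-endian uint32.
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != IMAGE_MAGIC:
        raise ValueError(f"unexpected image magic number {magic:#010x}")
    size = rows * cols
    body = raw[16:]
    if len(body) != count * size:
        raise ValueError("image payload length does not match header")
    # Each image is rows*cols consecutive pixel bytes (0-255).
    images = [body[i * size:(i + 1) * size] for i in range(count)]
    return rows, cols, images


def parse_idx_labels(raw: bytes) -> List[int]:
    """Parse IDX1 label bytes; return the labels as a list of ints."""
    if len(raw) < 8:
        raise ValueError("truncated label header")
    # Header: magic, label count, each a big-endian uint32.
    magic, count = struct.unpack(">II", raw[:8])
    if magic != LABEL_MAGIC:
        raise ValueError(f"unexpected label magic number {magic:#010x}")
    labels = list(raw[8:8 + count])  # iterating bytes yields ints
    if len(labels) != count:
        raise ValueError("label payload length does not match header")
    return labels


def load_split(image_path: str, label_path: str) -> Tuple[int, int, List[bytes], List[int]]:
    """Read a gzip-compressed image/label file pair (paths supplied by the caller)."""
    with gzip.open(image_path, "rb") as f:
        rows, cols, images = parse_idx_images(f.read())
    with gzip.open(label_path, "rb") as f:
        labels = parse_idx_labels(f.read())
    if len(images) != len(labels):
        raise ValueError("image and label counts differ")
    return rows, cols, images, labels
```

For example, `load_split("train-images-idx3-ubyte.gz", "train-labels-idx1-ubyte.gz")` (hypothetical file names; check the names in the actual download) would be expected to return 27,222 images of 28×28 pixels if the training files match the description above, and `collections.Counter(labels)` on the test split should show 300 images for each of the 10 classes. Returning raw `bytes` per image keeps the sketch dependency-free; each image can be converted with `numpy.frombuffer` if NumPy is available.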

Provider: figshare

Created: 2023-08-12
