MNIST dataset for Outliers Detection - [ MNIST4OD ]
Source: DataCite Commons | Updated: 2024-05-17 | Indexed: 2024-07-27
Download link:
https://figshare.com/articles/dataset/MNIST_dataset_for_Outliers_Detection_-_MNIST4OD_/9954986/1
Description:
We present MNIST4OD, a dataset that is large in both number of dimensions and number of instances and is suited to the outlier detection task. It is based on the well-known MNIST dataset (http://yann.lecun.com/exdb/mnist/).

We build MNIST4OD as follows. To distinguish inliers from outliers, we take the images of one digit (e.g. digit 1) as inliers and draw 1% of the remaining images uniformly at random as outliers. We repeat this generation process for every digit. For implementation simplicity, we then flatten each 28 × 28 image into a vector.

Each file MNIST_x.csv.gz contains the dataset whose inlier class is x. Each line holds one instance (vector), and the last column is the outlier label (yes/no) of that data point. The statistics of every dataset (name, instances, dimensions, and percentage of outliers) are given in the table that follows.
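The generation procedure described above can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' script: the function name `make_od_dataset`, the seed, the 0/1 label encoding, and the rounding of the 1% sample size are all assumptions, and random arrays stand in for the real MNIST images.

```python
import numpy as np

def make_od_dataset(images, labels, inlier_digit, outlier_fraction=0.01, seed=0):
    """Return (X, y): flattened image vectors and a 0/1 flag (1 = outlier)."""
    rng = np.random.default_rng(seed)
    inlier_idx = np.flatnonzero(labels == inlier_digit)
    other_idx = np.flatnonzero(labels != inlier_digit)
    # Assumption: the description does not say how the 1% count is rounded.
    n_outliers = int(round(outlier_fraction * other_idx.size))
    # Uniform sampling without replacement from all other digits.
    outlier_idx = rng.choice(other_idx, size=n_outliers, replace=False)
    idx = np.concatenate([inlier_idx, outlier_idx])
    # Flatten each 28 x 28 image into a 784-dimensional vector.
    X = images[idx].reshape(idx.size, -1)
    y = np.concatenate([
        np.zeros(inlier_idx.size, dtype=int),
        np.ones(n_outliers, dtype=int),
    ])
    return X, y

# Synthetic stand-in for MNIST: 2000 random images, 200 per digit.
data_rng = np.random.default_rng(42)
images = data_rng.integers(0, 256, size=(2000, 28, 28), dtype=np.uint8)
labels = np.arange(2000) % 10

X, y = make_od_dataset(images, labels, inlier_digit=1)
print(X.shape, int(y.sum()))
```

On this toy input, 200 inliers plus 1% of the 1800 remaining images gives 218 rows of 784 values, 18 of them flagged as outliers.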
Statistics of each dataset (Name | Instances | Dimensions | Outliers in %):
MNIST_0 | 7534 | 784 | 8
MNIST_1 | 8499 | 784 | 7
MNIST_2 | 7621 | 784 | 8
MNIST_3 | 7770 | 784 | 8
MNIST_4 | 7456 | 784 | 8
MNIST_5 | 6950 | 784 | 9
MNIST_6 | 7508 | 784 | 8
MNIST_7 | 7921 | 784 | 8
MNIST_8 | 7457 | 784 | 8
MNIST_9 | 7589 | 784 | 8
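A downloaded file can be checked against the table with a small loader such as the sketch below. It uses only the Python standard library and makes several assumptions the description does not confirm: comma-separated values, no header row, and an outlier label spelled as one of the strings in `outlier_values`. The helper name `load_mnist4od` and the demo file are hypothetical.

```python
import csv
import gzip
import os
import tempfile

def load_mnist4od(path, outlier_values=("yes", "1", "1.0", "true")):
    """Read one gzip-compressed CSV into (features, is_outlier) lists."""
    features, is_outlier = [], []
    with gzip.open(path, "rt", newline="") as fh:
        for row in csv.reader(fh):
            if not row:
                continue  # skip blank lines
            # The last column is the outlier label; the rest are pixel values.
            *pixels, label = row
            features.append([float(v) for v in pixels])
            is_outlier.append(label.strip().lower() in outlier_values)
    return features, is_outlier

# Tiny synthetic file standing in for a real MNIST_x.csv.gz.
tmp_path = os.path.join(tempfile.mkdtemp(), "MNIST_demo.csv.gz")
with gzip.open(tmp_path, "wt", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow([0, 12, 255, "no"])
    writer.writerow([7, 0, 3, "yes"])
    writer.writerow([1, 1, 1, "no"])

features, is_outlier = load_mnist4od(tmp_path)
outlier_pct = 100.0 * sum(is_outlier) / len(is_outlier)
print(len(features), len(features[0]), round(outlier_pct))
```

For a real file such as MNIST_1.csv.gz, the table above suggests this should report 8499 instances, 784 dimensions, and about 7% outliers, provided the format assumptions hold.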
Provider:
figshare
Created:
2019-10-08



