mweiss/fashion_mnist_ambiguous

Name: mweiss/fashion_mnist_ambiguous
Creator: mweiss
Published: 2023-03-16 12:43:23
License: MIT

Source: Hugging Face. Updated: 2023-03-16. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/mweiss/fashion_mnist_ambiguous

Description:

---
license: mit
task_categories:
- image-classification
language:
- en
pretty_name: mnist_ambigous
size_categories:
- 10K<n<100K
source_datasets:
- extended|mnist
annotations_creators:
- machine-generated
---

# Fashion-Mnist-Ambiguous

This dataset contains fashion-mnist-like images, but with an unclear ground truth. For each image, there are two classes that could be considered true. Robust and uncertainty-aware DNNs should thus detect and flag such inputs.

### Features

As in fashion-mnist, the supervised dataset has an `image` (28x28 int array) and a `label` (int).

Additionally, the following features are exposed for your convenience:

- `text_label` (str): A textual representation of the probabilistic label, e.g. `p(Pullover)=0.54, p(Shirt)=0.46`
- `p_label` (list of floats): Ground-truth probabilities for each class (two nonzero values for our ambiguous images)
- `is_ambiguous` (bool): Flag indicating if this is one of our ambiguous images (see 'Splits' below)

### Splits

We provide four splits:

- `test`: 10'000 ambiguous images
- `train`: 10'000 ambiguous images. Adding ambiguous images to the training set makes sure that test-time ambiguous images are in-distribution.
- `test_mixed`: 20'000 images, consisting of the (shuffled) concatenation of our ambiguous `test` set and the nominal *original* fashion mnist test set
- `train_mixed`: 70'000 images, consisting of the (shuffled) concatenation of our ambiguous `train` set and the nominal training set.

Note that the ambiguous test images are highly ambiguous (i.e., the two classes have very similar ground-truth likelihoods), whereas the training set images allow for more unbalanced ambiguity. This is to make the training set more closely connected to the nominal data, while still keeping the test set clearly ambiguous.

For research explicitly targeting aleatoric uncertainty, we recommend training the model using `train_mixed`. Otherwise, our `test` set will lead to both epistemic and aleatoric uncertainty.

In related literature, such 'mixed' splits are sometimes denoted as *dirty* splits.

### Assessment and Validity

For a brief discussion of the strengths and weaknesses of this dataset, we refer to our paper. Please note that our images are not typically realistic, i.e., while they represent multiple classes and thus have an ambiguous ground truth, they do not resemble real-world photographs.

### Paper

Pre-print here: [https://arxiv.org/abs/2207.10495](https://arxiv.org/abs/2207.10495)

Citation:

```
@misc{https://doi.org/10.48550/arxiv.2207.10495,
  doi = {10.48550/ARXIV.2207.10495},
  url = {https://arxiv.org/abs/2207.10495},
  author = {Weiss, Michael and Gómez, André García and Tonella, Paolo},
  title = {A Forgotten Danger in DNN Supervision Testing: Generating and Detecting True Ambiguity},
  publisher = {arXiv},
  year = {2022}
}
```

### Related Datasets

- Ambiguous Mnist Dataset: [https://huggingface.co/datasets/mweiss/mnist_ambiguous](https://huggingface.co/datasets/mweiss/mnist_ambiguous)
- Corrupted Fashion-Mnist Dataset: [https://huggingface.co/datasets/mweiss/fashion_mnist_corrupted](https://huggingface.co/datasets/mweiss/fashion_mnist_corrupted)
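
As an illustration of how the splits and the `p_label` feature might be used, here is a minimal sketch. It assumes the Hugging Face `datasets` package is installed and that the repository id equals the dataset name shown on this page. `load_split` and `label_entropy` are hypothetical helper names, not part of the dataset, and the two example vectors are hand-written rather than taken from the data.

```python
import math


def load_split(split="test_mixed"):
    """Load one of the four splits (needs network access).

    Assumption: the Hugging Face `datasets` package is installed. The import
    is deferred so the rest of this sketch runs without it.
    """
    from datasets import load_dataset

    return load_dataset("mweiss/fashion_mnist_ambiguous", split=split)


def label_entropy(p_label):
    """Shannon entropy (in bits) of a probabilistic label such as `p_label`."""
    return -sum(p * math.log2(p) for p in p_label if p > 0)


# Hand-written illustrations, assuming `p_label` has one entry per class.
ambiguous_example = [0, 0, 0.54, 0, 0, 0, 0.46, 0, 0, 0]
nominal_example = [0, 0, 1.0, 0, 0, 0, 0, 0, 0, 0]

print(label_entropy(ambiguous_example))  # close to 1 bit: two near-equal classes
print(label_entropy(nominal_example))    # zero: a single certain class
```

Under these assumptions, the entropy of `p_label` gives a per-image measure of ground-truth ambiguity that a model's predicted uncertainty can be compared against.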

Provided by:

mweiss

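Summary of original information

Dataset overview

Basic information

Name: Fashion-Mnist-Ambiguous
License: MIT
Task category: image classification
Language: English
Size category: 10K<n<100K
Source datasets: extended from mnist
Annotation creators: machine-generated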

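Dataset description

Content: Fashion-Mnist-like images with an unclear ground truth. For each image, two classes could be considered correct.
Purpose: Testing and developing robust, uncertainty-aware deep neural networks (DNNs) that can detect and flag such ambiguous inputs.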

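Features

Image: 28x28 integer array
Label: integer
Additional features:
- text_label (string): A textual representation of the probabilistic label, e.g. p(Pullover)=0.54, p(Shirt)=0.46
- p_label (list of floats): Ground-truth probabilities for each class (two nonzero values for ambiguous images)
- is_ambiguous (boolean): Flag indicating whether the image is one of the ambiguous images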
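
The `text_label` string can be turned back into a mapping from class name to probability. The sketch below assumes every entry follows the `p(Name)=value` pattern of the example above; `parse_text_label` is a hypothetical helper, not something the dataset ships.

```python
import re

# One entry looks like "p(Pullover)=0.54"; class names never contain ")".
_ENTRY = re.compile(r"p\((?P<name>[^)]+)\)=(?P<prob>[0-9]*\.?[0-9]+)")


def parse_text_label(text_label):
    """Parse a string such as 'p(Pullover)=0.54, p(Shirt)=0.46' into a dict."""
    return {m["name"]: float(m["prob"]) for m in _ENTRY.finditer(text_label)}


parsed = parse_text_label("p(Pullover)=0.54, p(Shirt)=0.46")
print(parsed)
```

A parsed label like this is convenient for display or logging; for numeric work, `p_label` already holds the same probabilities as a list.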