
mweiss/fashion_mnist_ambiguous

Hugging Face · Updated: 2023-03-16 · Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/mweiss/fashion_mnist_ambiguous
Resource description:
---
license: mit
task_categories:
- image-classification
language:
- en
pretty_name: mnist_ambigous
size_categories:
- 10K<n<100K
source_datasets:
- extended|mnist
annotations_creators:
- machine-generated
---

# Fashion-Mnist-Ambiguous

This dataset contains fashion-mnist-like images with an unclear ground truth. For each image, there are two classes that could be considered true. Robust and uncertainty-aware DNNs should therefore detect and flag these cases.

### Features

As in fashion-mnist, the supervised dataset has an `image` (28x28 int array) and a `label` (int). Additionally, the following features are exposed for your convenience:

- `text_label` (str): A textual representation of the probabilistic label, e.g. `p(Pullover)=0.54, p(Shirt)=0.46`
- `p_label` (list of floats): Ground-truth probabilities for each class (two nonzero values for our ambiguous images)
- `is_ambiguous` (bool): Flag indicating whether this is one of our ambiguous images (see 'Splits' below)

### Splits

We provide four splits:

- `test`: 10,000 ambiguous images
- `train`: 10,000 ambiguous images. Adding ambiguous images to the training set makes sure test-time ambiguous images are in-distribution.
- `test_mixed`: 20,000 images, consisting of the (shuffled) concatenation of our ambiguous `test` set and the nominal *original* fashion-mnist test set
- `train_mixed`: 70,000 images, consisting of the (shuffled) concatenation of our ambiguous `train` set and the nominal training set

Note that the ambiguous test images are highly ambiguous (i.e., the two classes have very similar ground-truth likelihoods), whereas the training set images allow for more unbalanced ambiguity. This makes the training set more closely connected to the nominal data, while still keeping the test set clearly ambiguous.

For research explicitly targeting aleatoric uncertainty, we recommend training the model using `train_mixed`. Otherwise (i.e., when training on nominal data only), our `test` set will lead to both epistemic and aleatoric uncertainty.
In related literature, such 'mixed' splits are sometimes called *dirty* splits.

### Assessment and Validity

For a brief discussion of the strengths and weaknesses of this dataset, we refer to our paper. Please note that our images are typically not realistic: while they represent multiple classes and thus have an ambiguous ground truth, they do not resemble real-world photographs.

### Paper

Pre-print here: [https://arxiv.org/abs/2207.10495](https://arxiv.org/abs/2207.10495)

Citation:

```
@misc{https://doi.org/10.48550/arxiv.2207.10495,
  doi = {10.48550/ARXIV.2207.10495},
  url = {https://arxiv.org/abs/2207.10495},
  author = {Weiss, Michael and Gómez, André García and Tonella, Paolo},
  title = {A Forgotten Danger in DNN Supervision Testing: Generating and Detecting True Ambiguity},
  publisher = {arXiv},
  year = {2022}
}
```

### Related Datasets

- Ambiguous Mnist Dataset: [https://huggingface.co/datasets/mweiss/mnist_ambiguous](https://huggingface.co/datasets/mweiss/mnist_ambiguous)
- Corrupted Fashion-Mnist Dataset: [https://huggingface.co/datasets/mweiss/fashion_mnist_corrupted](https://huggingface.co/datasets/mweiss/fashion_mnist_corrupted)
Provided by:
mweiss
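Summary of Original Information

Dataset Overview

Basic Information

  • Name: Fashion-Mnist-Ambiguous
  • License: MIT
  • Task category: Image classification
  • Language: English
  • Size category: 10K<n<100K
  • Source dataset: extended from mnist
  • Annotations created by: machine-generated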

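Dataset Description

  • Content: Fashion-MNIST-like images with an unclear ground truth. Each image has two classes that could be considered correct.
  • Purpose: testing and developing robust, uncertainty-aware deep neural networks (DNNs) that can detect and flag such ambiguity.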

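Features

  • image: 28x28 integer array
  • label: integer
  • Additional features:
    • text_label (string): textual representation of the probabilistic label, e.g. p(Pullover)=0.54, p(Shirt)=0.46
    • p_label (list of floats): ground-truth probability of each class (two nonzero values for ambiguous images)
    • is_ambiguous (boolean): flag indicating whether the image is one of the ambiguous images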
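`text_label` and `p_label` should encode the same probabilistic label in two forms. The minimal sketch below parses a `text_label` string into a 10-element probability list. It assumes the standard Fashion-MNIST class order and that the class names in `text_label` are spelled exactly as in `CLASS_NAMES`. Check both assumptions against the real data. `parse_text_label` and `looks_ambiguous` are hypothetical helpers, not part of the dataset or any library; in practice you would read `p_label` directly.

```python
import re

# Standard Fashion-MNIST class order (index = integer label).
# ASSUMPTION: the class-name strings inside `text_label` match these exactly.
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

# Matches entries such as "p(Pullover)=0.54"
_ENTRY = re.compile(r"p\(([^)]+)\)=([0-9.]+)")


def parse_text_label(text_label):
    """Hypothetical helper: turn 'p(Pullover)=0.54, p(Shirt)=0.46' into a
    10-element probability list. Raises ValueError for unknown class names."""
    probs = [0.0] * len(CLASS_NAMES)
    for name, value in _ENTRY.findall(text_label):
        probs[CLASS_NAMES.index(name.strip())] = float(value)
    return probs


def looks_ambiguous(p_label, eps=1e-9):
    """Hypothetical sanity check: more than one class with nonzero probability."""
    return sum(1 for p in p_label if p > eps) > 1


p = parse_text_label("p(Pullover)=0.54, p(Shirt)=0.46")
print(p)                   # [0.0, 0.0, 0.54, 0.0, 0.0, 0.0, 0.46, 0.0, 0.0, 0.0]
print(looks_ambiguous(p))  # True
```

Comparing `looks_ambiguous(p_label)` with the provided `is_ambiguous` flag can serve as a quick consistency check. This assumes nominal images carry a one-hot `p_label`, which the card does not state explicitly.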

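Splits

  • Training set (train): 10,000 ambiguous images
  • Test set (test): 10,000 ambiguous images
  • Mixed test set (test_mixed): 20,000 images, a shuffled concatenation of the ambiguous test set and the original Fashion-MNIST test set
  • Mixed training set (train_mixed): 70,000 images, a shuffled concatenation of the ambiguous training set and the original (nominal) training set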
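The mixed splits are described as shuffled concatenations of an ambiguous set and a nominal set. The sketch below mimics that construction on placeholder records. It shows how the split sizes add up (10,000 + 10,000 and 10,000 + 60,000) and what role the `is_ambiguous` flag plays. `make_mixed_split` is a hypothetical helper for illustration only, not the authors' generation code. The record layout and seed are arbitrary assumptions.

```python
import random


def make_mixed_split(ambiguous, nominal, seed=0):
    """Hypothetical helper: tag each sample with `is_ambiguous`, concatenate
    the two lists, then shuffle with a fixed seed for reproducibility."""
    mixed = [dict(sample, is_ambiguous=True) for sample in ambiguous]
    mixed += [dict(sample, is_ambiguous=False) for sample in nominal]
    random.Random(seed).shuffle(mixed)
    return mixed


# Placeholder records standing in for images; sizes follow the card and the
# nominal Fashion-MNIST splits (60,000 train / 10,000 test).
amb_test = [{"id": i} for i in range(10_000)]
fmnist_test = [{"id": i} for i in range(10_000)]
amb_train = [{"id": i} for i in range(10_000)]
fmnist_train = [{"id": i} for i in range(60_000)]

test_mixed = make_mixed_split(amb_test, fmnist_test)
train_mixed = make_mixed_split(amb_train, fmnist_train)
print(len(test_mixed), len(train_mixed))  # 20000 70000

# Filtering on the flag separates a mixed split back into its two parts.
ambiguous_part = [s for s in test_mixed if s["is_ambiguous"]]
nominal_part = [s for s in test_mixed if not s["is_ambiguous"]]
print(len(ambiguous_part), len(nominal_part))  # 10000 10000
```

Separating on `is_ambiguous` this way makes it straightforward to report metrics on the ambiguous and nominal portions of a mixed split independently.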

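Usage Recommendations

  • For research explicitly targeting aleatoric uncertainty, training the model on train_mixed is recommended.
  • Otherwise (i.e., when training on nominal data only), the test set will induce both epistemic and aleatoric uncertainty.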
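One common way to make the aleatoric part concrete is the Shannon entropy of a sample's ground-truth distribution `p_label`. The sketch below is an illustration, not the metric used in the paper. Only the 0.54/0.46 pair comes from the card; the 0.8/0.2 "unbalanced" label is a made-up value. The sketch also shows that the soft-label cross-entropy of any prediction is bounded below by the label entropy (Gibbs' inequality). So even a perfect model keeps this irreducible uncertainty on ambiguous images.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution such as `p_label`.
    Zero-probability classes are skipped, since p * log(p) -> 0 as p -> 0."""
    return sum(-p * math.log(p) for p in probs if p > 0)


def soft_cross_entropy(p_label, predicted):
    """Cross-entropy H(p_label, predicted) in nats. Always >= entropy(p_label),
    with equality when predicted == p_label. `predicted` must be nonzero
    wherever p_label is nonzero."""
    return sum(-p * math.log(q) for p, q in zip(p_label, predicted) if p > 0)


balanced = [0.54, 0.46]    # example label from the dataset card
unbalanced = [0.8, 0.2]    # ILLUSTRATIVE value, not taken from the dataset
nominal = [1.0, 0.0]       # one-hot label of an unambiguous image

print(f"balanced:   {entropy(balanced):.3f} nats (ln 2 = {math.log(2):.3f})")
print(f"unbalanced: {entropy(unbalanced):.3f} nats")
print(f"nominal:    {entropy(nominal):.3f} nats")

# A perfect prediction reaches the entropy floor; a worse one exceeds it.
print(f"{soft_cross_entropy(balanced, balanced):.3f}")
print(f"{soft_cross_entropy(balanced, [0.9, 0.1]):.3f}")
```

The balanced label sits close to ln 2, the maximum for two classes. The more unbalanced label has lower entropy. This matches the card's description of highly ambiguous test images and more unbalanced ambiguity in the training images.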

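Assessment and Validity

  • Although the images represent multiple classes and have an ambiguous ground truth, they do not resemble real-world photographs.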

Related Literature

  • The related paper is available on arXiv: https://arxiv.org/abs/2207.10495

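Related Datasets

  • Ambiguous Mnist Dataset: https://huggingface.co/datasets/mweiss/mnist_ambiguous
  • Corrupted Fashion-Mnist Dataset: https://huggingface.co/datasets/mweiss/fashion_mnist_corrupted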