tsilva/mnist-gaussian-noisy
Source: Hugging Face. Updated 2026-03-11. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/tsilva/mnist-gaussian-noisy
Description:
---
dataset_info:
  features:
  - name: image
    dtype: image
  - name: noise
    dtype:
      array2_d:
        shape:
        - 28
        - 28
        dtype: float32
  - name: raw_image
    dtype: image
  - name: label
    dtype:
      class_label:
        names:
          '0': '0'
          '1': '1'
          '2': '2'
          '3': '3'
          '4': '4'
          '5': '5'
          '6': '6'
          '7': '7'
          '8': '8'
          '9': '9'
  - name: source_index
    dtype: int32
  - name: replica_index
    dtype: int16
  - name: noise_variance
    dtype: float32
  splits:
  - name: train
    num_bytes: 1268081241
    num_examples: 300000
  - name: test
    num_bytes: 188565919
    num_examples: 44600
  download_size: 1460541650
  dataset_size: 1456647160
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---
# tsilva/mnist-gaussian-noisy
## Dataset Summary
This dataset expands MNIST by creating multiple Gaussian-noisy variants of each original example. Each row is structured for direct supervised training: the input is a noisy image and the target is the original sampled Gaussian noise map, with the clean image kept as a reference column.
Noise is sampled from a zero-mean normal distribution on normalized pixel values in `[0, 1]`, added to the clean image, clipped back to `[0, 1]`, and converted to 8-bit grayscale. The `noise` column stores the original sampled Gaussian draw before clipping.
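
The corruption steps described above can be sketched in NumPy. This is an illustrative reconstruction, not the author's generation script: the function name `add_gaussian_noise`, the use of `np.round` for the 8-bit conversion, and the toy input image are all assumptions.

```python
import numpy as np

def add_gaussian_noise(clean_u8, variance, rng):
    """Return (noisy_u8, noise) for one 28x28 uint8 image (hypothetical helper)."""
    clean = clean_u8.astype(np.float32) / 255.0           # normalize to [0, 1]
    # Zero-mean Gaussian draw; the standard deviation is sqrt(variance).
    noise = rng.normal(0.0, np.sqrt(variance), clean.shape).astype(np.float32)
    noisy = np.clip(clean + noise, 0.0, 1.0)              # clip after adding noise
    noisy_u8 = np.round(noisy * 255.0).astype(np.uint8)   # rounding mode is an assumption
    return noisy_u8, noise                                # noise is returned unclipped

rng = np.random.default_rng(42)
clean_u8 = np.zeros((28, 28), dtype=np.uint8)             # toy image, not an MNIST digit
clean_u8[10:18, 10:18] = 255
noisy_u8, noise = add_gaussian_noise(clean_u8, 0.0325, rng)
```

Because the stored noise is the draw before clipping and quantization, `image - raw_image` will generally differ from `noise` wherever the sum left `[0, 1]`.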
## Columns
- `image`: the noisy 28x28 grayscale image used as the model input
- `noise`: the 28x28 float Gaussian noise sample in normalized pixel space
- `raw_image`: the clean 28x28 grayscale reference image
- `label`: the original digit class from `0` to `9`
- `source_index`: the original example index inside the source MNIST split
- `replica_index`: which noisy replica this row corresponds to for the clean source image
- `noise_variance`: the Gaussian variance used to sample the stored noise map
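
As a rough sketch of how these columns might be turned into training arrays, the hypothetical helper below converts one row into float32 arrays. It assumes the image columns can be passed to `np.asarray` (true for PIL images and NumPy arrays) and that the image columns should be rescaled to `[0, 1]` to match the space in which `noise` is expressed. The stand-in `row` is synthetic, not taken from the dataset.

```python
import numpy as np

def row_to_arrays(row):
    """Convert one dataset row into (input, target, clean) float32 arrays."""
    x = np.asarray(row["image"], dtype=np.float32) / 255.0          # noisy input in [0, 1]
    target = np.asarray(row["noise"], dtype=np.float32)             # unclipped noise map
    clean = np.asarray(row["raw_image"], dtype=np.float32) / 255.0  # clean reference in [0, 1]
    return x, target, clean

# Synthetic stand-in row with the same column names and shapes.
row = {
    "image": np.full((28, 28), 128, dtype=np.uint8),
    "noise": np.zeros((28, 28), dtype=np.float32).tolist(),
    "raw_image": np.full((28, 28), 255, dtype=np.uint8),
}
x, target, clean = row_to_arrays(row)
```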
## Splits
- `train`: 300,000 image pairs
- `test`: 44,600 image pairs, balanced to 4,460 pairs per class
## Noise Configuration
- Source dataset: MNIST
- Noisy counterparts per source example: `5`
- Variances: `0.0100, 0.0325, 0.0550, 0.0775, 0.1000`
- Random seed: `42`
- Test balancing: exact class balance via downsampling the MNIST test split to the minimum class count
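
The configuration above can be illustrated with a short sketch. The listed variances are five evenly spaced values between `0.01` and `0.10`, and class balancing by downsampling can be done by keeping the minimum class count from every class. The helper `balanced_indices` and the random stand-in labels are assumptions for illustration; the card does not say how rows were selected or how variances are assigned to replicas.

```python
import numpy as np

# Five evenly spaced variances matching the listed values.
variances = np.linspace(0.01, 0.10, 5)

def balanced_indices(labels, rng):
    """Pick the same number of indices per class (hypothetical helper)."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    n = counts.min()  # minimum class count sets the per-class quota
    keep = [
        rng.choice(np.flatnonzero(labels == c), size=n, replace=False)
        for c in classes
    ]
    return np.sort(np.concatenate(keep))

rng = np.random.default_rng(42)
labels = rng.integers(0, 10, size=1000)  # stand-in labels, not the MNIST test split
idx = balanced_indices(labels, rng)
```

With 5 replicas per source image, the 4,460 test rows per class correspond to 4,460 / 5 = 892 source images per class, and 300,000 / 5 = 60,000 matches the size of the MNIST training split.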
## Intended Use
This dataset is intended for experiments where each training row should already contain a noisy source image and the original noise sample used to corrupt it. It is suited for noise prediction and generative or iterative denoising setups that operate directly on sampled noise fields.
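
For noise-prediction experiments, one simple sanity check is the all-zeros baseline: since the noise is zero-mean, a predictor that always outputs zero has an expected mean squared error equal to `noise_variance`. The sketch below demonstrates this on synthetic noise; `noise_mse` is a hypothetical helper and the sample is generated locally rather than read from the dataset.

```python
import numpy as np

def noise_mse(pred, target):
    """Mean squared error between predicted and true noise maps."""
    return float(np.mean((np.asarray(pred) - np.asarray(target)) ** 2))

rng = np.random.default_rng(0)
variance = 0.055
# Synthetic zero-mean noise maps with the chosen variance.
target = rng.normal(0.0, np.sqrt(variance), size=(1000, 28, 28)).astype(np.float32)

# Predicting zero everywhere gives an MSE close to the variance itself.
baseline = noise_mse(np.zeros_like(target), target)
```

A trained model should score below this baseline at each variance level, so it may help to report MSE grouped by `noise_variance`.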
## Load Example
```python
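from datasets import load_dataset

# Downloads both splits on first use
ds = load_dataset("tsilva/mnist-gaussian-noisy")
sample = ds["train"][0]

print(sample["image"])           # noisy input, decoded as a PIL image
print(sample["noise"][0][0])     # top-left value of the 28x28 noise map
print(sample["raw_image"])       # clean reference image
print(sample["noise_variance"])  # variance used for this row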
```
Provider:
tsilva



