FAIR Universe - Weak Lensing ML Uncertainty Challenge Public Dataset
Source: DataCite Commons. Updated: 2026-05-07. Indexed: 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.20056065
Description:
Overview
This dataset contains simulated weak gravitational lensing convergence maps for benchmarking machine learning methods in cosmological inference. It underpins the FAIR Universe Weak Lensing ML Uncertainty Challenge, a community competition run on Codabench that targets cosmological parameter inference with uncertainty quantification (Phase 1, hosted as a NeurIPS 2025 competition) and out-of-distribution detection (Phase 2). Comprehensive documentation of the dataset and challenge is provided in the white paper.
The dataset reflects realistic weak lensing observations of the Hyper Suprime-Cam (HSC) Year-3 dataset, including the survey footprint, redshift distribution, and galaxy density. It incorporates two of the most important sources of systematic uncertainty in weak lensing analyses — baryonic feedback and photometric redshift uncertainty — modeled as nuisance parameters that are sampled across the dataset and must be marginalized over during inference.
Relationship to the Phase-1 challenge dataset: This release is an upgraded version of the training set used in the Phase-1 challenge. In the Phase-1 release, the same set of random realizations was reused across all cosmologies; in this release, each cosmology uses an independent set of realizations, which substantially increases the effective number of independent samples and removes correlations across cosmologies. We therefore recommend this release for any use beyond reproducing the original Phase-1 leaderboard.
The held-out test set has been generated and consists of 14k convergence maps with corresponding labels, drawn from 82 cosmological models that are not present in the training set. This design enables evaluation of cosmological parameter inference under generalization to unseen cosmologies. The held-out test set will be released after the Phase 2 competition ends.
A small representative sample of the dataset (3 cosmologies × 30 realizations, ≈ 23 MB) is also included for users who want to quickly inspect the data format before downloading the full 6.4 GB file. See the Files section for details.
Files
The dataset consists of five files: two for the full training set, two for a small representative sample, and one shared survey footprint mask used by both.
| File | Description | Format | Shape |
|---|---|---|---|
| WIDE12H_bin2_2arcmin_kappa_newrealization.npy | Simulated weak lensing convergence maps (full training set). | NumPy array, float16 | (101, 256, 132019) |
| WIDE12H_bin2_2arcmin_mask.npy | Survey footprint mask (1 = observed pixel, 0 = unobserved pixel). Shared across all maps. | NumPy array, bool | (1424, 176) |
| label_newrealization.npy | Physical labels for each map (full training set). | NumPy array, float64 | (101, 256, 5) |
| sampled_WIDE12H_bin2_2arcmin_kappa.npy | Sample of the convergence maps (3 cosmologies × 30 realizations). | NumPy array, float16 | (3, 30, 132019) |
| sampled_label.npy | Labels corresponding to the sample convergence maps. | NumPy array, float64 | (3, 30, 5) |
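The file sizes quoted in this record follow from the shapes and dtypes listed above. The sketch below checks that arithmetic and shows one way to open the large maps file lazily with NumPy memory mapping. The helper names `expected_nbytes` and `open_maps` are hypothetical, and the on-disk size is slightly larger than the raw payload because of the `.npy` header.

```python
import math
import numpy as np

def expected_nbytes(shape, dtype):
    """Raw payload size in bytes for an array of this shape and dtype."""
    return math.prod(shape) * np.dtype(dtype).itemsize

# Shapes and dtypes as listed in the Files table.
full_bytes = expected_nbytes((101, 256, 132019), np.float16)
sample_bytes = expected_nbytes((3, 30, 132019), np.float16)

print(f"full maps:   {full_bytes / 2**30:.2f} GiB")
print(f"sample maps: {sample_bytes / 2**20:.1f} MiB")

def open_maps(path):
    """Open a .npy file memory-mapped; data is read from disk only when sliced."""
    return np.load(path, mmap_mode="r")
```

With memory mapping, an expression such as `open_maps(path)[0, :8]` reads only the requested realizations, which can be convenient on machines with less RAM than the full file size.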
WIDE12H_bin2_2arcmin_kappa_newrealization.npy
A 3-dimensional array of shape (N_cosmologies, N_realizations, N_observed_pixels) containing the simulated convergence values for each observed pixel. Each entry along the last axis corresponds to one pixel inside the HSC Y3 WIDE12H survey footprint; pixels outside the footprint are not stored. To reconstruct the full 2D map (1424 × 176 pixels at 2 arcmin resolution), use the survey mask (see WIDE12H_bin2_2arcmin_mask.npy below) to place each value at its corresponding location.
Number of cosmologies: 101
Number of independent realizations per cosmology: 256
Number of observed pixels per map: 132019
Map dimensions when reconstructed: 1424 × 176 pixels
Pixel resolution: 2 arcmin
Dtype: float16
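The reconstruction step described above can be sketched as follows. This is a minimal example that assumes the stored pixel values follow the order produced by boolean indexing with the mask (row-major order); that ordering is an assumption and should be checked against the challenge notebooks. The function name `reconstruct_map` and the toy arrays are illustrative only.

```python
import numpy as np

def reconstruct_map(kappa_1d, mask, fill_value=0.0):
    """Place observed-pixel values back onto the 2D footprint grid.

    Assumes values are stored in the order given by boolean indexing
    with the mask (row-major); this ordering is an assumption.
    """
    mask = np.asarray(mask, dtype=bool)
    if kappa_1d.shape[-1] != mask.sum():
        raise ValueError("number of values does not match observed pixels in mask")
    full = np.full(mask.shape, fill_value, dtype=np.float32)
    full[mask] = kappa_1d  # unobserved pixels keep fill_value
    return full

# Tiny synthetic stand-in for the real (1424, 176) mask.
toy_mask = np.array([[True, False, True],
                     [False, True, False]])
toy_values = np.array([0.1, 0.2, 0.3], dtype=np.float16)
toy_map = reconstruct_map(toy_values, toy_mask)
```

For the real files, the equivalent call would be `reconstruct_map(kappa[i, j], mask)` for cosmology index `i` and realization index `j`.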
The maps are released noiseless. Shape noise from intrinsic galaxy ellipticities can be added during training as on-the-fly Gaussian noise with zero mean and standard deviation σ = σ_ε / sqrt(2 · n_g · A_pixel), where σ_ε ≈ 0.4 is the typical galaxy intrinsic ellipticity, A_pixel = 4 arcmin², and n_g = 30 arcmin⁻² is the assumed source galaxy number density (representative of upcoming Stage-IV surveys such as Rubin/LSST and Euclid).
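With the values quoted above, the noise level is σ = 0.4 / sqrt(2 · 30 · 4) ≈ 0.026 per pixel. A minimal sketch of on-the-fly noise injection follows; the function name `add_shape_noise` and the synthetic input are illustrative, and casting to float32 before adding noise is a choice made here rather than something the dataset prescribes.

```python
import numpy as np

SIGMA_EPS = 0.4     # typical intrinsic galaxy ellipticity
N_GAL = 30.0        # source galaxy density, arcmin^-2
PIXEL_AREA = 4.0    # pixel area, arcmin^2 (2 arcmin pixels)

# sigma = sigma_eps / sqrt(2 * n_g * A_pixel)
noise_sigma = SIGMA_EPS / np.sqrt(2.0 * N_GAL * PIXEL_AREA)

def add_shape_noise(kappa, rng, sigma=noise_sigma):
    """Return a float32 copy of kappa with zero-mean Gaussian noise added."""
    kappa = np.asarray(kappa, dtype=np.float32)
    noise = rng.normal(0.0, sigma, size=kappa.shape).astype(np.float32)
    return kappa + noise

rng = np.random.default_rng(seed=0)
clean = np.zeros((4, 1000), dtype=np.float16)  # stand-in for observed-pixel maps
noisy = add_shape_noise(clean, rng)
```

Drawing a fresh noise realization each time a map is used means the network does not see the same noisy map twice.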
WIDE12H_bin2_2arcmin_mask.npy
A 2D boolean array of shape (1424, 176) representing the HSC Y3 WIDE12H survey footprint. Pixels with value 1 lie within the observed region; pixels with value 0 are outside the survey and should be excluded from analysis. The mask is identical across all maps in the dataset.
label_newrealization.npy
A 3D array of shape (N_cosmologies, N_realizations, 5) containing the ground-truth labels for each map. The five entries along the last axis are, in order:
| Index | Symbol | Description | Range |
|---|---|---|---|
| 0 | Ω_m | Total matter density of the universe (parameter of interest) | ≈ [0.09, 0.62] |
| 1 | S_8 | Clustering amplitude, S_8 = σ_8 (Ω_m / 0.3)^0.5 (parameter of interest) | ≈ [0.68, 0.96] |
| 2 | T_AGN | AGN feedback strength (nuisance parameter, baryonic feedback) | [7.2, 8.5] |
| 3 | f_0 | Star formation parameter (nuisance parameter, baryonic feedback) | [0, 0.0265] |
| 4 | Δ_z | Photometric redshift shift (nuisance parameter) | drawn from N(0, 0.022²) |
The first two parameters (Ω_m, S_8) are the parameters of interest — the targets of cosmological inference. The remaining three (T_AGN, f_0, Δ_z) are nuisance parameters that participants must marginalize over during inference.
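As an illustration of the label layout, the sketch below splits the last axis into parameters of interest and nuisance parameters, and inverts the S_8 definition to recover σ_8. The helper names and the example label values are hypothetical and are not taken from the dataset.

```python
import numpy as np

# Order of the last axis, as documented for the label files.
PARAM_NAMES = ("Omega_m", "S_8", "T_AGN", "f_0", "Delta_z")

def split_labels(labels):
    """Split the last axis into (parameters of interest, nuisance parameters)."""
    labels = np.asarray(labels)
    return labels[..., :2], labels[..., 2:]

def sigma8_from_s8(omega_m, s8):
    """Invert S_8 = sigma_8 * (Omega_m / 0.3)**0.5 to obtain sigma_8."""
    return s8 / np.sqrt(omega_m / 0.3)

# Synthetic label with illustrative values, shape (1, 1, 5).
example = np.array([[[0.3, 0.8, 7.8, 0.01, 0.0]]])
targets, nuisance = split_labels(example)
sigma8 = sigma8_from_s8(targets[..., 0], targets[..., 1])
```

At Ω_m = 0.3 the two amplitudes coincide, so this example gives σ_8 equal to S_8.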
sampled_WIDE12H_bin2_2arcmin_kappa.npy and sampled_label.npy
These two files together form a small representative sample of the dataset, intended to help users quickly familiarize themselves with the data format and contents without downloading the full 6.4 GB convergence-maps file.
The sample contains 3 cosmological models drawn from the full set of 101, with 30 independent realizations per cosmology (90 maps in total). It uses the same axis conventions, pixel layout, and label structure as the full files described above, and the survey mask (WIDE12H_bin2_2arcmin_mask.npy) applies to the sample as well.
sampled_WIDE12H_bin2_2arcmin_kappa.npy — convergence maps, shape (3, 30, 132019), float16, ≈ 23 MB.
sampled_label.npy — corresponding labels, shape (3, 30, 5), float64, ≈ 4 KB.
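For training or quick inspection it is often convenient to merge the cosmology and realization axes so that each map is paired with its label row. The sketch below does this on synthetic stand-ins with the sample's leading axes but fewer pixels; `flatten_pairs` is a hypothetical helper, not part of the challenge code.

```python
import numpy as np

def flatten_pairs(kappa, labels):
    """Merge the cosmology and realization axes into a single map axis."""
    n_cosmo, n_real, n_pix = kappa.shape
    if labels.shape[:2] != (n_cosmo, n_real):
        raise ValueError("kappa and labels disagree on their leading axes")
    maps = kappa.reshape(n_cosmo * n_real, n_pix)
    targets = labels.reshape(n_cosmo * n_real, labels.shape[-1])
    return maps, targets

# Synthetic stand-ins: 3 cosmologies x 30 realizations, 16 pixels instead of 132019.
kappa = np.zeros((3, 30, 16), dtype=np.float16)
labels = np.zeros((3, 30, 5), dtype=np.float64)
maps, targets = flatten_pairs(kappa, labels)
```

With the real sample, `kappa` and `labels` would come from `np.load("sampled_WIDE12H_bin2_2arcmin_kappa.npy")` and `np.load("sampled_label.npy")`, giving arrays of shape (90, 132019) and (90, 5).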
Examples and tutorials
For complete examples of loading the dataset, adding shape noise, simple visualization, and training baseline models, please see the example Jupyter notebooks in the challenge GitHub repository (listed under Related resources).
The repository contains six notebooks covering three baseline methods for each of the two challenge phases: Phase 1 (parameter inference with power spectrum, CNN learned statistics+MCMC, CNN direct prediction) and Phase 2 (out-of-distribution detection with power spectrum, CNN learned statistics, autoencoder).
Related resources
Dataset white paper: B. Dai et al. (2026), arXiv:2604.14451. https://arxiv.org/abs/2604.14451
Phase-1 competition (Codabench): https://www.codabench.org/competitions/8934/
Phase-2 competition (Codabench): https://www.codabench.org/competitions/10902/
Challenge GitHub repository (starting kits, baseline code, submission tools): https://github.com/FAIR-Universe/Cosmology_Challenge
Challenge website: https://fair-universe.lbl.gov/
Citation
If you use this dataset, please cite the following:
The white paper: B. Dai et al., FAIR Universe Weak Lensing ML Uncertainty Challenge: Handling Uncertainties and Distribution Shifts for Precision Cosmology, arXiv:2604.14451 [astro-ph.CO]. https://arxiv.org/abs/2604.14451
This Zenodo record (DOI: 10.5281/zenodo.20056065).
Contact
For questions about the dataset, contact: fair-universe@lbl.gov
Provider: Zenodo
Created: 2026-05-07



