
FAIR Universe - Weak Lensing ML Uncertainty Challenge Public Dataset

Source: DataCite Commons. Updated: 2026-05-07. Indexed: 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.20056065
Description:
Overview

This dataset contains simulated weak gravitational lensing convergence maps for benchmarking machine learning methods in cosmological inference. It underpins the FAIR Universe Weak Lensing ML Uncertainty Challenge, a community competition run on Codabench that targets cosmological parameter inference with uncertainty quantification (Phase 1, hosted as a NeurIPS 2025 competition) and out-of-distribution detection (Phase 2). Comprehensive documentation of the dataset and challenge is provided in the white paper.

The dataset reflects realistic weak lensing observations of the Hyper Suprime-Cam (HSC) Year-3 dataset, including the survey footprint, redshift distribution, and galaxy density. It incorporates two of the most important sources of systematic uncertainty in weak lensing analyses: baryonic feedback and photometric redshift uncertainty. Both are modeled as nuisance parameters that are sampled across the dataset and must be marginalized over during inference.

Relationship to the Phase-1 challenge dataset: This release is an upgraded version of the training set used in the Phase-1 challenge. In the Phase-1 release, the same set of random realizations was reused across all cosmologies; in this release, each cosmology uses an independent set of realizations, which substantially increases the effective number of independent samples and removes correlations across cosmologies. We therefore recommend this release for any use beyond reproducing the original Phase-1 leaderboard.

The held-out test set has been generated and consists of 14k convergence maps with corresponding labels, drawn from 82 cosmological models that are not present in the training set. This design enables evaluation of cosmological parameter inference under generalization to unseen cosmologies. The held-out test set will be released after the Phase 2 competition ends.
A small representative sample of the dataset (3 cosmologies × 30 realizations, ≈ 23 MB) is also included for users who want to quickly inspect the data format before downloading the full 6.4 GB file. See the Files section for details.

Files

The dataset consists of five files: two for the full training set, two for a small representative sample, and one shared survey footprint mask used by both.

| File | Description | Format | Shape |
|---|---|---|---|
| WIDE12H_bin2_2arcmin_kappa_newrealization.npy | Simulated weak lensing convergence maps (full training set). | NumPy array, float16 | (101, 256, 132019) |
| WIDE12H_bin2_2arcmin_mask.npy | Survey footprint mask (1 = observed pixel, 0 = unobserved pixel). Shared across all maps. | NumPy array, bool | (1424, 176) |
| label_newrealization.npy | Physical labels for each map (full training set). | NumPy array, float64 | (101, 256, 5) |
| sampled_WIDE12H_bin2_2arcmin_kappa.npy | Sample of the convergence maps (3 cosmologies × 30 realizations). | NumPy array, float16 | (3, 30, 132019) |
| sampled_label.npy | Labels corresponding to the sample convergence maps. | NumPy array, float64 | (3, 30, 5) |

WIDE12H_bin2_2arcmin_kappa_newrealization.npy

A 3-dimensional array of shape (N_cosmologies, N_realizations, N_observed_pixels) containing the simulated convergence values for each observed pixel. Each entry along the last axis corresponds to one pixel inside the HSC Y3 WIDE12H survey footprint; pixels outside the footprint are not stored. To reconstruct the full 2D map (1424 × 176 pixels at 2 arcmin resolution), use the survey mask (see WIDE12H_bin2_2arcmin_mask.npy below) to place each value at its corresponding location.

Number of cosmologies: 101
Number of independent realizations per cosmology: 256
Number of observed pixels per map: 132019
Map dimensions when reconstructed: 1424 × 176 pixels
Pixel resolution: 2 arcmin
Dtype: float16

The maps are released noiseless.
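The reconstruction step described above (placing each stored value at its location in the 1424 × 176 grid) can be sketched as follows. This is a minimal sketch, not the challenge's reference code: the function name `reconstruct_map` and the `fill_value` argument are illustrative, and it assumes the stored pixels follow the row-major order of the mask's observed entries, which is what NumPy boolean indexing produces. The notebooks in the challenge repository are the place to confirm that ordering.

```python
import numpy as np


def reconstruct_map(pixel_values, mask, fill_value=0.0):
    """Place a 1D vector of observed-pixel values into the 2D footprint.

    Assumes the values are ordered like `full_map[mask]`, i.e. row-major
    over the True entries of the mask (an assumption, see the lead-in).
    """
    mask = np.asarray(mask, dtype=bool)
    pixel_values = np.asarray(pixel_values)
    if pixel_values.ndim != 1 or pixel_values.shape[0] != int(mask.sum()):
        raise ValueError("pixel_values must be 1D with one entry per observed pixel")
    # Upcast to float32 so later arithmetic is not done in half precision.
    full = np.full(mask.shape, fill_value, dtype=np.float32)
    full[mask] = pixel_values
    return full


# Toy stand-in for the real (1424, 176) mask and 132019-pixel vectors.
toy_mask = np.array([[True, False, True],
                     [False, True, True]])
toy_values = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float16)
toy_map = reconstruct_map(toy_values, toy_mask)

# With the real files (local paths are assumptions about where they were saved):
# mask = np.load("WIDE12H_bin2_2arcmin_mask.npy")
# kappa = np.load("WIDE12H_bin2_2arcmin_kappa_newrealization.npy", mmap_mode="r")
# kappa_2d = reconstruct_map(kappa[0, 0], mask)  # cosmology 0, realization 0
```

Opening the large array with `mmap_mode="r"` avoids reading the whole 6.4 GB file into memory when only a few maps are needed.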
Shape noise from intrinsic galaxy ellipticities can be added during training as on-the-fly Gaussian noise with zero mean and standard deviation σ = σ_ε / sqrt(2 · n_g · A_pixel), where σ_ε ≈ 0.4 is the typical galaxy intrinsic ellipticity, A_pixel = 4 arcmin² is the pixel area, and n_g = 30 arcmin⁻² is the assumed source galaxy number density (representative of upcoming Stage-IV surveys such as Rubin/LSST and Euclid).

WIDE12H_bin2_2arcmin_mask.npy

A 2D boolean array of shape (1424, 176) representing the HSC Y3 WIDE12H survey footprint. Pixels with value 1 lie within the observed region; pixels with value 0 are outside the survey and should be excluded from analysis. The mask is identical across all maps in the dataset.

label_newrealization.npy

A 3D array of shape (N_cosmologies, N_realizations, 5) containing the ground-truth labels for each map. The last axis holds the following parameters, in order:

| Index | Symbol | Description | Range |
|---|---|---|---|
| 0 | Ω_m | Total matter density of the universe (parameter of interest) | ≈ [0.09, 0.62] |
| 1 | S_8 | Clustering amplitude, σ_8 (Ω_m / 0.3)^0.5 (parameter of interest) | ≈ [0.68, 0.96] |
| 2 | T_AGN | AGN feedback strength (nuisance parameter, baryonic feedback) | [7.2, 8.5] |
| 3 | f_0 | Star formation parameter (nuisance parameter, baryonic feedback) | [0, 0.0265] |
| 4 | Δ_z | Photometric redshift shift (nuisance parameter) | drawn from N(0, 0.022²) |

The first two parameters (Ω_m, S_8) are the parameters of interest, that is, the targets of cosmological inference. The remaining three (T_AGN, f_0, Δ_z) are nuisance parameters that participants must marginalize over during inference.

sampled_WIDE12H_bin2_2arcmin_kappa.npy and sampled_label.npy

These two files together form a small representative sample of the dataset, intended to help users quickly familiarize themselves with the data format and contents without downloading the full 6.4 GB convergence-maps file. The sample contains 3 cosmological models drawn from the full set of 101, with 30 independent realizations per cosmology (90 maps in total).
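The shape-noise prescription and the label layout described above can be sketched in a few lines. This is a minimal sketch under stated assumptions rather than the challenge's own implementation: the helper names (`shape_noise_sigma`, `add_shape_noise`, `split_labels`) are illustrative, the constants are the values quoted in the text, and the arrays are zero-filled stand-ins with the documented dtypes instead of the real files.

```python
import numpy as np

SIGMA_EPS = 0.4    # intrinsic ellipticity dispersion quoted in the text
N_GAL = 30.0       # source galaxy density in arcmin^-2 quoted in the text
PIXEL_AREA = 4.0   # pixel area in arcmin^2 (2 arcmin x 2 arcmin)


def shape_noise_sigma(sigma_eps=SIGMA_EPS, n_gal=N_GAL, pixel_area=PIXEL_AREA):
    """Per-pixel noise standard deviation: sigma_eps / sqrt(2 * n_g * A_pixel)."""
    return sigma_eps / np.sqrt(2.0 * n_gal * pixel_area)


def add_shape_noise(kappa, rng, sigma=None):
    """Return a float32 copy of `kappa` with zero-mean Gaussian noise added."""
    if sigma is None:
        sigma = shape_noise_sigma()
    kappa32 = np.asarray(kappa, dtype=np.float32)  # upcast from float16 first
    noise = rng.normal(0.0, sigma, size=kappa32.shape).astype(np.float32)
    return kappa32 + noise


def split_labels(labels):
    """Split (..., 5) labels into targets (Omega_m, S_8) and the 3 nuisances."""
    return labels[..., :2], labels[..., 2:]


# Stand-in arrays; real data would come from np.load on the released files.
rng = np.random.default_rng(0)
kappa_batch = np.zeros((4, 1000), dtype=np.float16)
noisy_batch = add_shape_noise(kappa_batch, rng)

labels = np.zeros((3, 30, 5), dtype=np.float64)
targets, nuisances = split_labels(labels)
```

With the quoted values, 2 · n_g · A_pixel = 240, so σ = 0.4 / sqrt(240), or about 0.026 per pixel. Drawing a fresh noise realization for every training batch, rather than storing noisy maps, is what "on-the-fly" refers to here.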
The sample files use the same axis conventions, pixel layout, and label structure as the full files described above; the survey mask (WIDE12H_bin2_2arcmin_mask.npy) applies to the sample as well.

sampled_WIDE12H_bin2_2arcmin_kappa.npy: convergence maps, shape (3, 30, 132019), float16, ≈ 23 MB.
sampled_label.npy: corresponding labels, shape (3, 30, 5), float64, ≈ 4 KB.

Examples and tutorials

For complete examples of loading the dataset, adding shape noise, simple visualization, and training baseline models, please see the example Jupyter notebooks in the challenge GitHub repository (listed under Related resources). The repository contains six notebooks covering three baseline methods for each of the two challenge phases:

Phase 1 (parameter inference): power spectrum, CNN learned statistics + MCMC, CNN direct prediction.
Phase 2 (out-of-distribution detection): power spectrum, CNN learned statistics, autoencoder.

Related resources

Dataset white paper: B. Dai et al. (2026), arXiv:2604.14451. https://arxiv.org/abs/2604.14451
Phase-1 competition (Codabench): https://www.codabench.org/competitions/8934/
Phase-2 competition (Codabench): https://www.codabench.org/competitions/10902/
Challenge GitHub repository (starting kits, baseline code, submission tools): https://github.com/FAIR-Universe/Cosmology_Challenge
Challenge website: https://fair-universe.lbl.gov/

Citation

If you use this dataset, please cite the following:

The white paper: B. Dai et al., FAIR Universe Weak Lensing ML Uncertainty Challenge: Handling Uncertainties and Distribution Shifts for Precision Cosmology, arXiv:2604.14451 [astro-ph.CO]. https://arxiv.org/abs/2604.14451
This Zenodo record (DOI: 10.5281/zenodo.20056065).

Contact

For questions about the dataset, contact: fair-universe@lbl.gov
Provider: Zenodo
Created: 2026-05-07