
UniverseTBD/mmu_desi_edr_sv3

Source: Hugging Face | Updated: 2026-04-21 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/UniverseTBD/mmu_desi_edr_sv3
Resource description:
---
configs:
- config_name: default
  data_dir: mmu_desi_edr_sv3/dataset
tags:
- astronomy
license: cc-by-4.0
pretty_name: mmu_desi_edr_sv3
size_categories:
- 1M<n<10M
---

<div align="center">
<img src="example_figure.png" width="600">
</div>

# mmu_desi_edr_sv3 HATS Catalog Collection

This is the collection of HATS catalogs representing mmu_desi_edr_sv3.

This dataset is part of the [Multimodal Universe](https://github.com/MultimodalUniverse/MultimodalUniverse), a large-scale collection of multimodal astronomical data. For full details, see the paper: [The Multimodal Universe: Enabling Large-Scale Machine Learning with 100TBs of Astronomical Scientific Data](https://arxiv.org/abs/2412.02527).

### Access the catalog

We recommend the [LSDB](https://lsdb.io) Python framework for accessing HATS catalogs. LSDB can be installed via `pip install lsdb` or `conda install conda-forge::lsdb`; see [the docs](https://docs.lsdb.io/) for more details.

The following code provides a minimal example of opening this catalog:

```python
import lsdb

# Full sky coverage.
catalog = lsdb.open_catalog("https://huggingface.co/datasets/UniverseTBD/mmu_desi_edr_sv3")

# One-degree cone.
catalog = lsdb.open_catalog(
    "https://huggingface.co/datasets/UniverseTBD/mmu_desi_edr_sv3",
    search_filter=lsdb.ConeSearch(ra=247.0, dec=43.0, radius_arcsec=3600.0),
)
```

Each catalog in this collection is represented as a separate [Apache Parquet dataset](https://arrow.apache.org/docs/python/dataset.html) and can be accessed with a variety of tools, including `pandas`, `pyarrow`, `dask`, `Spark`, and `DuckDB`.

### File structure

This catalog is represented by the following files and directories:

- [`collection.properties`](https://huggingface.co/datasets/UniverseTBD/mmu_desi_edr_sv3/collection.properties): textual metadata file describing the HATS collection of catalogs
- [`mmu_desi_edr_sv3`](https://huggingface.co/datasets/UniverseTBD/mmu_desi_edr_sv3/mmu_desi_edr_sv3): main HATS catalog directory
  - [`dataset/`](https://huggingface.co/datasets/UniverseTBD/mmu_desi_edr_sv3/mmu_desi_edr_sv3/dataset/): Apache Parquet dataset directory for the main catalog
    - ... parquet metadata and data files in sub directories ...
  - [`hats.properties`](https://huggingface.co/datasets/UniverseTBD/mmu_desi_edr_sv3/mmu_desi_edr_sv3/hats.properties): textual metadata file describing the main HATS catalog
  - [`partition_info.csv`](https://huggingface.co/datasets/UniverseTBD/mmu_desi_edr_sv3/mmu_desi_edr_sv3/partition_info.csv): CSV file with a list of catalog HEALPix tiles (catalog partitions)
  - [`skymap.fits`](https://huggingface.co/datasets/UniverseTBD/mmu_desi_edr_sv3/mmu_desi_edr_sv3/skymap.fits): HEALPix skymap FITS file with row counts per HEALPix tile of fixed order 10
- [`mmu_desi_edr_sv3_10arcs/`](https://huggingface.co/datasets/UniverseTBD/mmu_desi_edr_sv3/mmu_desi_edr_sv3_10arcs): default margin catalog to ensure data completeness in cross-matching; the margin threshold is 10.0 arcseconds
  - ... margin catalog files and directories ...
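
The file list above mentions HEALPix tiles of fixed order 10 for `skymap.fits`. As a rough sketch of what that implies, the snippet below computes how many tiles an order-10 grid has, their approximate area, and which order-10 tile a given order-29 index falls into. It assumes that `_healpix_29` is a HEALPix index at order 29 in the nested numbering scheme, as the column name suggests; the helper names are hypothetical and are not part of HATS or LSDB.

```python
import math


def healpix_npix(order: int) -> int:
    """Number of HEALPix pixels covering the full sky at a given order."""
    return 12 * 4 ** order


def parent_pixel(index: int, from_order: int, to_order: int) -> int:
    """Parent pixel of a nested-scheme index at a coarser order.

    In the nested scheme every pixel splits into 4 children per order,
    so dropping 2 bits per order step yields the parent index.
    """
    if to_order > from_order:
        raise ValueError("to_order must not exceed from_order")
    return index >> (2 * (from_order - to_order))


# Full-sky solid angle in square degrees, split evenly across order-10 tiles.
FULL_SKY_DEG2 = 4 * math.pi * (180 / math.pi) ** 2
tile_area_arcmin2 = FULL_SKY_DEG2 / healpix_npix(10) * 3600

# `_healpix_29` value taken from the example row of the column table.
example_index = 690723642961871770
order10_tile = parent_pixel(example_index, from_order=29, to_order=10)

print("order-10 tiles on the sky:", healpix_npix(10))
print("approx. tile area (arcmin^2):", round(tile_area_arcmin2, 2))
print("order-10 tile of the example row:", order10_tile)
```

The bit shift only works for the nested scheme; ring-ordered indices would need a proper HEALPix library for the same conversion.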

### Catalog metadata

Metadata of the main HATS catalog, excluding margins and indexes:

| **Number of rows** | **Number of columns** | **Number of partitions** | **Size on disk** | **HATS Builder** |
| --- | --- | --- | --- | --- |
| 1,126,441 | 20 | 306 | 61.8 GiB | hats-import v0.7.1, hats v0.7.1 |

### Catalog columns

The main HATS catalog contains the following columns:

| **Name** | **`_healpix_29`** | **`spectrum.flux`** | **`spectrum.ivar`** | **`spectrum.lsf_sigma`** | **`spectrum.lambda`** | **`spectrum.mask`** | **`Z`** | **`ZERR`** | **`EBV`** | **`FLUX_G`** | **`FLUX_R`** | **`FLUX_Z`** | **`FLUX_IVAR_G`** | **`FLUX_IVAR_R`** | **`FLUX_IVAR_Z`** | **`FIBERFLUX_G`** | **`FIBERFLUX_R`** | **`FIBERFLUX_Z`** | **`FIBERTOTFLUX_G`** | **`FIBERTOTFLUX_R`** | **`FIBERTOTFLUX_Z`** | **`ra`** | **`dec`** | **`ZWARN`** | **`object_id`** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **Data Type** | int64 | list[float] | list[float] | list[float] | list[float] | list[bool] | float | float | float | float | float | float | float | float | float | float | float | float | float | float | float | double | double | bool | string |
| **Nested?** | - | spectrum | spectrum | spectrum | spectrum | spectrum | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| **Value count** | 1,126,441 | 8,764,837,421 | 8,764,837,421 | 8,764,837,421 | 8,764,837,421 | 8,764,837,421 | 1,126,441 | 1,126,441 | 1,126,441 | 1,126,441 | 1,126,441 | 1,126,441 | 1,126,441 | 1,126,441 | 1,126,441 | 1,126,441 | 1,126,441 | 1,126,441 | 1,126,441 | 1,126,441 | 1,126,441 | 1,126,441 | 1,126,441 | 1,126,441 | 1,126,441 |
| **Example row** | 690723642961871770 | [19.97, -4.157, 25.57, -35.53, … (7781 total)] | [0.003643, 0.002593, 0.002245, … (7781 total)] | [0.8615, 0.8615, 0.8615, 0.8615, … (7781 total)] | [3600, 3601, 3602, 3602, 3603, … (7781 total)] | [False, False, False, False, … (7781 total)] | 0.1858 | 5.029e-05 | 0.01178 | 17.86 | 53.7 | 104.7 | 91.37 | 30.44 | 17.35 | 5.629 | 16.93 | 33.01 | 5.629 | 16.93 | 33.01 | 246.7 | 42.85 | False | 39633127718519967 |
| **Minimum value** | 643519964553769984 | -2392754.75 | -0.0 | 0.8452407717704773 | 3600.0 | False | -0.004999999888241291 | -0.0 | -0.0 | -99.0 | -99.0 | -99.0 | -99.0 | -99.0 | -99.0 | -0.0 | -0.0 | -0.0 | -0.0 | -0.0 | -0.0 | 148.36930329196142 | -2.4457088512932574 | False | 1000347531214848 |
| **Maximum value** | 1981012038139459733 | 371762.1875 | 609.637451171875 | 0.8992781639099121 | 9824.0 | True | 5.985368251800537 | 0.007868182845413685 | 0.09311211854219437 | 8527.033203125 | 14788.431640625 | 27467.138671875 | 3400.1640625 | 1222.0245361328125 | 352.811279296875 | 1040.572265625 | 990.2592163085938 | 3370.12158203125 | 1969.8492431640625 | 993.0661010742188 | 3371.087158203125 | 273.9542154942551 | 67.87875977774526 | True | 999860421525504 |

"Nested" indicates whether the column is stored as a nested field inside another "struct" column. "Value count" may differ from the total number of rows for nested columns: each nested element is counted as a single value.

### Crossmatch with another catalog

HATS catalogs can be efficiently crossmatched using [LSDB](https://lsdb.io), which leverages the HEALPix partitioning to avoid loading the full datasets into memory:

```python
import lsdb

mmu_desi_edr_sv3 = lsdb.open_catalog("https://huggingface.co/datasets/UniverseTBD/mmu_desi_edr_sv3")
other = lsdb.open_catalog("https://huggingface.co/datasets/<org>/<other_catalog>")
crossmatched = mmu_desi_edr_sv3.crossmatch(other, radius_arcsec=1.0)
print(crossmatched)
```

See the [LSDB documentation](https://docs.lsdb.io/) for more details on crossmatching and other operations.
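
To make the `radius_arcsec=1.0` argument in the crossmatch call concrete, here is a brute-force sketch of what a positional crossmatch does conceptually: for each source in one list, find the nearest source in the other list and keep the pair if their angular separation is within the radius. This is not LSDB's implementation, which works per HEALPix partition and avoids the quadratic loop; the function names and the toy coordinates are invented for illustration.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600


def brute_force_crossmatch(left, right, radius_arcsec=1.0):
    """Return (left_index, right_index, separation) for nearest matches within the radius."""
    matches = []
    for i, (ra1, dec1) in enumerate(left):
        best = None  # (right_index, separation) of the closest candidate so far
        for j, (ra2, dec2) in enumerate(right):
            sep = angular_separation_arcsec(ra1, dec1, ra2, dec2)
            if sep <= radius_arcsec and (best is None or sep < best[1]):
                best = (j, sep)
        if best is not None:
            matches.append((i, best[0], best[1]))
    return matches


# Toy (ra, dec) pairs in degrees, invented for this example.
left = [(246.7, 42.85), (247.5, 43.20)]
right = [
    (246.7, 42.85 + 0.5 / 3600),  # 0.5 arcsec north of the first left source
    (247.5, 43.20 + 5.0 / 3600),  # 5 arcsec away: outside a 1 arcsec radius
]

matches = brute_force_crossmatch(left, right, radius_arcsec=1.0)
print(matches)
```

Only the first pair survives the 1 arcsecond cut; widening `radius_arcsec` to 10.0 would also match the second pair.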

### Dataset-specific context

**Original survey**

This dataset is based on the [Dark Energy Spectroscopic Instrument (DESI)](https://www.desi.lbl.gov/), specifically the Early Data Release (EDR), which represents about 1% of the final survey. DESI collects spectra of millions of galaxies, quasars, and stars to measure the effect of dark energy on the expansion of the universe.

**Data modality**

The dataset consists of spectra on a fixed wavelength grid spanning roughly 3,600 to 9,800 Å, with 7,781 pixels per spectrum. Each spectrum includes flux, wavelength, and inverse variance (ivar).

**Typical use cases**

The dataset can be used to analyze spectra of galaxies, quasars, and stars. Existing applications include multimodal representation learning (AstroCLIP) and outlier detection using spectrum auto-encoding models.

**Caveats**

The dataset is based on the DESI Early Data Release (EDR), which represents about 1% of the final survey. It includes only the primary spectrum for each object, only science targets (sky fibers and other object types are excluded), and only fibers with good status.

**Citation**

Users should cite the DESI Early Data Release and acknowledge the DESI collaboration. The dataset is released under the CC BY 4.0 license, which requires attribution to the original authors.
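
The Data modality paragraph notes that each spectrum carries an inverse variance (ivar) array alongside the flux. The sketch below shows one common way to turn ivar into a per-pixel uncertainty and signal-to-noise ratio, assuming the usual convention that `ivar = 1 / sigma**2` and that `ivar == 0` marks a pixel with no usable information. The first three flux and ivar values come from the example row in the column table; the fourth pixel's zero ivar is made up to show the masked case, and the function name is hypothetical.

```python
import math


def sigma_and_snr(flux, ivar):
    """Return a list of (sigma, snr) pairs, one per pixel.

    Pixels with ivar <= 0 carry no information, so they are reported
    with infinite uncertainty and zero signal-to-noise.
    """
    if len(flux) != len(ivar):
        raise ValueError("flux and ivar must have the same length")
    pairs = []
    for f, w in zip(flux, ivar):
        if w > 0:
            root = math.sqrt(w)
            pairs.append((1.0 / root, f * root))
        else:
            pairs.append((math.inf, 0.0))
    return pairs


# First pixels of the example row; the last ivar (0.0) is invented for illustration.
flux = [19.97, -4.157, 25.57, -35.53]
ivar = [0.003643, 0.002593, 0.002245, 0.0]

results = sigma_and_snr(flux, ivar)
for (sigma, snr), f in zip(results, flux):
    print(f"flux={f:8.3f}  sigma={sigma:8.3f}  snr={snr:6.3f}")
```

Handling zero ivar explicitly avoids a division by zero and keeps masked pixels from contributing to any downstream weighted statistic.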
Provider:
UniverseTBD