
juliensimon/unified-radio-catalog

Source: Hugging Face. Updated 2026-05-26; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/juliensimon/unified-radio-catalog
---
license: other
license_name: vizier-scientific-use
license_link: https://cds.unistra.fr/vizier-org/licences_vizier.html
pretty_name: "Unified Radio Catalog (SPECFIND v3)"
language:
- en
description: "SPECFIND v3 unified radio source catalog with 1,658,207 cross-matched measurements across 198 radio surveys including NVSS, FIRST, SUMSS, TGSS, and GLEAM."
task_categories:
- tabular-classification
- tabular-regression
tags:
- space
- radio
- nvss
- first
- sumss
- astronomy
- open-data
- tabular-data
size_categories:
- 1M<n<10M
configs:
- config_name: default
  data_files:
  - split: train
    path: data/unified_radio_catalog.parquet
  default: true
---

# Unified Radio Catalog (SPECFIND v3)

*Part of the [Astronomy Datasets](https://huggingface.co/collections/juliensimon/astronomy-datasets-69c24caf2f17e36128946743) collection on Hugging Face.*

The SPECFIND v3 unified radio source catalog, containing **1,658,207** cross-matched radio source measurements from **198** surveys spanning 16 to 31000 MHz. SPECFIND positionally cross-identifies radio sources across major surveys including NVSS, FIRST, SUMSS, TGSS, GLEAM, and dozens of others, then fits power-law radio spectra.

## Dataset description

SPECFIND (Vollmer et al. 2005, updated Stein et al. 2024) is the largest positional cross-identification of radio continuum catalogs. Version 3 matches sources across 50+ radio surveys at frequencies from 16 to 31000 MHz, covering the entire sky.

Each row represents a source detection at a specific frequency, grouped by a unique source identifier (`source_id`). For sources detected in multiple surveys, SPECFIND fits a power-law spectrum S(nu) = 10^b * nu^a, where `a` is the spectral index and `b` is the intercept.

The catalog contains **339,547** unique radio sources with measurements from surveys including NVSS (1.4 GHz), FIRST (1.4 GHz), SUMSS (843 MHz), TGSS (150 MHz), GLEAM (200 MHz), and many others.
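The power-law form S(nu) = 10^b * nu^a described above can be illustrated with a short sketch. It assumes that `nu` is in MHz and `S` in mJy (matching the `frequency_mhz` and `flux_density_mjy` columns; the card does not state the units of the intercept), and it uses a plain unweighted least-squares fit in log-log space, which is a stand-in and not necessarily the procedure SPECFIND itself applies. The function names are hypothetical.

```python
import numpy as np


def power_law_flux(freq_mhz, a, b):
    """Evaluate S(nu) = 10**b * nu**a (assumed units: nu in MHz, S in mJy)."""
    return 10.0 ** b * np.asarray(freq_mhz, dtype=float) ** a


def fit_power_law(freq_mhz, flux_mjy):
    """Unweighted least-squares fit of log10(S) = a*log10(nu) + b.

    Returns (a, b). Illustrative only: flux uncertainties are ignored.
    """
    log_nu = np.log10(np.asarray(freq_mhz, dtype=float))
    log_s = np.log10(np.asarray(flux_mjy, dtype=float))
    a, b = np.polyfit(log_nu, log_s, deg=1)  # slope first, then intercept
    return float(a), float(b)


# Synthetic measurements generated from a known spectrum (made-up values).
freqs = np.array([150.0, 843.0, 1400.0])
fluxes = power_law_flux(freqs, a=-0.75, b=4.0)

# Fitting the noiseless points should recover the input parameters.
a_fit, b_fit = fit_power_law(freqs, fluxes)
print(f"a = {a_fit:.3f}, b = {b_fit:.3f}")
```

With real rows, `freqs` and `fluxes` would be the `frequency_mhz` and `flux_density_mjy` values of one `source_id`, and the result could be compared against the catalog's `spectral_index` and `spectral_intercept`.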

## Key columns

| Column | Type | Description |
|--------|------|-------------|
| `source_id` | Int32 | Unique source identifier (groups cross-matched detections) |
| `source_name` | string | Survey-specific source designation |
| `ra_deg` | float64 | Right ascension J2000 (degrees) |
| `dec_deg` | float64 | Declination J2000 (degrees) |
| `frequency_mhz` | float64 | Observation frequency (MHz) |
| `flux_density_mjy` | float64 | Flux density at this frequency (mJy) |
| `flux_density_error_mjy` | float64 | Flux density uncertainty (mJy) |
| `spectral_index` | float64 | Fitted spectral index (a in S ~ nu^a) |
| `spectral_intercept` | float64 | Fitted spectral intercept (b in log S = a*log(nu) + b) |
| `n_frequencies` | Int32 | Number of frequency measurements for this source |
| `flux_residual_pct` | float64 | Flux residual from spectral fit (%) |
| `beam_arcsec` | float64 | Survey beam size (arcsec) |
| `survey` | string | Survey name extracted from source designation |
| `frequency_band` | category | Frequency band: VLF (<100 MHz), low (100-500 MHz), mid (500-2000 MHz), high (2-8 GHz), SHF (>8 GHz) |

Full schema includes 16 columns with positional offsets and uncertainties.
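The `frequency_band` categories in the table can be reproduced from `frequency_mhz` with a binning step. The sketch below is one possible way to do it with `pandas.cut`; the card does not say which band a value exactly on a boundary (100, 500, 2000, or 8000 MHz) belongs to, so the half-open bins `[lower, upper)` used here are an assumption, as are the names `BAND_EDGES_MHZ`, `BAND_LABELS`, and `assign_band`.

```python
import numpy as np
import pandas as pd

# Band edges in MHz taken from the table; boundary handling is an assumption.
BAND_EDGES_MHZ = [0.0, 100.0, 500.0, 2000.0, 8000.0, np.inf]
BAND_LABELS = ["VLF", "low", "mid", "high", "SHF"]


def assign_band(freq_mhz: pd.Series) -> pd.Series:
    """Map frequencies in MHz to band labels as a categorical Series."""
    # right=False makes every bin closed on the left: [lower, upper).
    return pd.cut(freq_mhz, bins=BAND_EDGES_MHZ, labels=BAND_LABELS, right=False)


# Made-up frequencies covering every band.
freqs = pd.Series([74.0, 150.0, 843.0, 1400.0, 4850.0, 31000.0], name="frequency_mhz")
bands = assign_band(freqs)
print(bands.tolist())
```

Comparing `assign_band(df["frequency_mhz"])` with the shipped `frequency_band` column would show whether the boundary assumption matches the dataset.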

## Quick stats

- **1,658,207** total source measurements
- **339,547** unique radio sources
- **198** contributing surveys
- Frequency range: 16 to 31000 MHz
- Median flux density: 110.0 mJy
- Median spectral index: -0.75

## Usage

```python
import matplotlib.pyplot as plt
from datasets import load_dataset

ds = load_dataset("juliensimon/unified-radio-catalog", split="train")
df = ds.to_pandas()

# Group by source to see multi-frequency data
source = df[df["source_id"] == df["source_id"].iloc[0]]
print(f"Source {source['source_name'].iloc[0]}: {len(source)} frequencies")

# Spectral index distribution (one value per source, clipped to [-3, 3])
si = df.drop_duplicates("source_id")["spectral_index"].dropna()
si.clip(-3, 3).hist(bins=200)
plt.xlabel("Spectral index")
plt.ylabel("Count")
plt.title("Radio Source Spectral Index Distribution")
plt.axvline(-0.7, color="red", linestyle="--", label="Typical synchrotron")
plt.legend()
plt.show()

# Sky coverage map
plt.hexbin(df["ra_deg"], df["dec_deg"], gridsize=100, mincnt=1)
plt.colorbar(label="Measurement count")
plt.xlabel("RA (deg)")
plt.ylabel("Dec (deg)")
plt.title("SPECFIND v3 Sky Coverage")
plt.show()

# Survey contribution
print(df["survey"].value_counts().head(10))
```

## Data source

Stein, Y., Vollmer, B., Boch, T., et al. (2024), *SPECFIND v3.0: A catalog of radio continuum cross-identifications and spectra.* VizieR catalog VIII/104. Based on Vollmer, B. et al. (2005, 2010). Via VizieR CDS.
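As a follow-up to the usage snippet, it can help to restrict an analysis to sources with several frequency measurements before interpreting spectral indices. The sketch below runs on a tiny synthetic table that reuses column names from the card; every value in it is made up, and the thresholds (at least 3 frequencies, spectral index below -0.5) are illustrative assumptions, not recommendations from the catalog authors.

```python
import pandas as pd

# Synthetic stand-in for the real table: column names follow the card,
# values are invented for illustration only.
df = pd.DataFrame({
    "source_id": [1, 1, 1, 2, 2, 3],
    "survey": ["NVSS", "SUMSS", "TGSS", "NVSS", "GLEAM", "NVSS"],
    "frequency_mhz": [1400.0, 843.0, 150.0, 1400.0, 200.0, 1400.0],
    "spectral_index": [-0.8, -0.8, -0.8, 0.1, 0.1, None],
    "n_frequencies": [3, 3, 3, 2, 2, 1],
})

# Fit parameters repeat on every row of a source, so keep one row per source.
per_source = df.drop_duplicates("source_id").set_index("source_id")

# Keep sources with enough measurements (assumed threshold: 3).
well_sampled = per_source[per_source["n_frequencies"] >= 3]

# Select steep-spectrum sources (assumed threshold: index below -0.5).
steep = well_sampled[well_sampled["spectral_index"] < -0.5]

# Count how many distinct surveys contribute to each source.
surveys_per_source = df.groupby("source_id")["survey"].nunique()

print(steep.index.tolist())
print(surveys_per_source.to_dict())
```

The same three steps apply unchanged to the `df` loaded in the usage snippet.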

## Related datasets

- [NVSS Radio Source Catalog](https://huggingface.co/datasets/juliensimon/nvss-radio-catalog): NVSS 1.4 GHz survey, 1.8M sources
- [FIRST Radio Survey Catalog](https://huggingface.co/datasets/juliensimon/first-radio-catalog): FIRST 1.4 GHz survey
- [VLASS Radio Sources](https://huggingface.co/datasets/juliensimon/vlass-radio-sources): VLA Sky Survey 2-4 GHz

## Pipeline

Source code: [juliensimon/space-datasets](https://github.com/juliensimon/space-datasets)

## Citation

```bibtex
@dataset{unified_radio_catalog,
  author = {Simon, Julien},
  title = {Unified Radio Catalog (SPECFIND v3)},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/datasets/juliensimon/unified-radio-catalog},
  note = {Based on Stein, Vollmer et al. (2024) SPECFIND v3 via VizieR CDS}
}
```

## License

[VizieR Scientific-Use Terms](https://cds.unistra.fr/vizier-org/licences_vizier.html)

Provided by:
juliensimon