juliensimon/unified-radio-catalog
Hugging Face | Updated 2026-05-26 | Indexed 2026-03-29
Download link: https://hf-mirror.com/datasets/juliensimon/unified-radio-catalog
Resource description:
---
license: other
license_name: vizier-scientific-use
license_link: https://cds.unistra.fr/vizier-org/licences_vizier.html
pretty_name: "Unified Radio Catalog (SPECFIND v3)"
language:
- en
description: "SPECFIND v3 unified radio source catalog with 1,658,207 cross-matched measurements across 198 radio surveys including NVSS, FIRST, SUMSS, TGSS, and GLEAM."
task_categories:
- tabular-classification
- tabular-regression
tags:
- space
- radio
- nvss
- first
- sumss
- astronomy
- open-data
- tabular-data
size_categories:
- 1M<n<10M
configs:
- config_name: default
  data_files:
  - split: train
    path: data/unified_radio_catalog.parquet
  default: true
---
# Unified Radio Catalog (SPECFIND v3)

*Part of the [Astronomy Datasets](https://huggingface.co/collections/juliensimon/astronomy-datasets-69c24caf2f17e36128946743) collection on Hugging Face.*

The SPECFIND v3 unified radio source catalog contains **1,658,207** cross-matched radio
source measurements from **198** surveys spanning 16 to 31000 MHz.

SPECFIND positionally cross-identifies radio sources across major surveys including NVSS, FIRST,
SUMSS, TGSS, GLEAM, and many others, then fits power-law radio spectra.
## Dataset description
SPECFIND (Vollmer et al. 2005, updated Stein et al. 2024) is the largest positional cross-identification
of radio continuum catalogs. Version 3 matches sources across the 198 contributing radio surveys at
frequencies from 16 to 31000 MHz, covering the entire sky. Each row represents a source detection at a
specific frequency, grouped by a unique source identifier (`source_id`). For sources detected in
multiple surveys, SPECFIND fits a power-law spectrum S(nu) = 10^b * nu^a, where `a` is the spectral
index and `b` is the intercept.
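
The power-law relation above can be sketched in a few lines. This is a minimal illustration, not SPECFIND's own fitting procedure: it assumes base-10 logarithms, `nu` in MHz and `S` in mJy (inferred from the column names), and it uses an unweighted least-squares fit via `numpy.polyfit`. The function names `powerlaw_flux_mjy` and `fit_powerlaw` are hypothetical helpers introduced here.

```python
import numpy as np


def powerlaw_flux_mjy(frequency_mhz, spectral_index, spectral_intercept):
    """Evaluate S(nu) = 10**b * nu**a (assumes nu in MHz and S in mJy)."""
    nu = np.asarray(frequency_mhz, dtype=float)
    return 10.0 ** spectral_intercept * nu ** spectral_index


def fit_powerlaw(frequency_mhz, flux_density_mjy):
    """Unweighted straight-line fit of log10(S) = a*log10(nu) + b.

    Returns (a, b). This is a stand-in for illustration; the catalog's
    own fit may weight points or reject outliers differently.
    """
    log_nu = np.log10(np.asarray(frequency_mhz, dtype=float))
    log_s = np.log10(np.asarray(flux_density_mjy, dtype=float))
    a, b = np.polyfit(log_nu, log_s, deg=1)
    return float(a), float(b)


# Synthetic source with a = -0.75 and b = 4.0, sampled at three frequencies
freqs = np.array([150.0, 843.0, 1400.0])
fluxes = powerlaw_flux_mjy(freqs, -0.75, 4.0)
a_fit, b_fit = fit_powerlaw(freqs, fluxes)
print(f"a = {a_fit:.3f}, b = {b_fit:.3f}")
```

With real rows, `spectral_index` and `spectral_intercept` can be passed to `powerlaw_flux_mjy` to predict a source's flux density at a frequency where it was not observed.
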

The catalog contains **339,547** unique radio sources with measurements from surveys
including NVSS (1.4 GHz), FIRST (1.4 GHz), SUMSS (843 MHz), TGSS (150 MHz), GLEAM (200 MHz),
and many others.
## Key columns
| Column | Type | Description |
|--------|------|-------------|
| `source_id` | Int32 | Unique source identifier (groups cross-matched detections) |
| `source_name` | string | Survey-specific source designation |
| `ra_deg` | float64 | Right ascension J2000 (degrees) |
| `dec_deg` | float64 | Declination J2000 (degrees) |
| `frequency_mhz` | float64 | Observation frequency (MHz) |
| `flux_density_mjy` | float64 | Flux density at this frequency (mJy) |
| `flux_density_error_mjy` | float64 | Flux density uncertainty (mJy) |
| `spectral_index` | float64 | Fitted spectral index (a in S ~ nu^a) |
| `spectral_intercept` | float64 | Fitted spectral intercept (b in log S = a*log(nu) + b) |
| `n_frequencies` | Int32 | Number of frequency measurements for this source |
| `flux_residual_pct` | float64 | Flux residual from spectral fit (%) |
| `beam_arcsec` | float64 | Survey beam size (arcsec) |
| `survey` | string | Survey name extracted from source designation |
| `frequency_band` | category | Frequency band: VLF (<100 MHz), low (100-500 MHz), mid (500-2000 MHz), high (2-8 GHz), SHF (>8 GHz) |

The full schema has 16 columns; the columns not listed above hold positional offsets and uncertainties.
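
The `frequency_band` categories in the table can be reproduced from `frequency_mhz` with `pandas.cut`. The sketch below is an assumption about how the bins are laid out: the table does not say which band a boundary value such as exactly 100 MHz falls into, so left-closed intervals are chosen here. `BAND_EDGES_MHZ`, `BAND_LABELS`, and `assign_frequency_band` are hypothetical names.

```python
import pandas as pd

# Band edges in MHz, taken from the table; the handling of boundary
# values (left-closed intervals) is an assumption.
BAND_EDGES_MHZ = [0, 100, 500, 2000, 8000, float("inf")]
BAND_LABELS = ["VLF", "low", "mid", "high", "SHF"]


def assign_frequency_band(frequency_mhz: pd.Series) -> pd.Series:
    """Map observing frequencies (MHz) to the band labels used in the card."""
    return pd.cut(
        frequency_mhz,
        bins=BAND_EDGES_MHZ,
        labels=BAND_LABELS,
        right=False,  # intervals are [low, high)
    )


example = pd.Series([16.0, 150.0, 843.0, 1400.0, 4850.0, 31000.0])
bands = assign_frequency_band(example)
print(bands.tolist())
```
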
## Quick stats
- **1,658,207** total source measurements
- **339,547** unique radio sources
- **198** contributing surveys
- Frequency range: 16 to 31000 MHz
- Median flux density: 110.0 mJy
- Median spectral index: -0.75
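
The figures above can be recomputed from the table itself. The sketch below runs on a tiny synthetic frame so that it works offline. It assumes the median spectral index is taken once per source rather than once per row, which the card does not specify. `quick_stats` is a hypothetical helper.

```python
import pandas as pd


def quick_stats(df: pd.DataFrame) -> dict:
    """Summarise a catalog frame that uses the column names from this card."""
    # One row per source, so that sources with many measurements do not
    # dominate the spectral-index median (an assumption, see lead-in).
    per_source = df.drop_duplicates("source_id")
    return {
        "n_measurements": len(df),
        "n_sources": df["source_id"].nunique(),
        "n_surveys": df["survey"].nunique(),
        "freq_min_mhz": df["frequency_mhz"].min(),
        "freq_max_mhz": df["frequency_mhz"].max(),
        "median_flux_mjy": df["flux_density_mjy"].median(),
        "median_spectral_index": per_source["spectral_index"].median(),
    }


# Synthetic stand-in for the real table: two sources, three measurements
toy = pd.DataFrame(
    {
        "source_id": [1, 1, 2],
        "survey": ["NVSS", "TGSS", "NVSS"],
        "frequency_mhz": [1400.0, 150.0, 1400.0],
        "flux_density_mjy": [100.0, 530.0, 20.0],
        "spectral_index": [-0.75, -0.75, -0.5],
    }
)
stats = quick_stats(toy)
print(stats)
```

Calling `quick_stats(df)` on the full DataFrame loaded in the Usage section should give values comparable to the list above, provided the same conventions were used to produce it.
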
## Usage
```python
import matplotlib.pyplot as plt
from datasets import load_dataset

# Load the single "train" split and convert it to a pandas DataFrame
ds = load_dataset("juliensimon/unified-radio-catalog", split="train")
df = ds.to_pandas()

# Group by source to see multi-frequency data
source = df[df["source_id"] == df["source_id"].iloc[0]]
print(f"Source {source['source_name'].iloc[0]}: {len(source)} measurements")

# Spectral index distribution (one value per source, clipped to [-3, 3])
si = df.drop_duplicates("source_id")["spectral_index"].dropna()
si.clip(-3, 3).hist(bins=200)
plt.xlabel("Spectral index")
plt.ylabel("Count")
plt.title("Radio Source Spectral Index Distribution")
plt.axvline(-0.7, color="red", linestyle="--", label="Typical synchrotron")
plt.legend()
plt.show()

# Sky coverage map (every measurement row contributes one count)
plt.hexbin(df["ra_deg"], df["dec_deg"], gridsize=100, mincnt=1)
plt.colorbar(label="Measurement count")
plt.xlabel("RA (deg)")
plt.ylabel("Dec (deg)")
plt.title("SPECFIND v3 Sky Coverage")
plt.show()

# Survey contribution: the ten surveys with the most measurements
print(df["survey"].value_counts().head(10))
```
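
To pull all measurements near a sky position, a simple cone search over `ra_deg` and `dec_deg` is often enough. The sketch below uses the haversine formula for the great-circle separation. It is a plain lookup helper, not SPECFIND's cross-identification algorithm. `angular_separation_deg` and `cone_search` are hypothetical names, and the demonstration frame is synthetic.

```python
import numpy as np
import pandas as pd


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = (
        np.radians(np.asarray(x, dtype=float)) for x in (ra1, dec1, ra2, dec2)
    )
    sin_ddec = np.sin((dec2 - dec1) / 2.0)
    sin_dra = np.sin((ra2 - ra1) / 2.0)
    h = sin_ddec**2 + np.cos(dec1) * np.cos(dec2) * sin_dra**2
    # Clip guards against rounding slightly outside [0, 1]
    return np.degrees(2.0 * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0))))


def cone_search(df, ra_deg, dec_deg, radius_arcsec):
    """Return the rows of df within radius_arcsec of (ra_deg, dec_deg)."""
    sep_deg = angular_separation_deg(df["ra_deg"], df["dec_deg"], ra_deg, dec_deg)
    return df[sep_deg * 3600.0 <= radius_arcsec]


# Synthetic positions: two rows a few arcseconds apart, one far away
toy = pd.DataFrame(
    {
        "source_name": ["A", "B", "C"],
        "ra_deg": [10.0, 10.001, 50.0],
        "dec_deg": [20.0, 20.0, -30.0],
    }
)
hits = cone_search(toy, ra_deg=10.0, dec_deg=20.0, radius_arcsec=10.0)
print(hits["source_name"].tolist())
```

For the full 1.6M-row table this brute-force scan is still fast in NumPy; a spatial index would only be worth adding for many repeated queries.
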
## Data source
Stein, Y., Vollmer, B., Boch, T., et al. (2024), *SPECFIND v3.0 — A catalog of radio
continuum cross-identifications and spectra.* VizieR catalog VIII/104.
Based on Vollmer, B. et al. (2005, 2010). Via VizieR CDS.
## Related datasets
- [NVSS Radio Source Catalog](https://huggingface.co/datasets/juliensimon/nvss-radio-catalog) — NVSS 1.4 GHz survey, 1.8M sources
- [FIRST Radio Survey Catalog](https://huggingface.co/datasets/juliensimon/first-radio-catalog) — FIRST 1.4 GHz survey
- [VLASS Radio Sources](https://huggingface.co/datasets/juliensimon/vlass-radio-sources) — VLA Sky Survey 2-4 GHz
## Pipeline
Source code: [juliensimon/space-datasets](https://github.com/juliensimon/space-datasets)
## Citation
```bibtex
@dataset{unified_radio_catalog,
author = {Simon, Julien},
title = {Unified Radio Catalog (SPECFIND v3)},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/juliensimon/unified-radio-catalog},
note = {Based on Stein, Vollmer et al. (2024) SPECFIND v3 via VizieR CDS}
}
```
## License
[VizieR Scientific-Use Terms](https://cds.unistra.fr/vizier-org/licences_vizier.html)

Provider: juliensimon