BigBang/galaxyzoo-decals
Source: Hugging Face. Updated 2022-08-29; indexed 2024-03-04.
Download link: https://hf-mirror.com/datasets/BigBang/galaxyzoo-decals
Resource description:
---
license: cc-by-4.0
---
# Galaxy Zoo DECaLS: Detailed Visual Morphology Measurements from Volunteers and Deep Learning for 314,000 Galaxies
- https://github.com/mwalmsley/zoobot
- https://zenodo.org/record/4573248
# Dataset Schema
This schema describes the columns in the three GZ DECaLS catalogues: `gz_decals_auto_posteriors`, `gz_decals_volunteers_1_and_2`, and `gz_decals_volunteers_5`.
In all catalogues, galaxies are identified by their `iauname`. Galaxies are unique within a catalogue. `gz_decals_auto_posteriors` contains all galaxies with appropriate imaging and photometry in DECaLS DR5, while `gz_decals_volunteers_1_and_2` and `gz_decals_volunteers_5` contain the subsets classified by volunteers in the respective campaigns.
The columns reporting morphology measurements are named like `{some-question}_{an-answer}`. For example, for the first question, both volunteer catalogues include the following:
| Column | Description |
| ----------- | ----------- |
| smooth-or-featured_total | Total number of volunteers who answered the "Smooth or Featured" question |
| smooth-or-featured_smooth | Count of volunteers who responded "Smooth" to the "Smooth or Featured" question |
| smooth-or-featured_featured-or-disk | Count of volunteers who responded "Featured or Disk", similarly |
| smooth-or-featured_artifact | Count of volunteers who responded "Artifact", similarly |
| smooth-or-featured_smooth_fraction | Fraction of volunteers who responded "Smooth" to the "Smooth or Featured" question, out of all responses (i.e. smooth count / total) |
| smooth-or-featured_featured-or-disk_fraction | Fraction of volunteers who responded "Featured or Disk", similarly |
| smooth-or-featured_artifact_fraction | Fraction of volunteers who responded "Artifact", similarly |
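Because of this naming convention, the fraction columns can be recomputed from the count columns. The following is a minimal sketch, assuming a volunteer catalogue has been loaded into a pandas DataFrame. `add_fractions` is a hypothetical helper (not part of the released code) and the toy values are made up for illustration.

```python
import numpy as np
import pandas as pd


def add_fractions(df, question, answers):
    """Return a copy of df with recomputed `{question}_{answer}_fraction` columns."""
    out = df.copy()
    total = out[f"{question}_total"].astype(float)
    # Galaxies with no votes get NaN fractions instead of a division by zero.
    total = total.where(total > 0, np.nan)
    for answer in answers:
        out[f"{question}_{answer}_fraction"] = out[f"{question}_{answer}"] / total
    return out


# Toy rows (illustrative values only, not taken from the catalogue).
toy = pd.DataFrame({
    "smooth-or-featured_total": [40, 0],
    "smooth-or-featured_smooth": [10, 0],
    "smooth-or-featured_featured-or-disk": [28, 0],
    "smooth-or-featured_artifact": [2, 0],
})
result = add_fractions(
    toy, "smooth-or-featured", ["smooth", "featured-or-disk", "artifact"]
)
```

The same helper should work for any other question, provided its `_total` and per-answer count columns follow the pattern described above.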
The questions and answers differ slightly between `gz_decals_volunteers_1_and_2` and `gz_decals_volunteers_5`. See the paper for more.
The volunteer catalogues include `{question}_{answer}_debiased` columns, which estimate what the vote fractions would be if the same galaxy were imaged at lower redshift. See the paper for more. Note that the debiased measurements are highly uncertain for individual galaxies and should therefore be used with caution. Debiased estimates are only available for galaxies with 0.02<z<0.15, -21.5>M_r>-23, and at least 30 votes for the first question ("Smooth or Featured") after volunteer weighting.
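The availability criteria for debiased estimates can be written as a boolean mask. This is a sketch only: it assumes `redshift` and `elpetro_absmag_r` hold z and M_r, and that `smooth-or-featured_total` already reflects volunteer weighting, which the schema does not state explicitly. `debiased_available` is a hypothetical helper and the toy rows are invented.

```python
import pandas as pd


def debiased_available(df, min_votes=30):
    """Boolean mask: True where the stated debiasing criteria are met."""
    in_redshift = (df["redshift"] > 0.02) & (df["redshift"] < 0.15)
    # -21.5 > M_r > -23: brighter than -21.5 but fainter than -23.
    in_magnitude = (df["elpetro_absmag_r"] < -21.5) & (df["elpetro_absmag_r"] > -23.0)
    # Assumption: this total is the vote count after volunteer weighting.
    enough_votes = df["smooth-or-featured_total"] >= min_votes
    return in_redshift & in_magnitude & enough_votes


# Toy rows (illustrative values only).
toy = pd.DataFrame({
    "redshift": [0.05, 0.01, 0.05, 0.05],
    "elpetro_absmag_r": [-22.0, -22.0, -20.0, -22.0],
    "smooth-or-featured_total": [35, 35, 35, 12],
})
mask = debiased_available(toy)
```

In this toy table only the first row passes all three cuts; the others fail on redshift, magnitude, and vote count respectively.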
The automated catalogue, `gz_decals_auto_posteriors`, includes predictions for all galaxies and all questions even when that question may not be appropriate (e.g. number of spiral arms for a smooth elliptical). To assess relevance, we include `{question}_proportion_volunteers_asked` columns showing the estimated fraction of volunteers that would have been asked each question (i.e. the product of the vote fractions for the preceding answers). We suggest a cut of `{question}_proportion_volunteers_asked` > 0.5 as a starting point.
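The relevance cut can be sketched as follows. The column names for the preceding answers (`disk-edge-on_no_fraction`, `has-spiral-arms_yes_fraction`) and the `spiral-arm-count` question are assumptions chosen to illustrate the `{question}_{answer}` pattern, so check them against the actual catalogue before use. Both helper functions are hypothetical.

```python
import pandas as pd


def proportion_asked(df, preceding_fraction_columns):
    """Product of the vote fractions for the answers leading to a question."""
    return df[list(preceding_fraction_columns)].prod(axis=1)


def select_relevant(df, question, threshold=0.5):
    """Keep galaxies where more than `threshold` of volunteers would be asked."""
    column = f"{question}_proportion_volunteers_asked"
    return df[df[column] > threshold]


# Assumed path of answers leading to the (assumed) spiral-arm-count question.
path = [
    "smooth-or-featured_featured-or-disk_fraction",
    "disk-edge-on_no_fraction",
    "has-spiral-arms_yes_fraction",
]
# Toy rows (illustrative values only).
toy = pd.DataFrame({
    path[0]: [0.9, 0.2],
    path[1]: [0.8, 0.9],
    path[2]: [0.7, 0.5],
})
toy["spiral-arm-count_proportion_volunteers_asked"] = proportion_asked(toy, path)
relevant = select_relevant(toy, "spiral-arm-count")
```

The released catalogue already provides the `{question}_proportion_volunteers_asked` columns, so in practice only the `select_relevant` step should be needed; `proportion_asked` is included to show how the product is formed.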
The automated catalogue does not include volunteer counts or totals (naturally).
Each catalogue includes a pair of columns to warn where galaxies may have been classified using an inappropriately large field-of-view (due to incorrect radii measurements in the NSA, on which the field-of-view is calculated). We suggest excluding galaxies (<1%) with such warnings.
| Column | Description |
| ----------- | ----------- |
| wrong_size_statistic | Mean distance from center of all pixels above double the 20th percentile (i.e. probable source pixels) |
| wrong_size_warning | True if wrong_size_statistic > 161.0, our suggested starting cut. This threshold is approximately the mean distance of all pixels from the center |
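To make the definition of `wrong_size_statistic` concrete, here is a rough sketch of how such a statistic could be computed for one image. It assumes a single-band 2D array and takes the geometric middle of the array as the center. The image size, band, and preprocessing used by the authors are not given in this schema, so this is not their implementation.

```python
import numpy as np


def wrong_size_statistic(image):
    """Mean distance (in pixels) from the image center of all pixels
    brighter than double the 20th percentile of the image."""
    image = np.asarray(image, dtype=float)
    threshold = 2.0 * np.percentile(image, 20)
    rows, cols = np.indices(image.shape)
    # Assumption: the galaxy sits at the geometric center of the array.
    center_row = (image.shape[0] - 1) / 2.0
    center_col = (image.shape[1] - 1) / 2.0
    distance = np.hypot(rows - center_row, cols - center_col)
    bright = image > threshold
    if not bright.any():
        return float("nan")  # no probable source pixels found
    return float(distance[bright].mean())


# Toy image: flat background with one bright pixel at the center.
image = np.ones((5, 5))
image[2, 2] = 10.0
statistic = wrong_size_statistic(image)
```

When working with the catalogue itself, no image processing is needed: keeping rows where `wrong_size_statistic` is at most 161.0 applies the suggested cut.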
For convenience, each catalogue includes the same set of basic astrophysical measurements copied from the NASA Sloan Atlas (NSA). Additional measurements can be added by crossmatching on `iauname` with the NSA. See [here](https://data.sdss.org/datamodel/files/ATLAS_DATA/ATLAS_MAJOR_VERSION/nsa.html) for the NSA schema. If you use these columns, you should cite the NSA.
| Column | Description |
| ----------- | ----------- |
| ra | Right ascension (degrees) |
| dec | Declination (degrees) |
| iauname | Unique identifier listed in NSA v1.0.1 |
| petro_theta | "Azimuthally-averaged SDSS-style Petrosian radius (derived from r band)" |
| petro_th50 | "Azimuthally-averaged SDSS-style 50% light radius (r-band)" |
| petro_th90 | "Azimuthally-averaged SDSS-style 90% light radius (r-band)" |
| elpetro_absmag_r | "Absolute magnitude from elliptical Petrosian fluxes in rest-frame" in SDSS r |
| sersic_nmgy_r | "Galactic-extinction corrected AB flux" in SDSS r |
| redshift | "Heliocentric redshift" ("z" column in NSA) |
| mag_r | 22.5 - 2.5 log10(sersic_nmgy_r). *Not* the same as the NSA mag column! |
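The `mag_r` formula and the `iauname` crossmatch can be sketched together. The helper names are hypothetical, the toy identifiers and values are invented, and the NSA table is assumed to have been loaded into a DataFrame with a lowercase `iauname` column (the NSA file itself may use different column casing). `sersic_n` stands in for any extra NSA column.

```python
import numpy as np
import pandas as pd


def mag_r_from_flux(sersic_nmgy_r):
    """mag_r = 22.5 - 2.5 log10(flux in nanomaggies)."""
    flux = np.asarray(sersic_nmgy_r, dtype=float)
    # Non-positive fluxes have no defined magnitude; return NaN for them.
    safe = np.where(flux > 0, flux, np.nan)
    return 22.5 - 2.5 * np.log10(safe)


def crossmatch_on_iauname(catalogue, nsa, nsa_columns):
    """Left-join selected NSA columns onto a GZ DECaLS catalogue."""
    extra = nsa[["iauname"] + list(nsa_columns)]
    # validate raises if `iauname` is duplicated in the NSA table.
    return catalogue.merge(extra, on="iauname", how="left", validate="many_to_one")


# Toy tables (made-up identifiers and values).
catalogue = pd.DataFrame({
    "iauname": ["J000000.00+000000.0", "J000001.00+000001.0", "J000002.00+000002.0"],
    "sersic_nmgy_r": [1.0, 100.0, 0.0],
})
nsa = pd.DataFrame({
    "iauname": ["J000000.00+000000.0", "J000001.00+000001.0"],
    "sersic_n": [1.2, 4.0],
})
catalogue["mag_r"] = mag_r_from_flux(catalogue["sersic_nmgy_r"])
matched = crossmatch_on_iauname(catalogue, nsa, ["sersic_n"])
```

A left join keeps every catalogue row, so galaxies without an NSA match end up with NaN in the added columns rather than being dropped.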
```
@dataset{walmsley_mike_2020_4573248,
author = {Walmsley, Mike and
Lintott, Chris and
Tobias, Geron and
Kruk, Sandor J and
Krawczyk, Coleman and
Willett, Kyle and
Bamford, Steven and
Kelvin, Lee S and
Fortson, Lucy and
Gal, Yarin and
Keel, William and
Masters, Karen and
Mehta, Vihang and
Simmons, Brooke and
Smethurst, Rebecca J and
Smith, Lewis and
Baeten, Elisabeth M L and
Macmillan, Christine},
title = {{Galaxy Zoo DECaLS: Detailed Visual Morphology
Measurements from Volunteers and Deep Learning for
314,000 Galaxies}},
month = dec,
year = 2020,
publisher = {Zenodo},
version = {0.0.2},
doi = {10.5281/zenodo.4573248},
url = {https://doi.org/10.5281/zenodo.4573248}
}
```
Provider:
BigBang
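Original information summary
Dataset overview
Dataset name
Galaxy Zoo DECaLS: Detailed Visual Morphology Measurements from Volunteers and Deep Learning for 314,000 Galaxies
Dataset contents
The dataset contains three main catalogues:
- `gz_decals_auto_posteriors`: all galaxies with appropriate imaging and photometry in DECaLS DR5.
- `gz_decals_volunteers_1_and_2`: the subset of galaxies classified by volunteers in the corresponding campaigns.
- `gz_decals_volunteers_5`: likewise, a subset of galaxies classified by volunteers.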
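Dataset structure
- Galaxy identifier: galaxies in every catalogue are uniquely identified by `iauname`.
- Morphology columns: named `{some-question}_{an-answer}`, for example `smooth-or-featured_total`.
- Volunteer vote statistics: total votes, votes for each answer, and vote fractions.
- Automated classifications: `gz_decals_auto_posteriors` contains predictions for every question for every galaxy, even where a question may not apply to a particular galaxy.
- Warning columns: flag galaxies that may have been classified with an inappropriately large field of view because of incorrect radius measurements.
- Basic astrophysical measurements: copied from the NASA Sloan Atlas, such as right ascension, declination, and absolute magnitude.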
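Usage notes
- The morphology questions and answers differ slightly between `gz_decals_volunteers_1_and_2` and `gz_decals_volunteers_5`.
- Debiased measurements are available only for galaxies meeting specific conditions and should be used with caution.
- The automated catalogue does not include volunteer counts or totals.
- Excluding galaxies that carry a large field-of-view warning is recommended.
Dataset license
CC-BY-4.0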
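Collected summary
Dataset introduction
Construction
The Galaxy Zoo DECaLS dataset combines volunteer classifications with deep learning to provide detailed visual morphology measurements for 314,000 galaxies in DECaLS DR5. It is split into three catalogues: `gz_decals_auto_posteriors` contains all galaxies with appropriate imaging and photometry, while `gz_decals_volunteers_1_and_2` and `gz_decals_volunteers_5` are the subsets classified by volunteers in the respective campaigns.
Features
The dataset combines human visual judgement with machine learning predictions. Its morphology measurements are reported per question and answer, such as the count and fraction of volunteers who answered "Smooth" or "Featured or Disk". It also provides estimates of what the vote fractions would be at lower redshift, along with basic astrophysical measurements copied from the NASA Sloan Atlas. Note that the debiased estimates are highly uncertain for individual galaxies.
Usage
When using the automated catalogue, first check the `{question}_proportion_volunteers_asked` columns to assess whether each question is relevant, with a cut of greater than 0.5 as a starting point. The dataset also suggests excluding galaxies that may have been classified with an inappropriate field of view because of incorrect size measurements. Additional astrophysical measurements can be obtained by crossmatching with the NASA Sloan Atlas, which should be cited when its columns are used.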
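Background and challenges
Background
The Galaxy Zoo DECaLS dataset was created by Mike Walmsley and many co-authors and released through Zenodo in 2020. It combines volunteer effort with deep learning to measure the detailed visual morphology of 314,000 galaxies. The work is motivated by the study of galaxy morphology: careful classification of galaxy shapes is intended to improve our understanding of the structure of the universe. The dataset also gives the public a way to take part in scientific research.
Current challenges
The main challenges are, first, that the complexity of galaxy morphology makes human classification subjective and uncertain and, second, that the accuracy of automated classification depends on how well the deep learning model is trained and how consistent the labels are. Building the dataset also raised the questions of how to combine volunteer classifications with automated measurements and how to handle classification bias caused by an overly large field of view or by measurement errors. These issues call for careful data processing and analysis to keep results reliable.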
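Common scenarios
Typical use
The dataset's most typical use follows from its combination of volunteer classifications and deep learning predictions: detailed visual morphology classifications for 314,000 galaxies let researchers explore how galaxy morphology relates to physical properties.
Practical applications
In practice, the dataset provides a rich data resource for astronomy research, and its results have also been applied in cosmology, galaxy dynamics, and galaxy clustering, providing evidence on the formation of large-scale structure in the universe.
Derived work
Research based on the dataset has led to related work, such as studies of the relationship between galaxy morphology and cosmological parameters and analyses of galaxy clustering environments.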
The content above was collected and summarised by 遇见数据集.



