juliensimon/galaxy-zoo-2-morphology
Source: Hugging Face. Updated 2026-03-26; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/juliensimon/galaxy-zoo-2-morphology
---
license: cc-by-4.0
pretty_name: "Galaxy Zoo 2 Morphological Classifications"
language:
- en
description: "243,500 citizen-science galaxy morphology classifications from Galaxy Zoo 2 with vote fractions and debiased probabilities for spiral, elliptical, bar, bulge, and merger features."
task_categories:
- tabular-classification
tags:
- space
- galaxies
- morphology
- citizen-science
- galaxy-zoo
- astronomy
- open-data
- sdss
- tabular-data
size_categories:
- 100K<n<1M
configs:
- config_name: default
  data_files:
  - split: train
    path: data/galaxy_zoo_2_morphology.parquet
  default: true
---
# Galaxy Zoo 2 Morphological Classifications
*Part of the [Astronomy Datasets](https://huggingface.co/collections/juliensimon/astronomy-datasets-69c24caf2f17e36128946743) collection on Hugging Face.*

**243,500** citizen-science galaxy morphology classifications from Galaxy Zoo 2,
the largest visual morphological classification project in astronomy. Each galaxy was
classified by multiple volunteers answering a decision tree of questions about shape,
structure, and features.
## Dataset description
Galaxy Zoo 2 asked hundreds of thousands of volunteers to classify galaxy images from
the Sloan Digital Sky Survey (SDSS). This dataset contains the spectroscopic-redshift
sample (Table 5 from Willett et al. 2013): **243,500** galaxies with vote counts,
vote fractions, weighted fractions, debiased probabilities, and classification flags
for 11 morphological tasks spanning 37 possible answers.

The decision tree covers: smooth vs. featured, edge-on disk, bar presence, spiral
structure, bulge prominence, oddities (ring, lens, disturbed, irregular, merger, dust
lane), roundedness, bulge shape, and spiral arm properties (tightness, count).
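To make the relationship between vote counts and vote fractions concrete, here is a minimal sketch in plain Python. It assumes that a simple vote fraction is an answer's count divided by the total count for that task. The answer labels and counts are made-up illustration values, not rows from the dataset, and the weighting and debiasing steps of Willett et al. (2013) are not reproduced.

```python
def vote_fractions(counts):
    """Turn per-answer vote counts for one task into simple vote fractions.

    `counts` maps an answer label to its raw number of votes.
    """
    total = sum(counts.values())
    if total == 0:
        # No votes recorded for this task: fractions are undefined.
        return {answer: None for answer in counts}
    return {answer: n / total for answer, n in counts.items()}


# Hypothetical counts for the first task (smooth vs. featured vs. star/artifact).
example_counts = {"smooth": 30, "features_or_disk": 9, "star_or_artifact": 1}
fractions = vote_fractions(example_counts)
print(fractions)
```

The fractions for one task sum to 1 whenever at least one vote was cast, which is a handy sanity check when working with the `_fraction` columns.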
## Quick stats
- **243,500** galaxies classified
- **103,515** classified as smooth/elliptical
- **139,738** classified as featured/disk
- **86,775** with spiral structure (debiased probability > 0.5)
- **34,286** barred galaxies (debiased probability > 0.5)
- **29,597** edge-on galaxies (debiased probability > 0.5)
- **179.4** average votes per galaxy
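Counts like these can be approximated from the derived columns listed in the schema (`dominant_morphology`, `is_spiral`, `is_barred`, `is_edge_on`, `total_votes`). The sketch below is one possible way to do so, not necessarily how the published numbers were produced. In particular, mapping "smooth/elliptical" to `dominant_morphology == "smooth"` is an assumption. The function takes any iterable of row dictionaries, and the sample rows are made up for illustration.

```python
from collections import Counter


def quick_stats(rows):
    """Summarise an iterable of row dicts using the derived columns."""
    morphology = Counter()
    n = n_spiral = n_barred = n_edge_on = total_votes = 0
    for row in rows:
        n += 1
        morphology[row["dominant_morphology"]] += 1
        n_spiral += bool(row["is_spiral"])
        n_barred += bool(row["is_barred"])
        n_edge_on += bool(row["is_edge_on"])
        total_votes += row["total_votes"]
    return {
        "galaxies": n,
        "by_dominant_morphology": dict(morphology),
        "spiral": n_spiral,
        "barred": n_barred,
        "edge_on": n_edge_on,
        "mean_votes": total_votes / n if n else None,
    }


# Tiny made-up sample that reuses the schema's column names.
sample = [
    {"dominant_morphology": "smooth", "is_spiral": False,
     "is_barred": False, "is_edge_on": False, "total_votes": 150},
    {"dominant_morphology": "features_or_disk", "is_spiral": True,
     "is_barred": True, "is_edge_on": False, "total_votes": 210},
    {"dominant_morphology": "features_or_disk", "is_spiral": False,
     "is_barred": False, "is_edge_on": True, "total_votes": 180},
]
stats = quick_stats(sample)
print(stats)
```

Because iterating a `datasets.Dataset` yields one dictionary per row, the same function should accept the loaded `train` split directly, although a pandas-based aggregation will be much faster on the full table.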
## Schema
The dataset has 237 columns. Key columns:
| Column | Type | Description |
|--------|------|-------------|
| `specobjid` | int64 | SDSS spectroscopic object ID |
| `dr8objid` | int64 | SDSS DR8 photometric object ID |
| `dr7objid` | int64 | SDSS DR7 photometric object ID |
| `ra` | float64 | Right Ascension (J2000, degrees) |
| `dec` | float64 | Declination (J2000, degrees) |
| `rastring` | string | RA as sexagesimal string |
| `decstring` | string | Dec as sexagesimal string |
| `sample` | string | Sample membership flag |
| `gz2class` | string | Summary morphological class |
| `total_classifications` | int64 | Total number of classifications |
| `total_votes` | int64 | Total number of votes |
| `dominant_morphology` | string | Derived: highest debiased probability (smooth / features_or_disk / star_or_artifact) |
| `is_barred` | bool | Derived: bar debiased probability > 0.5 |
| `is_spiral` | bool | Derived: spiral debiased probability > 0.5 |
| `is_edge_on` | bool | Derived: edge-on debiased probability > 0.5 |

For each of the 11 morphological tasks (t01-t11) and their answers (a01-a37), there are up to 6 columns:
| Suffix | Description |
|--------|-------------|
| `_count` | Raw number of votes for this answer |
| `_weight` | Weighted vote count (correcting for classifier consistency) |
| `_fraction` | Simple vote fraction |
| `_weighted_fraction` | Weighted vote fraction |
| `_debiased` | Debiased probability (corrected for redshift-dependent bias) |
| `_flag` | Classification flag (1 = plurality answer after debiasing) |
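With this many columns it can help to split a column name back into its task, answer, and suffix. The sketch below assumes every per-answer column follows the pattern seen in names such as `t01_smooth_or_features_a01_smooth_debiased`, that is `tNN_<task>_aNN_<answer>_<suffix>`. The helper names `split_column` and `group_by_suffix` are hypothetical and not part of the dataset or its pipeline.

```python
import re

# Suffixes from the table above.
SUFFIXES = ("weighted_fraction", "debiased", "fraction", "weight", "count", "flag")

# The answer group is non-greedy so that "..._weighted_fraction" is not
# misread as an answer ending in "_weighted" plus the suffix "fraction".
_PATTERN = re.compile(
    r"^(?P<task>t\d{2}_.+?)_(?P<answer>a\d{2}_.+?)_(?P<suffix>"
    + "|".join(SUFFIXES)
    + r")$"
)


def split_column(name):
    """Return (task, answer, suffix), or None for non-task columns."""
    match = _PATTERN.match(name)
    if match is None:
        return None
    return match.group("task"), match.group("answer"), match.group("suffix")


def group_by_suffix(columns):
    """Group per-answer column names by suffix, ignoring other columns."""
    groups = {suffix: [] for suffix in SUFFIXES}
    for name in columns:
        parts = split_column(name)
        if parts is not None:
            groups[parts[2]].append(name)
    return groups


print(split_column("t01_smooth_or_features_a01_smooth_debiased"))
print(split_column("t08_odd_feature_a24_merger_weighted_fraction"))
print(split_column("ra"))
```

Calling `group_by_suffix(df.columns)["debiased"]` on the loaded table would then list the debiased-probability columns, provided the naming assumption holds.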
## Usage
```python
from datasets import load_dataset

ds = load_dataset("juliensimon/galaxy-zoo-2-morphology", split="train")
df = ds.to_pandas()

# Elliptical galaxies (smooth, debiased probability > 0.8)
ellipticals = df[df["t01_smooth_or_features_a01_smooth_debiased"] > 0.8]

# Barred spiral galaxies
barred_spirals = df[df["is_barred"] & df["is_spiral"]]

# Edge-on disks
edge_on = df[df["is_edge_on"]]

# Distribution of morphological classes
print(df["gz2class"].value_counts().head(10))

# Merger candidates (odd feature = merger, debiased > 0.5)
if "t08_odd_feature_a24_merger_debiased" in df.columns:
    mergers = df[df["t08_odd_feature_a24_merger_debiased"] > 0.5]
```
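Since `ra` and `dec` are given in degrees, a simple positional (cone) search is another common way to use the table. The following is a minimal sketch in plain Python using the haversine formula. The function names, object IDs, and coordinates are made up for illustration, and for large cross-matches a dedicated astronomy library would be a better fit.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    d_ra = ra2 - ra1
    d_dec = dec2 - dec1
    a = (math.sin(d_dec / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin(d_ra / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cone_search(rows, ra0, dec0, radius_deg):
    """Return the rows whose (ra, dec) lie within radius_deg of (ra0, dec0)."""
    return [
        row for row in rows
        if angular_separation_deg(row["ra"], row["dec"], ra0, dec0) <= radius_deg
    ]


# Made-up rows that reuse the schema's column names.
rows = [
    {"dr7objid": 1, "ra": 150.00, "dec": 2.00},
    {"dr7objid": 2, "ra": 150.05, "dec": 2.02},
    {"dr7objid": 3, "ra": 151.00, "dec": 2.00},
]
nearby = cone_search(rows, ra0=150.0, dec0=2.0, radius_deg=0.1)
print([row["dr7objid"] for row in nearby])
```

With the real data, `df.to_dict("records")` or the dataset rows themselves could be passed as `rows`, at the cost of a pure-Python loop over every galaxy.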
## Data source
[Galaxy Zoo 2](https://data.galaxyzoo.org/) — Willett et al. (2013),
"Galaxy Zoo 2: detailed morphological classifications for 304,122 galaxies from the
Sloan Digital Sky Survey", *MNRAS*, 435, 2835.
[arXiv:1308.3496](https://arxiv.org/abs/1308.3496)
This table is the spectroscopic-redshift subsample (Table 5).
## Related datasets
- [open-ngc](https://huggingface.co/datasets/juliensimon/open-ngc) — NGC/IC galaxy and nebula catalog
- [exoplanets](https://huggingface.co/datasets/juliensimon/exoplanets) — NASA Exoplanet Archive
- [messier-objects](https://huggingface.co/datasets/juliensimon/messier-objects) — Messier catalog of deep-sky objects
## Pipeline
Source code: [juliensimon/space-datasets](https://github.com/juliensimon/space-datasets)
## Citation
```bibtex
@dataset{galaxy_zoo_2_morphology,
author = {Simon, Julien},
title = {Galaxy Zoo 2 Morphological Classifications},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/juliensimon/galaxy-zoo-2-morphology},
note = {Based on Galaxy Zoo 2 data (Willett et al. 2013, MNRAS 435, 2835)}
}
```
## License
[CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/)
Provided by:
juliensimon



