
juliensimon/galaxy-zoo-2-morphology

Source: Hugging Face | Updated: 2026-03-26 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/juliensimon/galaxy-zoo-2-morphology
Resource description:
---
license: cc-by-4.0
pretty_name: "Galaxy Zoo 2 Morphological Classifications"
language:
- en
description: "243,500 citizen-science galaxy morphology classifications from Galaxy Zoo 2 with vote fractions and debiased probabilities for spiral, elliptical, bar, bulge, and merger features."
task_categories:
- tabular-classification
tags:
- space
- galaxies
- morphology
- citizen-science
- galaxy-zoo
- astronomy
- open-data
- sdss
- tabular-data
size_categories:
- 100K<n<1M
configs:
- config_name: default
  data_files:
  - split: train
    path: data/galaxy_zoo_2_morphology.parquet
  default: true
---

# Galaxy Zoo 2 Morphological Classifications

*Part of the [Astronomy Datasets](https://huggingface.co/collections/juliensimon/astronomy-datasets-69c24caf2f17e36128946743) collection on Hugging Face.*

**243,500** citizen-science galaxy morphology classifications from Galaxy Zoo 2, the largest visual morphological classification project in astronomy. Each galaxy was classified by multiple volunteers answering a decision tree of questions about shape, structure, and features.

## Dataset description

Galaxy Zoo 2 asked hundreds of thousands of volunteers to classify galaxy images from the Sloan Digital Sky Survey (SDSS). This dataset contains the spectroscopic-redshift sample (Table 5 from Willett et al. 2013): **243,500** galaxies with vote counts, vote fractions, weighted fractions, debiased probabilities, and classification flags for 11 morphological tasks spanning 37 possible answers.

The decision tree covers: smooth vs. featured, edge-on disk, bar presence, spiral structure, bulge prominence, oddities (ring, lens, disturbed, irregular, merger, dust lane), roundedness, bulge shape, and spiral arm properties (tightness, count).
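To make the relationship between vote counts and simple vote fractions concrete, here is a minimal sketch for a single question of the decision tree. The vote numbers are invented for illustration and the helper `vote_fractions` is hypothetical; it is not part of the dataset pipeline. The weighted fractions and debiased probabilities involve further corrections described in Willett et al. (2013) and are not reproduced here.

```python
# Toy vote counts for one galaxy on the first question (smooth vs. featured).
# These numbers are invented for illustration; they do not come from the dataset.
votes = {"smooth": 30, "features_or_disk": 12, "star_or_artifact": 3}


def vote_fractions(counts):
    """Return each answer's share of the votes cast on one question."""
    total = sum(counts.values())
    if total == 0:
        # No volunteer answered this question, so report all fractions as 0.
        return {answer: 0.0 for answer in counts}
    return {answer: n / total for answer, n in counts.items()}


fractions = vote_fractions(votes)

# The plurality answer is the one with the largest simple vote fraction.
plurality = max(fractions, key=fractions.get)

print(fractions)
print(plurality)
```

Because the questions form a decision tree, the number of votes behind a fraction can differ from one question to the next, so a fraction is best read together with its vote count.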
## Quick stats

- **243,500** galaxies classified
- **103,515** classified as smooth/elliptical
- **139,738** classified as featured/disk
- **86,775** with spiral structure (debiased probability > 0.5)
- **34,286** barred galaxies (debiased probability > 0.5)
- **29,597** edge-on galaxies (debiased probability > 0.5)
- **179.4** average votes per galaxy

## Schema

The dataset has 237 columns. Key columns:

| Column | Type | Description |
|--------|------|-------------|
| `specobjid` | int64 | SDSS spectroscopic object ID |
| `dr8objid` | int64 | SDSS DR8 photometric object ID |
| `dr7objid` | int64 | SDSS DR7 photometric object ID |
| `ra` | float64 | Right Ascension (J2000, degrees) |
| `dec` | float64 | Declination (J2000, degrees) |
| `rastring` | string | RA as sexagesimal string |
| `decstring` | string | Dec as sexagesimal string |
| `sample` | string | Sample membership flag |
| `gz2class` | string | Summary morphological class |
| `total_classifications` | int64 | Total number of classifications |
| `total_votes` | int64 | Total number of votes |
| `dominant_morphology` | string | Derived: highest debiased probability (smooth / features_or_disk / star_or_artifact) |
| `is_barred` | bool | Derived: bar debiased probability > 0.5 |
| `is_spiral` | bool | Derived: spiral debiased probability > 0.5 |
| `is_edge_on` | bool | Derived: edge-on debiased probability > 0.5 |

For each of the 11 morphological tasks (t01-t11) and their answers (a01-a37), there are up to 6 columns:

| Suffix | Description |
|--------|-------------|
| `_count` | Raw number of votes for this answer |
| `_weight` | Weighted vote count (correcting for classifier consistency) |
| `_fraction` | Simple vote fraction |
| `_weighted_fraction` | Weighted vote fraction |
| `_debiased` | Debiased probability (corrected for redshift-dependent bias) |
| `_flag` | Classification flag (1 = plurality answer after debiasing) |

## Usage

```python
from datasets import load_dataset

ds = load_dataset("juliensimon/galaxy-zoo-2-morphology", split="train")
df = ds.to_pandas()

# Elliptical galaxies (smooth, debiased probability > 0.8)
ellipticals = df[df["t01_smooth_or_features_a01_smooth_debiased"] > 0.8]

# Barred spiral galaxies
barred_spirals = df[df["is_barred"] & df["is_spiral"]]

# Edge-on disks
edge_on = df[df["is_edge_on"]]

# Distribution of morphological classes
print(df["gz2class"].value_counts().head(10))

# Merger candidates (odd feature = merger, debiased > 0.5)
if "t08_odd_feature_a24_merger_debiased" in df.columns:
    mergers = df[df["t08_odd_feature_a24_merger_debiased"] > 0.5]
```

## Data source

[Galaxy Zoo 2](https://data.galaxyzoo.org/): Willett et al. (2013), "Galaxy Zoo 2: detailed morphological classifications for 304,122 galaxies from the Sloan Digital Sky Survey", *MNRAS*, 435, 2835. [arXiv:1308.3496](https://arxiv.org/abs/1308.3496)

This table is the spectroscopic-redshift subsample (Table 5).

## Related datasets

- [open-ngc](https://huggingface.co/datasets/juliensimon/open-ngc): NGC/IC galaxy and nebula catalog
- [exoplanets](https://huggingface.co/datasets/juliensimon/exoplanets): NASA Exoplanet Archive
- [messier-objects](https://huggingface.co/datasets/juliensimon/messier-objects): Messier catalog of deep-sky objects

## Pipeline

Source code: [juliensimon/space-datasets](https://github.com/juliensimon/space-datasets)

## Citation

```bibtex
@dataset{galaxy_zoo_2_morphology,
  author    = {Simon, Julien},
  title     = {Galaxy Zoo 2 Morphological Classifications},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/juliensimon/galaxy-zoo-2-morphology},
  note      = {Based on Galaxy Zoo 2 data (Willett et al. 2013, MNRAS 435, 2835)}
}
```

## License

[CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/)
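
## Building column names

The task/answer/suffix naming scheme from the Schema section can be sketched as a small helper. This is an illustration only: `column_name` and `candidate_columns` are hypothetical names, and the pattern is inferred from the two column names that appear in the Usage examples. Since the schema says each answer has "up to" six columns, the generated names are candidates and should be checked against the loaded table.

```python
# Suffixes listed in the Schema section; not every answer is guaranteed to have all six.
SUFFIXES = ("count", "weight", "fraction", "weighted_fraction", "debiased", "flag")


def column_name(task, answer, suffix):
    """Join task, answer, and suffix with underscores, as in the Usage examples."""
    if suffix not in SUFFIXES:
        raise ValueError(f"unknown suffix: {suffix!r}")
    return f"{task}_{answer}_{suffix}"


def candidate_columns(task, answer):
    """Return the six candidate column names for one task/answer pair."""
    return [column_name(task, answer, s) for s in SUFFIXES]


# Both task/answer pairs below are taken from the Usage examples.
name = column_name("t01_smooth_or_features", "a01_smooth", "debiased")
print(name)

for col in candidate_columns("t08_odd_feature", "a24_merger"):
    print(col)
```

With the DataFrame from the Usage section loaded, a list comprehension such as `[c for c in candidate_columns(task, answer) if c in df.columns]` keeps only the names that actually exist.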
Provider:
juliensimon