
juliensimon/sdss-asteroid-taxonomy

Hugging Face | Updated 2026-03-27 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/juliensimon/sdss-asteroid-taxonomy
Resource description:
---
license: cc-by-4.0
pretty_name: "SDSS-based Asteroid Taxonomy"
language:
- en
description: "Compositional taxonomy for 107,466 SDSS observations of 63,468 asteroids, with ugriz reflectances and orbital elements."
task_categories:
- tabular-classification
- tabular-regression
tags:
- space
- asteroids
- taxonomy
- composition
- sdss
- orbital-mechanics
- open-data
- tabular-data
size_categories:
- 100K<n<1M
configs:
- config_name: default
  data_files:
  - split: train
    path: data/sdss_asteroid_taxonomy.parquet
  default: true
---

# SDSS-based Asteroid Taxonomy

*Part of the [Orbital Mechanics Datasets](https://huggingface.co/collections/juliensimon/orbital-mechanics-datasets-69c24caca4ab3934c9856994) collection on Hugging Face.*

Compositional taxonomy for **107,466** SDSS photometric observations of **63,468** asteroids, classified using the scheme of Carvano et al. (2010). Each observation includes SDSS u'g'r'i'z' log-reflectances, a taxonomic class assignment, and a probability score. Orbital elements from the asteroid catalog are merged in for asteroids with known orbits.

## Dataset description

The Sloan Digital Sky Survey (SDSS) Moving Object Catalog observed over 100,000 asteroids in five photometric bands (u', g', r', i', z') between 1998 and 2007. Carvano et al. (2010) developed a probabilistic taxonomic classification scheme based on SDSS colors, assigning each observation to one of nine primary compositional classes inspired by the Bus taxonomy:

- **V**: Basaltic (V-type)
- **O**: Olivine-rich (O-type)
- **Q**: Ordinary chondrite-like (Q-type)
- **S**: Silicaceous (S-complex)
- **A**: Strongly reddened (A-type)
- **L**: Moderately reddened (L-type)
- **D**: Very red, organic-rich (D-type)
- **X**: Degenerate featureless (X-complex)
- **C**: Carbon-rich, featureless (C-complex)

When an observation falls near a class boundary, a two-letter compound class is assigned (e.g., SQ, CX, LS), indicating ambiguity between the two types.
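The class labels described above can be decomposed with a few lines of Python. This is a minimal sketch: the helper names `split_tax_class` and `is_ambiguous` are hypothetical, and it assumes every label is one or two letters drawn from the nine primary classes. Anything else raises an error rather than being guessed at.

```python
# The nine primary class letters listed in the dataset description.
PRIMARY_CLASSES = set("VOQSALDXC")


def split_tax_class(label: str) -> list[str]:
    """Return the primary class letters that make up a class label.

    Assumption: a label is one primary letter ("S") or a two-letter
    compound of primary letters ("SQ"). Other inputs raise ValueError.
    """
    letters = list(label.strip().upper())
    unknown = [ch for ch in letters if ch not in PRIMARY_CLASSES]
    if unknown or not 1 <= len(letters) <= 2:
        raise ValueError(f"unexpected class label: {label!r}")
    return letters


def is_ambiguous(label: str) -> bool:
    """True when the label is a two-letter compound class."""
    return len(split_tax_class(label)) == 2


print(split_tax_class("SQ"), is_ambiguous("SQ"))
print(split_tax_class("C"), is_ambiguous("C"))
```

A helper like this could be applied to the `tax_class` column to separate clear assignments from boundary cases before further analysis.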
Each observation receives a probability score (0-100) for the assigned class. The `classification` column gives the best overall class per asteroid (from the asteroid summary table), chosen as either the most frequent or highest-scoring class across all SDSS observations of that object.

## Schema

| Column | Type | Description |
|--------|------|-------------|
| `object_id` | string | Primary identifier (asteroid number or provisional designation) |
| `ast_number` | Int64 | IAU asteroid catalog number (null for unnumbered) |
| `ast_name` | string | IAU asteroid name (null if unnamed) |
| `prov_desig` | string | Provisional designation at discovery |
| `tax_class` | string | Taxonomic class for this observation (C/S/V/Q/D/L/X/A/O or compound, e.g. SQ/CX) |
| `score` | Int64 | Probability score for assigned class (0-100) |
| `moid` | string | Unique SDSS moving-object observation ID |
| `bad_flag` | Int64 | 1 if any magnitude uncertainty exceeds 3rd quartile |
| `log_refl_u` | float64 | Log reflectance, SDSS u' band |
| `log_refl_err_u` | float64 | Uncertainty of u' log reflectance |
| `log_refl_g` | float64 | Log reflectance, SDSS g' band (reference = 1.0) |
| `log_refl_err_g` | float64 | Uncertainty of g' log reflectance |
| `log_refl_r` | float64 | Log reflectance, SDSS r' band |
| `log_refl_err_r` | float64 | Uncertainty of r' log reflectance |
| `log_refl_i` | float64 | Log reflectance, SDSS i' band |
| `log_refl_err_i` | float64 | Uncertainty of i' log reflectance |
| `log_refl_z` | float64 | Log reflectance, SDSS z' band |
| `log_refl_err_z` | float64 | Uncertainty of z' log reflectance |
| `classification` | string | Best overall class for this asteroid (most frequent or highest score) |
| `score_best` | Int64 | Probability score for best classification |
| `n_class` | Int64 | Number of classified SDSS observations for this asteroid |
| `method` | Int64 | 1 = most frequent class chosen, 0 = highest score chosen |
| `sequence` | string | Sequence of per-observation class assignments |
| `abs_mag_h` | float64 | Absolute magnitude H from SDSS MOC |
| `proper_semimajor_au` | float64 | Proper semi-major axis (AU), null if unavailable |
| `proper_eccentricity` | float64 | Proper eccentricity, null if unavailable |
| `sin_proper_inclination` | float64 | Sine of proper inclination, null if unavailable |
| `osc_semimajor_au` | float64 | Osculating semi-major axis (AU) |
| `osc_eccentricity` | float64 | Osculating eccentricity |
| `osc_inclination_deg` | float64 | Osculating inclination (degrees) |

## Quick stats

- **107,466** observations of **63,468** unique asteroids
- **33** taxonomic classes
- Top classes: **C** (28,784), **S** (25,713), **LS** (8,956), **L** (8,129), **X** (6,003)
- **78,180** observations with orbital elements

## Usage

```python
import matplotlib.pyplot as plt
from datasets import load_dataset

ds = load_dataset("juliensimon/sdss-asteroid-taxonomy", split="train")
df = ds.to_pandas()

# Class distribution
df["tax_class"].value_counts().plot.bar()

# Taxonomic composition vs semi-major axis (main belt structure)
plt.figure()  # new figure so the histograms are not drawn on the bar chart's axes
belt = df[df["osc_semimajor_au"].between(2.0, 3.5)]
for cls in ["S", "C", "X", "V"]:
    subset = belt[belt["tax_class"] == cls]
    plt.hist(subset["osc_semimajor_au"], bins=100, alpha=0.5, label=cls, density=True)
plt.xlabel("Semi-major axis (AU)")
plt.legend()

# High-confidence V-type (basaltic) asteroids
vesta_family = df[(df["tax_class"] == "V") & (df["score"] > 80)]

# SDSS color-color diagram
plt.figure()  # separate figure for the scatter plot
plt.scatter(df["log_refl_r"] - df["log_refl_i"],
            df["log_refl_g"] - df["log_refl_r"],
            c=df["tax_class"].astype("category").cat.codes, s=0.2, alpha=0.3)
plt.show()
```

## Data source

[PDS Small Bodies Node: SDSS-based Asteroid Taxonomy V1.1](https://sbn.psi.edu/pds/resource/sdsstax.html)

Based on SDSS Moving Object Catalog observations (1998-2007). See Carvano et al. (2010) and Ivezic et al. (2010).
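The schema's `bad_flag`, `score`, `classification`, and `method` columns relate per-observation rows to a per-asteroid summary. The sketch below illustrates that relationship on hypothetical rows (plain dicts with made-up values, not taken from the dataset). The function names are illustrative, and the tie-breaking rule is an assumption: the dataset description says only that the best class is "most frequent or highest score", so this sketch picks the most frequent class when it is unique and otherwise falls back to the highest-scoring observation.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows that mimic a few schema columns; the values are
# invented for illustration and do not come from the dataset.
rows = [
    {"object_id": "A", "tax_class": "S", "score": 90, "bad_flag": 0},
    {"object_id": "A", "tax_class": "S", "score": 70, "bad_flag": 0},
    {"object_id": "A", "tax_class": "SQ", "score": 95, "bad_flag": 1},
    {"object_id": "B", "tax_class": "C", "score": 60, "bad_flag": 0},
    {"object_id": "B", "tax_class": "X", "score": 85, "bad_flag": 0},
]


def clean_observations(rows, min_score=0):
    """Keep rows with bad_flag == 0 and score >= min_score."""
    return [r for r in rows if r["bad_flag"] == 0 and r["score"] >= min_score]


def summarize(rows):
    """Map each object_id to (best_class, method).

    method follows the schema's encoding: 1 = most frequent class chosen,
    0 = highest score chosen. Assumption: the most frequent class wins only
    when it is unique; ties fall back to the highest-scoring observation.
    """
    by_object = defaultdict(list)
    for r in rows:
        by_object[r["object_id"]].append(r)

    summary = {}
    for obj, obs in by_object.items():
        counts = Counter(r["tax_class"] for r in obs).most_common()
        if len(counts) == 1 or counts[0][1] > counts[1][1]:
            summary[obj] = (counts[0][0], 1)
        else:
            best = max(obs, key=lambda r: r["score"])
            summary[obj] = (best["tax_class"], 0)
    return summary


summary = summarize(clean_observations(rows))
print(summary)
```

Comparing a summary like this against the published `classification` and `method` columns is one way to check whether the assumed rule matches the catalog's actual behavior.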
## Pipeline

Source code: [juliensimon/space-datasets](https://github.com/juliensimon/space-datasets)

## Support

If you find this dataset useful, please give it a ❤️ on the [dataset page](https://huggingface.co/datasets/juliensimon/sdss-asteroid-taxonomy) and share feedback in the Community tab! Also consider giving a ⭐️ to the [space-datasets](https://github.com/juliensimon/space-datasets) repo.

## Citation

```bibtex
@dataset{sdss_asteroid_taxonomy,
  author = {Simon, Julien},
  title = {SDSS-based Asteroid Taxonomy},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/datasets/juliensimon/sdss-asteroid-taxonomy},
  note = {Based on Carvano et al. (2010) SDSS taxonomy from the PDS Small Bodies Node}
}
```

## License

[CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/)

Provider:
juliensimon