samfatnassi/gaia-dr3
Source: Hugging Face. Updated 2026-02-21; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/samfatnassi/gaia-dr3
---
license: apache-2.0
size_categories:
- 1B<n
task_categories:
- tabular-classification
- feature-extraction
tags:
- astronomy
- astrophysics
- gaia-dr3
- stellar-data
- space
---
# Gaia-DR3: A Billion-Star Dataset for Galactic Analysis
This dataset is a pre-processed collection of more than **1 billion stellar records** derived from the **European Space Agency (ESA) Gaia Mission (Data Release 3)**. It is curated for large-scale galactic archaeology, 3D mapping, and training machine learning models such as **SADIM-V2 77M**.
### 1. Dataset Overview
The dataset provides a comprehensive snapshot of the Milky Way, covering astrometric, kinematic, and photometric parameters. It has been optimized for high-performance computing and AI-driven astronomical frameworks.
### 2. Feature Schema (13 Core Parameters)
The dataset is structured with 13 essential features for understanding stellar dynamics:
| Feature Name | Data Type | Scientific Description |
| :--- | :--- | :--- |
| **source_id** | `int64` | Unique Gaia DR3 identifier for each stellar source. |
| **ra / dec** | `float64` | **Equatorial Coordinates:** Right Ascension & Declination. |
| **l / b** | `float64` | **Galactic Coordinates:** Longitude & Latitude relative to the Galaxy. |
| **pmra / pmdec** | `float64` | **Proper Motion:** Angular velocity of the star across the sky (mas/yr). |
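| **d_pc** | `float64` | **Distance:** Distance from Earth in parsecs, computed as the inverse parallax ($1/\text{parallax}$ with parallax in arcseconds, equivalently $1000/\text{parallax}$ with parallax in mas). |
| **x, y, z** | `float64` | **3D Cartesian:** Heliocentric position, with the Sun at the origin. |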
| **abs_m** | `float64` | **Absolute Magnitude:** The intrinsic brightness of the star. |
| **bp_rp** | `float32` | **Color Index:** Difference between BP and RP (Temperature indicator). |
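
The derived columns in the table (`d_pc`, `x, y, z`, `abs_m`) follow from standard relations, which the sketch below works through for a single star. Several things here are assumptions rather than facts from this card: the input names `parallax_mas` and `g_mag` are hypothetical, the axis convention (x toward the Galactic centre, z toward the north Galactic pole) is the common one but is not stated by the dataset, and the absolute magnitude ignores extinction.

```python
import math


def derive_features(parallax_mas, l_deg, b_deg, g_mag):
    """Derive d_pc, (x, y, z) and abs_m from basic observables.

    Assumed conventions (not confirmed by the dataset card):
    - parallax is in milliarcseconds, so d_pc = 1000 / parallax_mas
    - x points toward the Galactic centre, z toward the north Galactic pole
    - abs_m uses the distance modulus with no extinction correction
    """
    if parallax_mas <= 0:
        raise ValueError("a 1/parallax distance needs a positive parallax")

    d_pc = 1000.0 / parallax_mas

    # Galactic longitude/latitude to heliocentric Cartesian coordinates.
    l_rad = math.radians(l_deg)
    b_rad = math.radians(b_deg)
    x = d_pc * math.cos(b_rad) * math.cos(l_rad)
    y = d_pc * math.cos(b_rad) * math.sin(l_rad)
    z = d_pc * math.sin(b_rad)

    # Distance modulus: M = m - 5 * log10(d) + 5, with d in parsecs.
    abs_m = g_mag - 5.0 * math.log10(d_pc) + 5.0

    return {"d_pc": d_pc, "x": x, "y": y, "z": z, "abs_m": abs_m}


# A star with a 10 mas parallax lies at 100 pc; at apparent magnitude 10
# its absolute magnitude is 10 - 5 * 2 + 5 = 5.
print(derive_features(parallax_mas=10.0, l_deg=0.0, b_deg=0.0, g_mag=10.0))
```

Inverse-parallax distances become unreliable for small or noisy parallaxes, which is why the sketch rejects non-positive values instead of returning a distance.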
### 3. Usage & Access (Streaming Mode)
**Note:** Due to the massive scale of this dataset (1B+ rows), downloading the full files to a local machine is not recommended. Use the **Streaming Mode** provided by the Hugging Face `datasets` library to process data on the fly:
```python
from datasets import load_dataset

# Stream the dataset directly without downloading the full files
dataset = load_dataset("samfatnassi/gaia-dr3", split="train", streaming=True)

# Access a single stellar record
star_record = next(iter(dataset))
print(star_record)
```
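
Because a streamed dataset is consumed as an iterator, a selection can be applied lazily and stopped early. The following is a minimal sketch of that pattern using only the standard library. The helper `nearby_stars` and the stand-in records are hypothetical and exist only for illustration; the column name `d_pc` comes from the schema in section 2.

```python
from itertools import islice


def nearby_stars(records, max_distance_pc, limit):
    """Return up to `limit` records with d_pc <= max_distance_pc.

    `records` can be any iterable of dict-like rows. The generator below is
    lazy, so iteration stops as soon as `limit` matches have been found.
    """
    matches = (
        r for r in records
        if r.get("d_pc") is not None and r["d_pc"] <= max_distance_pc
    )
    return list(islice(matches, limit))


# Stand-in records (made-up values) so the sketch runs without network access.
sample = [
    {"source_id": 1, "d_pc": 42.0, "bp_rp": 0.8},
    {"source_id": 2, "d_pc": 850.0, "bp_rp": 1.4},
    {"source_id": 3, "d_pc": None, "bp_rp": 0.5},
    {"source_id": 4, "d_pc": 97.5, "bp_rp": 2.1},
]

print([r["source_id"] for r in nearby_stars(sample, 100.0, 10)])  # [1, 4]
```

With the streaming `dataset` object from the snippet above, a call such as `nearby_stars(dataset, 100.0, 10)` would read rows only until ten matches are found, provided the rows expose `d_pc` as described in the schema.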
### 4. Integration with SADIM-V2
This dataset serves as the foundational "knowledge base" for the SADIM-V2 model. The dataset supplies the observational data, while the model provides the analytical layer that predicts and classifies these stars.
### 5. Research & Ethics (Open Science)
This dataset is released under the Apache 2.0 License. It is provided as a contribution to Open Science and Humanity, encouraging researchers, students, and developers worldwide to explore the mysteries of our galaxy without boundaries.
Data Source: European Space Agency (ESA) Gaia Mission
Project Lead: KilmaAI / Sadim
Provider: samfatnassi



