
samfatnassi/gaia-dr3

Source: Hugging Face. Updated 2026-02-21; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/samfatnassi/gaia-dr3
Description:
---
license: apache-2.0
size_categories:
- 1B<n
task_categories:
- tabular-classification
- feature-extraction
tags:
- astronomy
- astrophysics
- gaia-dr3
- stellar-data
- space
---

# Gaia-DR3: A Billion-Star Dataset for Galactic Analysis

This dataset is a high-fidelity, pre-processed collection of over **1 Billion stellar records** derived from the **European Space Agency (ESA) Gaia Mission (Data Release 3)**. It is specifically curated for large-scale galactic archaeology, 3D mapping, and training advanced machine learning models like **SADIM-V2 77M**.

### 1. Dataset Overview

The dataset provides a comprehensive snapshot of the Milky Way, covering astrometric, kinematic, and photometric parameters. It has been optimized for high-performance computing and AI-driven astronomical frameworks.

### 2. Feature Schema (13 Core Parameters)

The dataset is structured with 13 essential features for understanding stellar dynamics:

| Feature Name | Data Type | Scientific Description |
| :--- | :--- | :--- |
| **source_id** | `int64` | Unique Gaia DR3 identifier for each stellar source. |
| **ra / dec** | `float64` | **Equatorial Coordinates:** Right Ascension & Declination. |
| **l / b** | `float64` | **Galactic Coordinates:** Longitude & Latitude relative to the Galaxy. |
| **pmra / pmdec** | `float64` | **Proper Motion:** Angular velocity of the star across the sky (mas/yr). |
| **d_pc** | `float64` | **Distance:** Calculated distance from Earth in Parsecs ($1/parallax$). |
| **x, y, z** | `float64` | **3D Cartesian:** Heliocentric position relative to the Sun. |
| **abs_m** | `float64` | **Absolute Magnitude:** The intrinsic brightness of the star. |
| **bp_rp** | `float32` | **Color Index:** Difference between BP and RP (Temperature indicator). |

### 3. Usage & Access (Streaming Mode)

**Note:** Due to the massive scale of this dataset (1B+ rows), downloading the full files to a local machine is not recommended.

Use the **Streaming Mode** provided by the Hugging Face `datasets` library to process data on the fly:

```python
from datasets import load_dataset

# Stream the dataset directly without downloading the full files
dataset = load_dataset("samfatnassi/gaia-dr3", split="train", streaming=True)

# Access a single stellar record
star_record = next(iter(dataset))
print(star_record)
```

### 4. Integration with SADIM-V2

This dataset serves as the foundational "knowledge base" for the SADIM-54M Model. While the dataset provides the raw observational facts, the model provides the analytical intelligence to predict and classify these stars.

### 5. Research & Ethics (Open Science)

This dataset is released under the Apache 2.0 License. It is provided as a contribution to Open Science and Humanity, encouraging researchers, students, and developers worldwide to explore the mysteries of our galaxy without boundaries.

**Data Source:** European Space Agency (ESA) Gaia Mission

**Project Lead:** KilmaAI / Sadim
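The derived columns in the feature schema (`d_pc`, `x, y, z`, `abs_m`) can be related to the measured quantities with standard textbook formulas. The sketch below is illustrative only and is not confirmed to be the pipeline used to build this dataset. It assumes the parallax is given in milliarcseconds (as in the Gaia archive), that `x, y, z` are built from the Galactic angles `l` and `b`, and that `abs_m` comes from an apparent magnitude without extinction correction. The function names are hypothetical.

```python
import math


def distance_pc(parallax_mas: float) -> float:
    """Inverse-parallax distance in parsecs.

    Assumes the parallax is in milliarcseconds, so d = 1000 / parallax.
    Non-positive parallaxes have no meaningful inverse and are rejected.
    """
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive for an inverse-parallax distance")
    return 1000.0 / parallax_mas


def heliocentric_xyz(l_deg: float, b_deg: float, d_pc: float) -> tuple:
    """Heliocentric Cartesian position in parsecs from Galactic l, b and distance.

    Assumed convention: x points toward l = 0, b = 0; z points toward b = +90.
    """
    l_rad = math.radians(l_deg)
    b_rad = math.radians(b_deg)
    x = d_pc * math.cos(b_rad) * math.cos(l_rad)
    y = d_pc * math.cos(b_rad) * math.sin(l_rad)
    z = d_pc * math.sin(b_rad)
    return x, y, z


def absolute_magnitude(apparent_mag: float, d_pc: float) -> float:
    """Distance-modulus relation M = m - 5 * log10(d) + 5, ignoring extinction."""
    return apparent_mag - 5.0 * math.log10(d_pc) + 5.0


# A star with a 10 mas parallax lies at 100 pc.
d = distance_pc(10.0)
print(d)
print(heliocentric_xyz(90.0, 0.0, d))
print(absolute_magnitude(10.0, d))
```

Inverse-parallax distances become unreliable when the parallax uncertainty is large relative to the parallax itself, so any cut on `d_pc` is best treated as approximate.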

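Because streaming yields one record at a time, a selection can be applied lazily without holding the full table in memory. The sketch below shows one possible pattern: a hypothetical helper, `nearby_stars`, that filters on the `d_pc` column from the feature schema and stops after a fixed number of matches. The sample records are synthetic placeholders for demonstration, not real Gaia values.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator


def nearby_stars(
    records: Iterable[Dict[str, Any]],
    max_distance_pc: float,
    limit: int,
) -> Iterator[Dict[str, Any]]:
    """Return an iterator over at most `limit` records with d_pc <= max_distance_pc.

    Records with a missing distance are skipped. Both the filter and the
    slice are lazy, so iteration stops once `limit` matches have been found.
    """
    matches = (
        record
        for record in records
        if record.get("d_pc") is not None and record["d_pc"] <= max_distance_pc
    )
    return islice(matches, limit)


# Synthetic placeholder records that mimic the column names in the schema.
sample_records = [
    {"source_id": 1, "d_pc": 42.0, "bp_rp": 0.8},
    {"source_id": 2, "d_pc": 950.0, "bp_rp": 1.4},
    {"source_id": 3, "d_pc": None, "bp_rp": 0.5},
    {"source_id": 4, "d_pc": 12.5, "bp_rp": 2.1},
]

for star in nearby_stars(sample_records, max_distance_pc=100.0, limit=10):
    print(star["source_id"], star["d_pc"])
```

The same helper should accept the streamed `dataset` object from the usage section in place of `sample_records`, since a streaming dataset iterates over dictionary records; a small `limit` keeps the amount of data read modest.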
Provider:
samfatnassi