AstroChat - A Dataset of synthetically generated conversations for LLM supervised fine-tuning in the domain of Space Mission Engineering and Astronautics
收藏NIAID Data Ecosystem2026-05-02 收录
下载链接:
https://zenodo.org/record/11531578
下载链接
链接失效反馈官方服务:
资源简介:
AstroChat Dataset Description
Purpose and Scope
The AstroChat dataset is a collection of 901 dialogues, synthetically generated, tailored to the specific domain of Astronautics / Space Mission Engineering. This dataset will be frequently updated following feedback from the community. If you would like to contribute, please reach out in the community discussion.
Intended Use
The dataset is intended to be used for supervised fine-tuning of chat LLMs (Large Language Models). Due to its currently limited size, you should use a pre-trained instruct model and ideally augment the AstroChat dataset with other datasets in the area of (Science Technology, Engineering and Math).
DATASET DESCRIPTION
Access
Manual download from Hugging face hub: https://huggingface.co/datasets/patrickfleith/AstroChat
Or with python:
from datasets import load_dataset
dataset = load_dataset("patrickfleith/AstroChat")
Structure
901 generated conversations between a simulated user and AI-assistant (more on the generation method below). Each instance is made of the following field (column):
id: a unique identifier to refer to this specific conversation. Useeful for traceability purposes, especially for further processing task or merge with other datasets.
topic: a topic within the domain of Astronautics / Space Mission Engineering. This field is useful to filter the dataset by topic, or to create a topic-based split.
subtopic: a subtopic of the topic. For instance in the topic of Propulsion, there are subtopics like Injector Design, Combustion Instability, Electric Propulsion, Chemical Propulsion, etc.
persona: description of the persona used to simulate a user
opening_question: the first question asked by the user to start a conversation with the AI-assistant
messages: the whole conversation messages between the user and the AI assistant in already nicely formatted for rapid use with the transformers library. A list of messages where each message is a dictionary with the following fields:
role: the role of the speaker, either user or assistant
content: the message content. For the assistant, it is the answer to the user's question. For the user, it is the question asked to the assistant.
Important See the full list of topics and subtopics covered below.
Metadata
Dataset is version controlled and commits history is available here: https://huggingface.co/datasets/patrickfleith/AstroChat/commits/main
Generation Method
We used a method inspired from Ultrachat dataset. Especially, we implemented our own version of Human-Model interaction from Sector I: Questions about the World of their paper:
Ding, N., Chen, Y., Xu, B., Qin, Y., Zheng, Z., Hu, S., ... & Zhou, B. (2023). Enhancing chat language models by scaling high-quality instructional conversations. arXiv preprint arXiv:2305.14233.
Step-by-step description
Defined a set of user persona
Defined a set of topics/ disciplines within the domain of Astronautics / Space Mission Engineering
For each topics, we defined a set of subtopics to narrow down the conversation to more specific and niche conversations (see below the full list)
For each subtopic we generate a set of opening questions that the user could ask to start a conversation (see below the full list)
We then distil the knowledge of an strong Chat Model (in our case ChatGPT through then api with gpt-4-turbo model) to generate the answers to the opening questions
We simulate follow-up questions from the user to the assistant, and the assistant's answers to these questions which builds up the messages.
Future work and contributions appreciated
Distil knowledge from more models (Anthropic, Mixtral, GPT-4o, etc...)
Implement more creativity in the opening questions and follow-up questions
Filter-out questions and conversations which are too similar
Ask topic and subtopic expert to validate the generated conversations to have a sense on how reliable is the overall dataset
Languages
All instances in the dataset are in english
Size
901 synthetically-generated dialogue
USAGE AND GUIDELINES
License
AstroChat © 2024 by Patrick Fleith is licensed under Creative Commons Attribution 4.0 International
Restrictions
No restriction. Please provide the correct attribution following the license terms.
Citation
Patrick Fleith, AstroChat – A Dataset of synthetically generated conversations for LLM supervised fine-tuning in the domain of Space Mission Engineering and Astronautics, (2024).
Update Frequency
Will be updated based on feedbacks. I am also looking for contributors. Help me create more datasets for Space Engineering LLMs :)
Have a feedback or spot an error?
Use the community discussion tab directly on the huggingface AstroChat dataset page.
Contact Information
Reach me here on the community tab or on LinkedIn (Patrick Fleith) with a Note.
Number of conversation per topic category
Space Propulsion Systems 135
Human Spaceflight 50
Entry Descent and Landing (EDL) 45
Mechanisms 45
Planetary Rovers 45
Attitude Determination and Control 45
Telecommunication 41
Space Business 40
Structures 40
Materials 40
Launchers, Launches, Launch Operations 36
Power System 35
Payload S/S and Optics 35
Reliability, Availability, Maintainability, and Safety (RAMS) 35
Space Missions Operations 31
Space Environment 30
Command and Data System 30
Orbital Mechanics 30
Space Law 26
Ground Systems 25
Thermal Control 25
Space Processes 20
Planetary Science and Exploration 17
Topics and subtopics covered
topic: [ Space Law ]
subtopics:
Space Law Basics
1998 ISS agreement
Outer Sppace Treaty
Geostationary Orbit Regulations
Space Traffic Management
French Space Law
topic: [ Space Business ]
subtopics:
New Space
Satellite Insurance
Financing Space Project (in EU)
Commercial Satellite Launch Services
Space Tourism
Business Models for Space Stations
Public-private Partnerships
Economic Impact of Space Technologies
topic: [ Space Missions Operations ]
subtopics:
Flight control team
Flight Dynamics
Procedure Preparation and Validation
Mission Planning
Extravehicular Activities (EVAs)
Collision Avoidance Manoeuvres
Mission Termination and De-Orbit Strategies
topic: [ Human Spaceflight ]
subtopics:
Astronaut Selection
Astronaut Training
research experiments onboard of the ISS
Human Mission to Mars Design
Environmental Control and Life Support Systems
Moon Surface Habitats
Microgravity effects
Space Suit Design and Operation
Space Medicine
Space Food
topic: [ Space Environment ]
subtopics:
Micrometeorites
Space Radiation
Solar Cycle
Spacecraft Hardening
Space Environment Effects on Satellites
Magneto-sphere and Radiation Belt
topic: [ Space Propulsion Systems ]
subtopics:
Liquid Rocket Engines
Solid Rocket Motors
Hybrid Rocket Engines
Staging and Ignition Systems
Propellant Feed Systems
Nozzle Designs
Thermodynamics
Turbopumps and/or Combustion Chambers
Specific Impulse and Thrust-to-Weight Ratios
Chemical Monopropellant Technologies
Chemical Bipropellant Systems
Nuclear Thermal Propulsion
Fuel Handling and Storage
Nuclear Propulsion Thermal Neutron Absorbers
Nuclear Propulsion Heat Exchangers
Green Propellants
Bipropellant Injector Design
Electric Ion Thrusters
Hall Effect Thrusters
Electrothermal Thrusters
Grid and Cathode Technologies
Aerospike Engines
Variable Specific Impulse Magnetoplasma Rocket (VASIMR)
Bipropellant Mixing Ratios and Combustion
Cryogenic Propellant Handling
Oxydizer and Fuel Combinations
Long-term Impacts of Propellant Residues in the Atmosphere
Propellant Tank Pressurization
topic: [ Space Processes ]
subtopics:
Trade Studies
Margins, Coningencies, Reserves
Systems Engineering
Quality Assurance
topic: [ Ground Systems ]
subtopics:
Ground Stations
Ground Support Equipments
Control Centers
Tracking Systems
AntennasGround Systems Engineering
topic: [ Planetary Rovers ]
subtopics:
Mars Rovers
Lunar Rovers
Rover Instrumentation
Rover Power Systems
Rover Thermal Control
Rover Autonomy
Wheels Design
Legged Rovers
Hazard Avoidance
topic: [ Planetary Science and Exploration ]
subtopics:
Astrobiology
Exoplanets
AsteroidsJupiter
Saturn
Search for Extraterrestrial Life
topic: [ Structures ]
subtopics:
Structural Design and Analysis
Load Path Determination
Vibration and Acoustic Testing
Thermal Protection Systems
Composite Structures
Joining Techniques (e.g., Welding, Bolting, Bonding)
Manufacturing Tolerances and Quality Control
Deployable Structures (e.g., Antennas, Solar Arrays)
topic: [ Mechanisms ]
subtopics:
Actuators and Dampers
Gimbals and Bearings
Latch and Release Devices
Hinges and Deployment Systems
Robotic Arms and Tools
Valves and Fluid Control Systems
Thermal Expansion Joints
Drive Systems and Motors
Reliability and Lifetime Analysis
topic: [ Materials ]
subtopics:
Composite Materials
Metals and Alloys
Polymers and Plastics
Nano-materials
Radiation Shielding Materials
Thermal Insulation Materials
Corrosion and Oxidation Resistance
Material Testing and Characterization
topic: [ Entry Descent and Landing (EDL) ]
subtopics:
Aerodynamics and Aeroheating
Powered Descent
Landing Gear and Systems
Heat Shield Design and Materials
Hazard Avoidance
Surface Interaction (Airbags, Crushable Structures)
Entry, Descent, and Landing Sequencing
EDL on Mars
Parachute Systems Design
topic: [ Reliability, Availability, Maintainability, and Safety (RAMS) ]
subtopics:
System Reliability Modeling
Failure Modes, Effects, and Criticality Analysis (FMECA)
Risk Assessment and Management
Safety-Critical Systems Design
Availability Modeling and Prediction
Lifecycle Cost and Duration Analysis
Hazardous Material Handling
topic: [ Orbital Mechanics ]
subtopics:
Interplanetary Trajectories
Gravity Assist Maneuvers
Orbit Determination and Propagation
Space Situational Awareness and Debris Tracking
Mission Design and Analysis Tools
Orbit Decay and Re-entry Predictions
topic: [ Launchers, Launches, Launch Operations ]
subtopics:
Launcher Types (e.g., expendable, reusable)
Launch Vehicles
Launch Sites and Infrastructure
Countdown Procedures and Sequencing
Launch Window Determination and Trajectory Analysis
Ground and Launch Crew Training
Payload Integration and Fairing Design
Environmental and Weather Constraints
topic: [ Attitude Determination and Control ]
subtopics:
Sensors for Attitude Determination (e.g., Gyroscopes, Star Trackers)
Actuators for Attitude Control (e.g., Reaction Wheels, Thrusters)
Control Algorithms (e.g., PID, Kalman Filter)
Momentum Exchange Devices
Attitude Dynamics Modeling
On-Orbit Attitude Reconfiguration
Fault Detection and Response Strategies
Sun and Earth Sensors
Magnetic Torquers and Gravity Gradient Stabilization
topic: [ Payload S/S and Optics ]
subtopics:
Payload Design and Integration
Spectral Imaging and Multi-spectral Sensors
Infrared and Ultraviolet Optics
Calibration and Validation of Optical Systems
Image Processing and Data Analysis
Thermal Control for Sensitive Optics
Data Downlink and Communication Interfaces
topic: [ Power System ]
subtopics:
Solar Panels and Arrays
Battery Types and Management Systems (e.g., Li-ion, NiMH)
Energy Storage Technologies
Fault Protection and Isolation
Harness and Cabling
Alternative Power Sources (e.g., RTGs, Fuel Cells)
Power Budgeting and Load Analysis
topic: [ Thermal Control ]
subtopics:
Active Thermal Control Systems (e.g., Heat Pumps, Louvers)
Environmental Testing and Validation
Heating and Cooling Hardware
Thermal Protection for Entry, Descent, and Landing
Cryogenic Thermal Management
topic: [ Command and Data System ]
subtopics:
Onboard Computers and Processing Units
Software Architecture and Middleware
Command Link and Telemetry Systems
Interface and Bus Systems (e.g., MIL-STD-1553, SpaceWire)
Real-Time Operating Systems (RTOS)
Security Measures and Encryption
topic: [ Telecommunication ]
subtopics:
Antenna Systems (e.g., Parabolic, Phased Array)
Communication Transponders
Frequency Bands and Spectrum Management
Signal Modulation and Demodulation Techniques
Inter-Satellite Links and Data Relays
Error Detection and Correction
Space Communication Protocols
RF and Microwave Components
Deep Space Communications
创建时间:
2024-06-10



