ziksy/faa-aviation-training
Hugging Face · updated 2026-03-27 · indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/ziksy/faa-aviation-training
Resource description:
---
language:
- en
license: apache-2.0
task_categories:
- text-generation
- question-answering
task_ids:
- language-modeling
- open-domain-qa
tags:
- aviation
- faa
- regulations
- pilot-training
- flight-training
- aeronautics
- cfr-title-14
- aim
- phak
- private-pilot
- instrument-rating
- commercial-pilot
pretty_name: FAA Aviation Training Dataset
size_categories:
- 1K<n<10K
source_datasets: []
dataset_info:
  features:
  - name: text
    dtype: string
  - name: category
    dtype: string
  - name: source
    dtype: string
  - name: format
    dtype: string
  - name: difficulty
    dtype: string
  splits:
  - name: train
    num_examples: 8945
  - name: test
    num_examples: 874
---
# FAA Aviation Training Dataset
An aviation training dataset built from official FAA publications, intended for fine-tuning language models on pilot knowledge, federal aviation regulations, and flight procedures.
## Overview
- **9,819 total samples** (~4.1M tokens)
- **Three data formats**: instruction-tuning Q&A, knowledge text, practice exam MCQs
- **Three authoritative sources**: Title 14 CFR (FARs), Aeronautical Information Manual (AIM), Pilot's Handbook of Aeronautical Knowledge (PHAK)
- **Five difficulty levels**: student, private, instrument, commercial, ATP
- **214 practice exam questions** across PPL, IFR, and CPL certifications
- **ChatML format** (Qwen-style `<|im_start|>`/`<|im_end|>` tokens) for instruction and exam samples
## Dataset Structure
### Splits
| Split | Samples | Description |
|-------|---------|-------------|
| train | 8,945 | Instruction Q&A + knowledge text for fine-tuning |
| test | 874 | Held-out Q&A + all 214 exam MCQs for evaluation |
### Columns
| Column | Type | Description |
|--------|------|-------------|
| `text` | string | Formatted training text (ChatML for Q&A/exam, plain for knowledge) |
| `category` | string | Content category (regulations, procedures, knowledge, weather, etc.) |
| `source` | string | Source reference (e.g., "14 CFR 61.3", "AIM 1-1-1", "PHAK Chapter 4") |
| `format` | string | `instruction`, `knowledge`, or `exam` |
| `difficulty` | string | Pilot certificate level: student/private/instrument/commercial/atp |
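The `source` column can be split into a publication and a locator for grouping or filtering. The sketch below is a minimal example that assumes every reference follows one of the three shapes shown in the table ("14 CFR 61.3", "AIM 1-1-1", "PHAK Chapter 4"). The `SourceRef` type and `parse_source` helper are hypothetical names, not part of the dataset, and other shapes (for example, hand-crafted samples) are returned as `None`.

```python
import re
from typing import NamedTuple, Optional


class SourceRef(NamedTuple):
    publication: str  # "14 CFR", "AIM", or "PHAK"
    locator: str      # e.g. "61.3", "1-1-1", "Chapter 4"


# Assumption: patterns inferred only from the three examples in the Columns table.
_PATTERNS = [
    ("14 CFR", re.compile(r"^14 CFR\s+(.+)$")),
    ("AIM", re.compile(r"^AIM\s+(.+)$")),
    ("PHAK", re.compile(r"^PHAK\s+(.+)$")),
]


def parse_source(source: str) -> Optional[SourceRef]:
    """Split a source string into (publication, locator), or None if unrecognized."""
    text = source.strip()
    for publication, pattern in _PATTERNS:
        match = pattern.match(text)
        if match:
            return SourceRef(publication, match.group(1).strip())
    return None
```

Returning `None` for unrecognized strings keeps the helper safe to apply across the whole column before the actual range of `source` values has been inspected.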
### Content by Format
| Format | Count | Description |
|--------|-------|-------------|
| instruction | 6,582 | Q&A pairs from FARs, AIM, PHAK, hand-crafted aviation knowledge |
| knowledge | 3,023 | Regulation/AIM/PHAK text chunks for continued pretraining |
| exam | 214 | FAA-style multiple-choice practice exam questions |
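The per-format counts can be checked against the split sizes with a few lines of Python. This is a minimal sketch: the expected numbers are copied from the tables in this card, and `count_formats` is a hypothetical helper that works on any iterable of row dictionaries with a `format` key.

```python
from collections import Counter

# Figures copied from the tables in this card.
EXPECTED_FORMAT_COUNTS = {"instruction": 6582, "knowledge": 3023, "exam": 214}
EXPECTED_SPLIT_COUNTS = {"train": 8945, "test": 874}


def count_formats(rows):
    """Count rows per `format` value in any iterable of row dicts."""
    return Counter(row["format"] for row in rows)


def totals_agree():
    """True when the format counts and the split counts add up to the same total."""
    return sum(EXPECTED_FORMAT_COUNTS.values()) == sum(EXPECTED_SPLIT_COUNTS.values())
```

With the dataset loaded, `count_formats(ds["train"]) + count_formats(ds["test"])` should reproduce the table above if the published card matches the hosted files.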
### Content by Category
| Category | Count | Sources |
|----------|-------|---------|
| regulations | 6,883 | 24 parts of Title 14 CFR (1,570 FAR sections) |
| procedures | 1,637 | AIM chapters 1-10 (396 sections) |
| knowledge | 1,055 | PHAK chapters 1-17 (524 pages) |
| weather | 98 | METAR/TAF decoding, weather minimums, weather theory |
| navigation | 41 | VOR, GPS, charts, instrument approaches |
| flight_operations | 30 | Weight & balance, performance, preflight |
| aerodynamics | 15 | Forces of flight, stalls, load factor |
| airspace | 15 | Classes A-G, special use airspace |
| aircraft_systems | 12 | Instruments, engines, electrical |
| human_factors | 10 | Hypoxia, spatial disorientation, ADM |
| emergency_procedures | 8 | Engine failure, fire, lost comms |
| scenarios | 15 | Situational decision-making |
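The category table is heavily skewed: regulations, procedures, and knowledge together account for about 97.5% of all samples. The sketch below computes each category's share from the counts in the table, which can help when deciding whether to rebalance during fine-tuning. `category_shares` is a hypothetical helper name.

```python
# Counts copied from the "Content by Category" table.
CATEGORY_COUNTS = {
    "regulations": 6883,
    "procedures": 1637,
    "knowledge": 1055,
    "weather": 98,
    "navigation": 41,
    "flight_operations": 30,
    "aerodynamics": 15,
    "airspace": 15,
    "aircraft_systems": 12,
    "human_factors": 10,
    "emergency_procedures": 8,
    "scenarios": 15,
}


def category_shares(counts):
    """Return each category's fraction of the total sample count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    # Print categories from largest to smallest share.
    for name, share in sorted(category_shares(CATEGORY_COUNTS).items(),
                              key=lambda item: item[1], reverse=True):
        print(f"{name:22s} {share:6.1%}")
```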
### Practice Exams (in test split)
| Exam | Questions | Topics |
|------|-----------|--------|
| PPL (Private Pilot) | 129 | Regulations, airspace, weather, aerodynamics, navigation |
| IFR (Instrument Rating) | 50 | IFR procedures, approaches, weather, regulations |
| CPL (Commercial Pilot) | 35 | Commercial operations, maneuvers, regulations |
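Because the test split mixes exam MCQs with held-out Q&A, it is usually convenient to separate the two before evaluation. The following is a minimal sketch that relies only on the documented `format` column. `split_eval_rows` is a hypothetical helper, and how the three exams are told apart inside the exam rows is not specified in this card.

```python
# Per-exam question counts copied from the table above.
EXAM_QUESTION_COUNTS = {"PPL": 129, "IFR": 50, "CPL": 35}


def split_eval_rows(rows):
    """Partition test-split rows into (exam MCQs, held-out Q&A) by `format`."""
    exam_rows, held_out_rows = [], []
    for row in rows:
        if row["format"] == "exam":
            exam_rows.append(row)
        else:
            held_out_rows.append(row)
    return exam_rows, held_out_rows
```

Applied to the full test split, this should yield 214 exam rows and 874 - 214 = 660 held-out rows, assuming the hosted files match the figures in this card.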
## Chat Template
Instruction and exam samples use the ChatML format with Qwen-style `<|im_start|>`/`<|im_end|>` tokens:
```
<|im_start|>system
You are an aviation expert and FAA-certified flight instructor. Answer questions accurately based on Federal Aviation Regulations, the Aeronautical Information Manual, and standard aviation knowledge.<|im_end|>
<|im_start|>user
What are the VFR weather minimums in Class B airspace?<|im_end|>
<|im_start|>assistant
In Class B airspace, VFR weather minimums are: visibility 3 statute miles and clear of clouds...<|im_end|>
```
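The template above can be built and parsed with plain string handling. The sketch below is an illustration, not the script used to produce the dataset: `to_chatml` and `from_chatml` are hypothetical helpers, and the exact whitespace between turns in the stored `text` field is an assumption based on the example shown.

```python
import re

# System prompt copied from the example above.
SYSTEM_PROMPT = (
    "You are an aviation expert and FAA-certified flight instructor. "
    "Answer questions accurately based on Federal Aviation Regulations, "
    "the Aeronautical Information Manual, and standard aviation knowledge."
)


def to_chatml(user, assistant, system=SYSTEM_PROMPT):
    """Render one system/user/assistant exchange as a ChatML string."""
    turns = [("system", system), ("user", user), ("assistant", assistant)]
    return "\n".join(
        f"<|im_start|>{role}\n{content}<|im_end|>" for role, content in turns
    )


# Non-greedy match so each turn stops at its own <|im_end|> token.
_TURN = re.compile(r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>", re.DOTALL)


def from_chatml(text):
    """Parse a ChatML string back into a list of (role, content) pairs."""
    return [(role, content) for role, content in _TURN.findall(text)]
```

Parsing the `text` field this way makes it possible to re-render samples with a different model's chat template instead of the Qwen-style tokens.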
## Sources
All data is derived from **publicly available U.S. government publications**:
- **Federal Aviation Regulations** (Title 14 CFR) -- 24 parts fetched via the eCFR API, covering Parts 1, 43, 61, 65, 67, 68, 71, 73, 77, 91, 93, 95, 97, 99, 103, 105, 107, 117, 119, 121, 135, 141, 142, 145
- **Aeronautical Information Manual** (AIM) -- extracted from the official FAA PDF, 10 chapters covering navigation, airspace, ATC, procedures, emergencies, weather, and more
- **Pilot's Handbook of Aeronautical Knowledge** (PHAK, FAA-H-8083-25B) -- extracted from the official FAA PDF, 17 chapters covering aerodynamics, weather, navigation, aeromedical factors, and more
- **Hand-crafted Q&A** -- 78 expert-verified aviation knowledge pairs (METAR decoding, weather minimums, squawk codes, emergency procedures, etc.)
- **Scenario-based Q&A** -- 15 situational decision-making questions
## Usage
```python
from datasets import load_dataset

# Load the full dataset
ds = load_dataset("ziksy/faa-aviation-training")

# Training split
train = ds["train"]

# Eval split (includes practice exam MCQs)
test = ds["test"]

# Filter by format
qa_only = train.filter(lambda x: x["format"] == "instruction")
knowledge_only = train.filter(lambda x: x["format"] == "knowledge")

# Filter by difficulty
private_pilot = train.filter(lambda x: x["difficulty"] in ("student", "private"))
```
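The difficulty filter above can be generalized to "everything up to a given level". The sketch below assumes the five levels form a ladder in the order they are listed in this card, which is an assumption made for illustration (an instrument rating is not strictly a step between private and commercial). `at_or_below` is a hypothetical helper.

```python
# Assumption: the levels are ordered as listed in the Overview section.
DIFFICULTY_ORDER = ["student", "private", "instrument", "commercial", "atp"]


def at_or_below(level):
    """Build a row predicate that keeps rows at `level` or any earlier level."""
    cutoff = DIFFICULTY_ORDER.index(level)  # raises ValueError for unknown levels
    allowed = set(DIFFICULTY_ORDER[: cutoff + 1])

    def predicate(row):
        return row["difficulty"] in allowed

    return predicate
```

The returned predicate takes a single row dictionary, so it can be passed directly to `train.filter(at_or_below("instrument"))` or applied to plain Python lists.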
## Intended Use
- Fine-tuning language models for aviation domain knowledge
- Building aviation-specific AI assistants for flight training
- Evaluating model performance on FAA written exam content
- Research on domain-specific language model adaptation
## Limitations
- AIM Chapter 11 (Other Information) is not included in this version
- Exam questions cover PPL/IFR/CPL only (no ATP)
- PHAK content is from FAA-H-8083-25B; newer FAA editions may supersede parts of it
- Q&A pairs generated from regulation text may occasionally include formatting artifacts
## License
Apache 2.0. Source material is U.S. government work and is in the public domain.
Provider:
ziksy



