
ziksy/faa-aviation-training

Source: Hugging Face. Updated 2026-03-27; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/ziksy/faa-aviation-training
Resource description:
---
language:
- en
license: apache-2.0
task_categories:
- text-generation
- question-answering
task_ids:
- language-modeling
- open-domain-qa
tags:
- aviation
- faa
- regulations
- pilot-training
- flight-training
- aeronautics
- cfr-title-14
- aim
- phak
- private-pilot
- instrument-rating
- commercial-pilot
pretty_name: FAA Aviation Training Dataset
size_categories:
- 1K<n<10K
source_datasets: []
dataset_info:
  features:
  - name: text
    dtype: string
  - name: category
    dtype: string
  - name: source
    dtype: string
  - name: format
    dtype: string
  - name: difficulty
    dtype: string
  splits:
  - name: train
    num_examples: 8945
  - name: test
    num_examples: 874
---

# FAA Aviation Training Dataset

A comprehensive aviation training dataset built from official FAA publications for fine-tuning language models on pilot knowledge, federal aviation regulations, and flight procedures.

## Overview

- **9,819 total samples** (~4.1M tokens)
- **Three data formats**: instruction-tuning Q&A, knowledge text, practice exam MCQs
- **Three authoritative sources**: Title 14 CFR (FARs), Aeronautical Information Manual (AIM), Pilot's Handbook of Aeronautical Knowledge (PHAK)
- **Five difficulty levels**: student, private, instrument, commercial, ATP
- **214 practice exam questions** across PPL, IFR, and CPL certifications
- **ChatML format** (Qwen/im_start) for instruction and exam samples

## Dataset Structure

### Splits

| Split | Samples | Description |
|-------|---------|-------------|
| train | 8,945 | Instruction Q&A + knowledge text for fine-tuning |
| test | 874 | Held-out Q&A + all 214 exam MCQs for evaluation |

### Columns

| Column | Type | Description |
|--------|------|-------------|
| `text` | string | Formatted training text (ChatML for Q&A/exam, plain for knowledge) |
| `category` | string | Content category (regulations, procedures, knowledge, weather, etc.) |
| `source` | string | Source reference (e.g., "14 CFR 61.3", "AIM 1-1-1", "PHAK Chapter 4") |
| `format` | string | `instruction`, `knowledge`, or `exam` |
| `difficulty` | string | Pilot certificate level: student/private/instrument/commercial/atp |

### Content by Format

| Format | Count | Description |
|--------|-------|-------------|
| instruction | 6,582 | Q&A pairs from FARs, AIM, PHAK, hand-crafted aviation knowledge |
| knowledge | 3,023 | Regulation/AIM/PHAK text chunks for continued pretraining |
| exam | 214 | FAA-style multiple-choice practice exam questions |

### Content by Category

| Category | Count | Sources |
|----------|-------|---------|
| regulations | 6,883 | 24 parts of Title 14 CFR (1,570 FAR sections) |
| procedures | 1,637 | AIM chapters 1-10 (396 sections) |
| knowledge | 1,055 | PHAK chapters 1-17 (524 pages) |
| weather | 98 | METAR/TAF decoding, weather minimums, weather theory |
| navigation | 41 | VOR, GPS, charts, instrument approaches |
| flight_operations | 30 | Weight & balance, performance, preflight |
| aerodynamics | 15 | Forces of flight, stalls, load factor |
| airspace | 15 | Classes A-G, special use airspace |
| aircraft_systems | 12 | Instruments, engines, electrical |
| human_factors | 10 | Hypoxia, spatial disorientation, ADM |
| emergency_procedures | 8 | Engine failure, fire, lost comms |
| scenarios | 15 | Situational decision-making |

### Practice Exams (in test split)

| Exam | Questions | Topics |
|------|-----------|--------|
| PPL (Private Pilot) | 129 | Regulations, airspace, weather, aerodynamics, navigation |
| IFR (Instrument Rating) | 50 | IFR procedures, approaches, weather, regulations |
| CPL (Commercial Pilot) | 35 | Commercial operations, maneuvers, regulations |

## Chat Template

Instruction and exam samples use ChatML (Qwen `im_start`/`im_end`) format:

```
<|im_start|>system
You are an aviation expert and FAA-certified flight instructor. Answer questions accurately based on Federal Aviation Regulations, the Aeronautical Information Manual, and standard aviation knowledge.<|im_end|>
<|im_start|>user
What are the VFR weather minimums in Class B airspace?<|im_end|>
<|im_start|>assistant
In Class B airspace, VFR weather minimums are: visibility 3 statute miles and clear of clouds...<|im_end|>
```

## Sources

All data is derived from **publicly available U.S. government publications**:

- **Federal Aviation Regulations** (Title 14 CFR) -- 24 parts fetched via the eCFR API, covering Parts 1, 43, 61, 65, 67, 68, 71, 73, 77, 91, 93, 95, 97, 99, 103, 105, 107, 117, 119, 121, 135, 141, 142, 145
- **Aeronautical Information Manual** (AIM) -- extracted from the official FAA PDF, 10 chapters covering navigation, airspace, ATC, procedures, emergencies, weather, and more
- **Pilot's Handbook of Aeronautical Knowledge** (PHAK, FAA-H-8083-25B) -- extracted from the official FAA PDF, 17 chapters covering aerodynamics, weather, navigation, aeromedical factors, and more
- **Hand-crafted Q&A** -- 78 expert-verified aviation knowledge pairs (METAR decoding, weather minimums, squawk codes, emergency procedures, etc.)
- **Scenario-based Q&A** -- 15 situational decision-making questions

## Usage

```python
from datasets import load_dataset

# Load the full dataset
ds = load_dataset("ziksy/faa-aviation-training")

# Training split
train = ds["train"]

# Eval split (includes practice exam MCQs)
test = ds["test"]

# Filter by format
qa_only = train.filter(lambda x: x["format"] == "instruction")
knowledge_only = train.filter(lambda x: x["format"] == "knowledge")

# Filter by difficulty
private_pilot = train.filter(lambda x: x["difficulty"] in ("student", "private"))
```

## Intended Use

- Fine-tuning language models for aviation domain knowledge
- Building aviation-specific AI assistants for flight training
- Evaluating model performance on FAA written exam content
- Research on domain-specific language model adaptation

## Limitations

- AIM Chapter 11 (Other Information) is not included in this version
- Exam questions cover PPL/IFR/CPL only (no ATP)
- PHAK content is from FAA-H-8083-25B (some chapters may have newer editions)
- Q&A pairs generated from regulation text may occasionally include formatting artifacts

## License

Apache 2.0. Source material is U.S. government work and is in the public domain.
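
As an illustration of the ChatML layout described under Chat Template, the following minimal sketch splits a `text` value into role/content messages. The helper name `parse_chatml` and its regular expression are assumptions made for this example and are not part of the dataset or of any library. It assumes every turn is wrapped as `<|im_start|>role ... <|im_end|>`, as in the sample, and that plain `knowledge` rows contain no such markers.

```python
import re

# One ChatML turn: <|im_start|>ROLE, whitespace, CONTENT, <|im_end|>
# (pattern is an assumption based on the sample shown in the card)
_TURN = re.compile(r"<\|im_start\|>(\w+)\s+(.*?)<\|im_end\|>", re.DOTALL)


def parse_chatml(text):
    """Return a list of {"role", "content"} dicts; empty if no ChatML turns are found."""
    return [
        {"role": match.group(1), "content": match.group(2).strip()}
        for match in _TURN.finditer(text)
    ]


# Hypothetical sample modeled on the card's Chat Template example
sample = (
    "<|im_start|>system\nYou are an aviation expert.<|im_end|>\n"
    "<|im_start|>user\nWhat are the VFR weather minimums in Class B airspace?<|im_end|>\n"
    "<|im_start|>assistant\nVisibility 3 statute miles and clear of clouds.<|im_end|>"
)

messages = parse_chatml(sample)
for message in messages:
    print(message["role"], "->", message["content"])
```

With the dataset loaded as in Usage, calling `parse_chatml(test[0]["text"])` would be one way to recover the prompt and reference answer for evaluation, provided that row is an `instruction` or `exam` sample; plain `knowledge` rows would yield an empty list.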
Provider:
ziksy