
esa-sceva/satcom-synth-qa

Source: Hugging Face. Updated 2025-11-25. Indexed 2026-01-03.
Download link:
https://hf-mirror.com/datasets/esa-sceva/satcom-synth-qa
Resource description:
---
dataset_name: satcom-synth-qa
tags:
- satellite-communications
- synthetic-data
- question-answering
- esa-sceva
language: en
license: apache-2.0
task_categories:
- question-answering
size_categories:
- 100K<n<1M
---

# esa-sceva/satcom-synth-qa

## Summary

Synthetic dataset of question-answer pairs on satellite communications, created to support model fine-tuning and evaluation in the SatCom domain.

## Description

Generated from SatCom documents using large models (LLaMA 70B 3.3 Instruct and Qwen 2 72B Instruct). Two single-hop generation strategies were applied:

1. Joint QA generation from full documents.
2. Two-step process with separate question and answer generation for improved quality.

The dataset covers a wide range of SatCom topics, providing diverse factual and conceptual questions.

## Composition

- Around 1.1 million QAs before filtering
- JSONL format
- Fields: `question`, `answer`
- Language: English

## Intended use

Training or evaluating models for factual question answering and domain adaptation in satellite communications.

## Quality control

- Automatic structure validation
- Filtering for coherence and domain relevance
- Spot-checking by SatCom experts

## Example

```python
from datasets import load_dataset

ds = load_dataset("esa-sceva/satcom-synth-qa", split="train")
print(ds[0]["question"])
print(ds[0]["answer"])
```
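The card lists "automatic structure validation" as a quality-control step but does not say how it was done. As a rough illustration only, a structural check on JSONL lines with the `question` and `answer` fields might look like the sketch below. The function names `is_valid_record` and `filter_jsonl`, the non-empty-string rule, and the sample lines are all assumptions made for this example; they are not taken from the dataset or its pipeline.

```python
import json

# Field names as listed in the dataset card.
REQUIRED_FIELDS = ("question", "answer")


def is_valid_record(line: str) -> bool:
    """Return True if `line` parses to a JSON object whose `question` and
    `answer` values are non-empty strings (an assumed validation rule)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    if not isinstance(record, dict):
        return False
    return all(
        isinstance(record.get(field), str) and bool(record[field].strip())
        for field in REQUIRED_FIELDS
    )


def filter_jsonl(lines):
    """Yield parsed records for the lines that pass the structural check."""
    for line in lines:
        if line.strip() and is_valid_record(line):
            yield json.loads(line)


# Hypothetical sample lines, written for this sketch (not dataset content).
sample = [
    '{"question": "What does GEO stand for?", "answer": "Geostationary Earth orbit."}',
    '{"question": "", "answer": "Missing question text."}',
    "not json",
]
kept = list(filter_jsonl(sample))
print(len(kept))
```

A check like this only covers structure. The coherence and domain-relevance filtering and the expert spot-checks mentioned in the card are separate steps that this sketch does not attempt to reproduce.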

Provider:
esa-sceva