esa-sceva/satcom-synth-qa
Source: Hugging Face | Updated: 2025-11-25 | Indexed: 2026-01-03
Download link:
https://hf-mirror.com/datasets/esa-sceva/satcom-synth-qa
Resource description:
---
dataset_name: satcom-synth-qa
tags:
- satellite-communications
- synthetic-data
- question-answering
- esa-sceva
language: en
license: apache-2.0
task_categories:
- question-answering
size_categories:
- 100K<n<1M
---
# esa-sceva/satcom-synth-qa
## Summary
Synthetic dataset of question-answer pairs on satellite communications, created to support model fine-tuning and evaluation in the SatCom domain.
## Description
The QA pairs were generated from SatCom documents using two large language models: Llama 3.3 70B Instruct and Qwen 2 72B Instruct.
Two single-hop generation strategies were applied:
1. Joint QA generation from full documents.
2. Two-step process with separate question and answer generation for improved quality.
The dataset covers a wide range of SatCom topics, providing diverse factual and conceptual questions.
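The difference between the two strategies can be sketched as follows. This is only an illustration and not the pipeline used to build the dataset: the prompts, the function names `joint_qa` and `two_step_qa`, and the `fake_llm` stand-in are all assumptions, and `llm` is any callable that maps a prompt string to a completion string.

```python
from typing import Callable, Dict

LLM = Callable[[str], str]  # hypothetical interface: prompt in, completion out


def joint_qa(document: str, llm: LLM) -> Dict[str, str]:
    """Strategy 1 (sketch): one call returns the question and the answer together."""
    prompt = (
        "Read the document and write one question and its answer.\n"
        "Format:\nQ: <question>\nA: <answer>\n\n"
        f"Document:\n{document}"
    )
    reply = llm(prompt)
    # Split the single reply into its question and answer parts.
    question, _, answer = reply.partition("\nA:")
    question = question.strip()
    if question.startswith("Q:"):
        question = question[2:]
    return {"question": question.strip(), "answer": answer.strip()}


def two_step_qa(document: str, llm: LLM) -> Dict[str, str]:
    """Strategy 2 (sketch): generate the question first, then answer it separately."""
    question = llm(
        "Write one question answerable from this document.\n\n"
        f"Document:\n{document}"
    ).strip()
    answer = llm(
        "Answer the question using only the document.\n\n"
        f"Document:\n{document}\n\nQuestion: {question}"
    ).strip()
    return {"question": question, "answer": answer}


def fake_llm(prompt: str) -> str:
    """Canned replies so the sketch runs without a model; not real model output."""
    if prompt.startswith("Read the document"):
        return "Q: What band does the link use?\nA: Ku-band."
    if prompt.startswith("Write one question"):
        return "What band does the link use?"
    return "Ku-band."


print(joint_qa("(document text)", fake_llm))
print(two_step_qa("(document text)", fake_llm))
```

Both functions return a record with the same `question` and `answer` keys, so their outputs can be pooled regardless of which strategy produced them.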
## Composition
- Around 1.1 million QA pairs before filtering
- JSONL format
- Fields: `question`, `answer`
- Language: English
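For readers working with the raw files rather than the `datasets` library, a minimal sketch of reading this layout is shown here. It assumes one JSON object per line with exactly the `question` and `answer` fields listed above; the helper name `iter_qa_records` and the sample lines are illustrative and not taken from the dataset.

```python
import json
from typing import Dict, Iterable, Iterator


def iter_qa_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one {"question", "answer"} dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # A KeyError here means the line lacks one of the two expected fields.
        yield {"question": record["question"], "answer": record["answer"]}


# Placeholder lines standing in for a file; an open text file handle works the same way.
sample_lines = [
    '{"question": "Example question?", "answer": "Example answer."}',
    "",
    '{"question": "Another question?", "answer": "Another answer."}',
]

records = list(iter_qa_records(sample_lines))
print(len(records))
print(records[0]["question"])
```

Because the function takes any iterable of strings, it streams a large file line by line instead of loading it all into memory.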
## Intended use
Training or evaluating models for factual question answering and domain adaptation in satellite communications.
## Quality control
- Automatic structure validation
- Filtering for coherence and domain relevance
- Spot-checking by SatCom experts
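The card does not say what the automatic structure validation checks. As one plausible reading, the sketch below accepts a record only if it is a mapping whose `question` and `answer` values are non-empty strings. The function name `is_well_formed` is an assumption, and the coherence and domain-relevance filters are not covered.

```python
def is_well_formed(record: object) -> bool:
    """Return True if record is a dict with non-empty string question and answer."""
    if not isinstance(record, dict):
        return False
    for field in ("question", "answer"):
        value = record.get(field)
        # Reject missing fields, non-string values, and whitespace-only strings.
        if not isinstance(value, str) or not value.strip():
            return False
    return True


candidates = [
    {"question": "Example question?", "answer": "Example answer."},
    {"question": "Missing answer?"},
    {"question": "   ", "answer": "Blank question."},
]
kept = [r for r in candidates if is_well_formed(r)]
print(len(kept))
```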
## Example
```python
from datasets import load_dataset
ds = load_dataset("esa-sceva/satcom-synth-qa", split="train")
print(ds[0]["question"])
print(ds[0]["answer"])
```
Provider:
esa-sceva



