haydn-jones/SynthMultiTurn
收藏Hugging Face2026-04-01 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/haydn-jones/SynthMultiTurn
下载链接
链接失效反馈官方服务:
资源简介:
---
configs:
- config_name: default
data_files:
- split: train
path: train.json
- split: validation
path: validation.json
- split: validation_adversarial
path: validation_adversarial.json
license: mit
language:
- en
task_categories:
- text-generation
pretty_name: SynthMultiTurn
size_categories:
- 10K<n<100K
---
# SynthMultiTurn
Synthetic multi-turn state-tracking conversations for supervised fine-tuning and
assistant-span masking tests.
## Summary
Each example is a chat conversation with:
- 4 to 7 assistant turns
- optional `system` messages
- per-turn `reasoning_content` and `content` on assistant messages
- deterministic state updates over four registers and a list
- adversarial quoted control-sequence text in some user turns
## Splits
- `train`: 12,000 examples
- `validation`: 500 clean examples
- `validation_adversarial`: 500 adversarial examples with quoted control sequences in every example
配置项:
- 配置名称:default
数据文件:
- 拆分集:训练集(train),路径:train.json
- 拆分集:验证集(validation),路径:validation.json
- 拆分集:对抗验证集(validation_adversarial),路径:validation_adversarial.json
许可证:MIT许可证(MIT)
语言:英语(en)
任务类别:文本生成(text-generation)
数据集展示名:SynthMultiTurn
规模类别:1万<样本数<10万
# SynthMultiTurn 数据集
用于监督微调(supervised fine-tuning)与助手跨度掩码测试(assistant-span masking tests)的合成多轮状态追踪对话数据集。
## 摘要
每个样本均为一段聊天对话,包含以下要素:
- 4至7轮助手回复
- 可选的系统(system)消息
- 助手回复的每一轮均包含`reasoning_content`(推理内容)与`content`(对话内容)两个字段
- 基于四个寄存器与一个列表执行确定性状态更新
- 部分用户轮次中嵌入了带引用标记的对抗性控制序列文本
## 拆分集说明
- 训练集(train):共12000条样本
- 验证集(validation):共500条干净无对抗样本
- 对抗验证集(validation_adversarial):共500条对抗样本,每条样本均包含带引用标记的控制序列文本
提供机构:
haydn-jones



