---
license: mit
language:
- en
tags:
- dialogue segmentation
size_categories:
- n<1K
---
# Dataset Card for SuperDialseg
## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** [https://github.com/xyease/TADAM](https://github.com/xyease/TADAM)
- **Repository:** [https://github.com/xyease/TADAM](https://github.com/xyease/TADAM)
- **Paper:** [Topic-aware multi-turn dialogue modeling](https://arxiv.org/abs/2009.12539)
- **Leaderboard:**
- **Point of Contact:** jiangjf@is.s.u-tokyo.ac.jp
### Dataset Summary
[More Information Needed]
### Supported Tasks and Leaderboards
[More Information Needed]
### Languages
English
## Dataset Structure
### Data Instances
```
{
  "dial_data": {
    "dialseg711": [
      {
        "dial_id": "dialseg711_dial_000",
        "turns": [
          {
            "da": "",
            "role": "user",
            "turn_id": 0,
            "utterance": "check the weather for the 7 day forecast",
            "topic_id": 0,
            "segmentation_label": 0
          },
          ...
          {
            "da": "",
            "role": "agent",
            "turn_id": 23,
            "utterance": "Reminder set for your meeting at 11am on the 13th with management to discuss your company picnic. Is there anything else?",
            "topic_id": 4,
            "segmentation_label": 1
          }
        ],
        ...
      }
    ]
  }
}
```
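The nested layout above can be traversed with a few lines of Python. This is a minimal sketch, not an official loader: the helper name `iter_dialogues` and the inline `SAMPLE` are assumptions made for illustration, and the sample's IDs and utterances are invented rather than taken from the dataset.

```python
import json

# Tiny inline sample mirroring the structure shown above. The dial_id and
# utterances are invented for illustration; they are not real dataset rows.
SAMPLE = json.loads("""
{
  "dial_data": {
    "dialseg711": [
      {
        "dial_id": "example_dial_000",
        "turns": [
          {"da": "", "role": "user", "turn_id": 0,
           "utterance": "what is the weather today",
           "topic_id": 0, "segmentation_label": 0},
          {"da": "", "role": "agent", "turn_id": 1,
           "utterance": "It is sunny.",
           "topic_id": 0, "segmentation_label": 1}
        ]
      }
    ]
  }
}
""")


def iter_dialogues(data, name="dialseg711"):
    """Yield (dial_id, turns) pairs from the nested "dial_data" mapping.

    `name` is the key under "dial_data"; "dialseg711" matches the
    instance shown in this card.
    """
    for dialogue in data["dial_data"][name]:
        yield dialogue["dial_id"], dialogue["turns"]


for dial_id, turns in iter_dialogues(SAMPLE):
    print(dial_id, len(turns))
```

To apply the same helper to the real data, load the dataset file with `json.load` and pass the resulting dictionary in place of `SAMPLE`.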
### Data Fields
#### Dialogue-Level
+ `dial_id`: ID of a dialogue;
+ `turns`: All utterances of a dialogue.
#### Utterance-Level
+ `da`: Dialogue Act annotation derived from the original DGDS dataset;
+ `role`: Role annotation derived from the original DGDS dataset;
+ `turn_id`: ID of an utterance;
+ `utterance`: Text of the utterance;
+ `topic_id`: Index of the topic the utterance belongs to, in order of appearance within the dialogue;
+ `segmentation_label`: `1` if the utterance is the last one of a topic, `0` otherwise.
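As a sketch of how the utterance-level fields relate, the snippet below groups utterances into topic segments using `segmentation_label` and recomputes the labels from `topic_id`. The function names and sample turns are hypothetical. It also assumes that the final utterance of a dialogue is labeled `1`, as in the instance shown in this card; check that against the data before relying on it.

```python
def split_into_segments(turns):
    """Group utterance texts into segments, closing one at each label 1."""
    segments, current = [], []
    for turn in turns:
        current.append(turn["utterance"])
        if turn["segmentation_label"] == 1:
            segments.append(current)
            current = []
    if current:  # keep trailing turns if the last label is not 1
        segments.append(current)
    return segments


def labels_from_topic_ids(topic_ids):
    """Derive labels from topic IDs.

    A turn gets 1 when the next turn has a different topic ID or when it
    is the last turn (assumption: the dialogue's final turn ends a topic).
    """
    last = len(topic_ids) - 1
    return [
        1 if i == last or topic_ids[i + 1] != topic_id else 0
        for i, topic_id in enumerate(topic_ids)
    ]


# Invented turns for illustration only; not real dataset rows.
turns = [
    {"utterance": "check the weather", "topic_id": 0, "segmentation_label": 0},
    {"utterance": "It will be sunny.", "topic_id": 0, "segmentation_label": 1},
    {"utterance": "set a reminder", "topic_id": 1, "segmentation_label": 0},
    {"utterance": "Reminder set.", "topic_id": 1, "segmentation_label": 1},
]

print(split_into_segments(turns))
print(labels_from_topic_ids([t["topic_id"] for t in turns]))
```

Comparing the output of `labels_from_topic_ids` with the stored `segmentation_label` values is a quick consistency check on a loaded dialogue.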
### Data Splits
This dataset provides a test split only.
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
MIT License
### Citation Information
@article{xu2020topic,
title={Topic-aware multi-turn dialogue modeling},
author={Xu, Yi and Zhao, Hai and Zhang, Zhuosheng},
journal={arXiv preprint arXiv:2009.12539},
year={2020}
}
### Contributions
+ Thanks to [@xyease](https://github.com/xyease) for constructing this dataset.
+ Thanks to [@Coldog2333](https://github.com/Coldog2333) for adding this dataset.