alexchern5757/groundhog_reddit
Hugging Face | Updated 2024-01-27 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/alexchern5757/groundhog_reddit
Resource description:
---
license: mit
task_categories:
- conversational
- text-generation
language:
- en
size_categories:
- 10K<n<100K
---
The dataset was introduced in the paper "GroundHog: Dialogue Generation using Multi-Grained Linguistic Input".
**NOTE** Some dialogues may share the same beginning. Each dialogue is a chain of replies built from the reply tree in the source data, so chains that branch from the same point in the tree start with the same utterances.
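The shared beginnings described in the note can be measured with a small helper. This is a minimal sketch and not part of the dataset's tooling: it assumes each dialogue has been reduced to a list of utterance `id` strings, and the function name `shared_prefix_length` is hypothetical.

```python
from typing import Sequence


def shared_prefix_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Count how many leading items two sequences have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


# Made-up utterance ids for two chains that branch after the second reply.
chain_a = ["u1", "u2", "u3"]
chain_b = ["u1", "u2", "u4"]
common = shared_prefix_length(chain_a, chain_b)
```

A non-zero result for two dialogues indicates that they overlap at the start, which may matter when splitting the data into training and evaluation sets.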
The dataset is uploaded in .jsonl format as List[Dialogue].
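A file in this format can be read with the standard library alone. The sketch below assumes the usual JSON Lines layout, one JSON value per line. Because the card does not say whether each line holds a single Dialogue object or a list of them, the loader accepts both. The function name `load_dialogues` and the file name in the usage comment are hypothetical.

```python
import json
from typing import Any, Dict, List


def load_dialogues(path: str) -> List[Dict[str, Any]]:
    """Read a .jsonl file and return a flat list of dialogue records."""
    dialogues: List[Dict[str, Any]] = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            if isinstance(record, list):
                # Assumption: a line may hold a whole list of dialogues.
                dialogues.extend(record)
            else:
                dialogues.append(record)
    return dialogues


# Usage (hypothetical file name):
# dialogues = load_dialogues("groundhog_reddit.jsonl")
```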
Dialogue:
- dialogue: List[Utterance]
- meta: Meta
- grounding: str
- reddit_id: str
Utterance:
- id: str
- speaker: str
- text: str
- discourse: Triplet[from: str, to: str, relation: str]
- sentiment: Pair[class: str, score: float]
- AMT: str
Meta:
- id: str
- title: str
- score: float
- comms_num: int
- url: str
- created: str
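The schema above can be explored with plain dictionary access. The following sketch uses only the `dialogue`, `speaker`, `text`, `grounding`, and `meta.title` fields listed in the schema. The sample record is invented for illustration and is not taken from the dataset. The helper names are hypothetical. The serialized form of `discourse` and `sentiment` is not specified in the card, so they are left out.

```python
from typing import Any, Dict, List, Tuple


def dialogue_turns(record: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (speaker, text) pairs in utterance order."""
    return [(u["speaker"], u["text"]) for u in record["dialogue"]]


def format_dialogue(record: Dict[str, Any]) -> str:
    """Render the thread title followed by one line per utterance."""
    lines = [f"# {record['meta']['title']}"]
    lines += [f"{speaker}: {text}" for speaker, text in dialogue_turns(record)]
    return "\n".join(lines)


# Invented sample record that follows the field names in the schema.
example = {
    "dialogue": [
        {"id": "u1", "speaker": "A", "text": "Hello"},
        {"id": "u2", "speaker": "B", "text": "Hi there"},
    ],
    "meta": {"id": "m1", "title": "Example thread title"},
    "grounding": "Example grounding text",
    "reddit_id": "r1",
}
rendered = format_dialogue(example)
```

Keeping the helpers limited to fields whose types are plain strings avoids guessing at how the triplet and pair types are stored.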
Provider:
alexchern5757
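Summary of Original Information
Dataset Overview
Basic Information
- License: MIT
- Task categories:
  - Conversational
  - Text generation
- Language: English
- Dataset size: 10K<n<100K
Dataset Source
- The dataset was introduced in the paper "GroundHog: Dialogue Generation using Multi-Grained Linguistic Input".
Data Format
- The dataset is uploaded in .jsonl format and contains a list of dialogues, List[Dialogue].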
Data Structure
Dialogue:
- dialogue: List[Utterance]
- meta: Meta
- grounding: str
- reddit_id: str
Utterance:
- id: str
- speaker: str
- text: str
- discourse: Triplet[from: str, to: str, relation: str]
- sentiment: Pair[class: str, score: float]
- AMT: str
Meta:
- id: str
- title: str
- score: float
- comms_num: int
- url: str
- created: str
Notes
- Some dialogues may share the same beginning, because each dialogue is a reply chain built from the reply tree in the source data.



