HybridDialogue
收藏魔搭社区2026-01-06 更新2025-12-06 收录
下载链接:
https://modelscope.cn/datasets/cerebras/HybridDialogue
下载链接
链接失效反馈官方服务:
资源简介:
# Dataset Information
A pre-processed version of the HybridDialogue dataset. The dataset was created as part of our work on Cerebras DocChat - a document-based conversational Q&A model. This dataset is intended to be used for training purposes, and so overlapping samples with the HybridDialogue test set in [ChatRAG](https://huggingface.co/datasets/nvidia/ChatRAG-Bench) have been removed.
Each sample in this dataset contains a `messages` multi-turn conversation, a `document` which is a concatenated representation of relevant document(s), and `answers` for the current turn.
# Acknowledgement
This dataset is a processed version of the HybridDialogue dataset.
```
@inproceedings{nakamura2022hybridialogue,
title={HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data},
author={Nakamura, Kai and Levy, Sharon and Tuan, Yi-Lin and Chen, Wenhu and Wang, William Yang},
booktitle={Findings of the Association for Computational Linguistics: ACL 2022},
year={2022}
}
```
# 数据集信息
本数据集为HybridDialogue数据集的预处理版本,是我们针对Cerebras DocChat(一款基于文档的会话式问答模型)开展的研究工作的组成部分。本数据集仅用于训练场景,因此已移除与[ChatRAG](https://huggingface.co/datasets/nvidia/ChatRAG-Bench)中的HybridDialogue测试集存在样本重叠的条目。
本数据集的每条样本均包含`messages`字段(多轮会话内容)、`document`字段(相关文档的拼接表示)以及`answers`字段(当前轮次的回答)。
# 致谢
本数据集为HybridDialogue数据集的处理后版本。
@inproceedings{nakamura2022hybridialogue,
title={HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data},
author={Nakamura, Kai and Levy, Sharon and Tuan, Yi-Lin and Chen, Wenhu and Wang, William Yang},
booktitle={Findings of the Association for Computational Linguistics: ACL 2022},
year={2022}
}
提供机构:
maas
创建时间:
2025-10-23



