five

HybridDialogue

收藏
魔搭社区2026-01-06 更新2025-12-06 收录
下载链接:
https://modelscope.cn/datasets/cerebras/HybridDialogue
下载链接
链接失效反馈
官方服务:
资源简介:
# Dataset Information A pre-processed version of the HybridDialogue dataset. The dataset was created as part of our work on Cerebras DocChat - a document-based conversational Q&A model. This dataset is intended to be used for training purposes, and so overlapping samples with the HybridDialogue test set in [ChatRAG](https://huggingface.co/datasets/nvidia/ChatRAG-Bench) have been removed. Each sample in this dataset contains a `messages` multi-turn conversation, a `document` which is a concatenated representation of relevant document(s), and `answers` for the current turn. # Acknowledgement This dataset is a processed version of the HybridDialogue dataset. ``` @inproceedings{nakamura2022hybridialogue, title={HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data}, author={Nakamura, Kai and Levy, Sharon and Tuan, Yi-Lin and Chen, Wenhu and Wang, William Yang}, booktitle={Findings of the Association for Computational Linguistics: ACL 2022}, year={2022} } ```

# 数据集信息 本数据集为HybridDialogue数据集的预处理版本,是我们针对Cerebras DocChat(一款基于文档的会话式问答模型)开展的研究工作的组成部分。本数据集仅用于训练场景,因此已移除与[ChatRAG](https://huggingface.co/datasets/nvidia/ChatRAG-Bench)中的HybridDialogue测试集存在样本重叠的条目。 本数据集的每条样本均包含`messages`字段(多轮会话内容)、`document`字段(相关文档的拼接表示)以及`answers`字段(当前轮次的回答)。 # 致谢 本数据集为HybridDialogue数据集的处理后版本。 @inproceedings{nakamura2022hybridialogue, title={HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data}, author={Nakamura, Kai and Levy, Sharon and Tuan, Yi-Lin and Chen, Wenhu and Wang, William Yang}, booktitle={Findings of the Association for Computational Linguistics: ACL 2022}, year={2022} }
提供机构:
maas
创建时间:
2025-10-23
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作