CMCQA
Indexed from arXiv, 2025-09-30
Download link:
https://github.com/WENGSYX/CMCQA
Resource summary:
This is a large-scale Chinese conversational question answering dataset in the medical domain. It contains 1.3 million complete dialogue sessions, which together comprise 19.83 million utterances and 650 million tokens. The dialogues come from 45 medical departments. The dataset has been open-sourced to advance research on conversational question answering, which is its primary target task.
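The headline figures imply rough average sizes per session and per utterance. The Python sketch below derives them from the rounded totals stated above, so the results are only approximations. The function name `average_stats` is hypothetical and is not part of any tooling shipped with the dataset. The card also does not say how tokens were counted.

```python
# Headline totals as stated on the dataset card (already rounded).
SESSIONS = 1_300_000
UTTERANCES = 19_830_000
TOKENS = 650_000_000


def average_stats(sessions: int, utterances: int, tokens: int) -> dict[str, float]:
    """Hypothetical helper: approximate averages, rounded to one decimal place."""
    return {
        "utterances_per_session": round(utterances / sessions, 1),
        "tokens_per_utterance": round(tokens / utterances, 1),
        "tokens_per_session": round(tokens / sessions, 1),
    }


if __name__ == "__main__":
    for name, value in average_stats(SESSIONS, UTTERANCES, TOKENS).items():
        print(f"{name}: {value}")
```

With the stated totals, this works out to roughly 15.3 utterances per session and 32.8 tokens per utterance.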
Provided by:
Open-sourced by the authors
Compiled Summary
Dataset Introduction

Background and Challenges
Background Overview
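CMCQA is a large Chinese medical conversational question answering dataset. It covers 45 medical departments and contains 1.3 million complete sessions and 19.83 million utterances, with a total size of 2.84 GB. The dataset is intended to support research and development in conversational question answering for the medical domain.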
The above content was collected and summarized by 遇见数据集 (Yujian Datasets).



