
huatuo_consultation_qa

ModelScope community | Updated 2025-11-27 | Indexed 2025-01-25
Download link:
https://modelscope.cn/datasets/FreedomIntelligence/huatuo_consultation_qa
Resource description:
# Dataset Card for huatuo_consultation_qa

## Dataset Description

- **Homepage:** https://www.huatuogpt.cn/
- **Repository:** https://github.com/FreedomIntelligence/HuatuoGPT
- **Paper:** https://arxiv.org/abs/2305.01526
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary

We collected data from a medical consultation website that hosts many online consultation records by medical experts. Each record is a QA pair: a patient asks a question and a doctor answers it. Basic information about each doctor (name, hospital, and department) was also recorded.

We crawled the patients' questions and the doctors' answers directly as QA pairs, obtaining 32,708,346 pairs. We then removed the QA pairs containing special characters and removed the repeated pairs, which left 25,341,578 QA pairs.

**Please note that, for certain reasons, we cannot provide the text data directly, so the answer part of this dataset is a URL. If you need text data, you can use the other two parts of our open-source datasets ([huatuo_encyclopedia_qa](https://huggingface.co/datasets/FreedomIntelligence/huatuo_encyclopedia_qa), [huatuo_knowledge_graph_qa](https://huggingface.co/datasets/FreedomIntelligence/huatuo_knowledge_graph_qa)), or collect the text yourself from the URLs.**

## Dataset Creation

### Source Data

....

## Citation

```
@misc{li2023huatuo26m,
      title={Huatuo-26M, a Large-scale Chinese Medical QA Dataset},
      author={Jianquan Li and Xidong Wang and Xiangbo Wu and Zhiyi Zhang and Xiaolong Xu and Jie Fu and Prayag Tiwari and Xiang Wan and Benyou Wang},
      year={2023},
      eprint={2305.01526},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
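The cleaning step described in the Dataset Summary (drop QA pairs containing special characters, then drop repeated pairs) can be sketched roughly as follows. This is only a minimal illustration, not the authors' pipeline. The card does not define "special characters", so the `SPECIAL_CHARS` pattern is an assumption, as are the function name `clean_qa_pairs`, the exact-match notion of a repeated pair, and the placeholder example.com URLs standing in for the URL-valued answers.

```python
import re

# Assumption: "special characters" is not defined in the card. Here it means
# ASCII control characters and the Unicode replacement character.
SPECIAL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffd]")


def clean_qa_pairs(pairs):
    """Drop pairs containing special characters, then drop exact repeats."""
    seen = set()
    cleaned = []
    for question, answer in pairs:
        # Filter step: skip a pair if either side contains a special character.
        if SPECIAL_CHARS.search(question) or SPECIAL_CHARS.search(answer):
            continue
        # Dedup step: treat pairs as repeated when both stripped strings match.
        key = (question.strip(), answer.strip())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(key)
    return cleaned


# Hypothetical records; the answer field is a URL, as in the released data.
raw = [
    ("What should I do about a headache?", "https://example.com/answer/1"),
    ("What should I do about a headache?", "https://example.com/answer/1"),
    ("Broken\x00question", "https://example.com/answer/2"),
    ("How is a cough treated?", "https://example.com/answer/3"),
]
cleaned = clean_qa_pairs(raw)

# Counts reported in the card: pairs before and after cleaning.
removed = 32_708_346 - 25_341_578
removed_share = removed / 32_708_346
```

On the counts reported in the card, cleaning removed 7,366,768 pairs, about 22.5% of the crawled total.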

Provider:
maas
Created:
2025-01-20