Medical Text Dataset
Source: arXiv. Indexed: 2025-09-30
Download link:
https://github.com/Stephen-SMJ/LLamaCare
Description:
This dataset collects medical text from a variety of sources, including real and simulated dialogues about disease diagnosis and treatment recommendations. It combines professional medical knowledge with reasoning capabilities to strengthen conversational ability. The dataset contains 116.41 thousand samples and 15.61 million words. It is intended for knowledge injection when training medical language models.



