
symptom_to_diagnosis

ModelScope Community · Updated 2025-12-05 · Indexed 2025-05-24
Download link:
https://modelscope.cn/datasets/gretelai/symptom_to_diagnosis
Resource summary:
# Dataset Summary

This dataset contains natural language descriptions of symptoms labeled with 22 corresponding diagnoses. `Gretel/symptom_to_diagnosis` provides 1065 symptom descriptions in the English language labeled with 22 diagnoses, focusing on fine-grained single-domain diagnosis.

## Data Fields

Each row contains the following fields:

* `input_text` : A string field containing symptoms
* `output_text` : A string field containing a diagnosis

Example:

```
{
  "output_text": "drug reaction",
  "input_text": "I've been having headaches and migraines, and I can't sleep. My whole body shakes and twitches. Sometimes I feel lightheaded."
}
```

## Diagnoses

This table contains the count of each diagnosis in the train and test splits.

|    | Diagnosis                       | train.jsonl | test.jsonl |
|---:|:--------------------------------|------------:|-----------:|
|  0 | drug reaction                   |          40 |          8 |
|  1 | allergy                         |          40 |         10 |
|  2 | chicken pox                     |          40 |         10 |
|  3 | diabetes                        |          40 |         10 |
|  4 | psoriasis                       |          40 |         10 |
|  5 | hypertension                    |          40 |         10 |
|  6 | cervical spondylosis            |          40 |         10 |
|  7 | bronchial asthma                |          40 |         10 |
|  8 | varicose veins                  |          40 |         10 |
|  9 | malaria                         |          40 |         10 |
| 10 | dengue                          |          40 |         10 |
| 11 | arthritis                       |          40 |         10 |
| 12 | impetigo                        |          40 |         10 |
| 13 | fungal infection                |          39 |          9 |
| 14 | common cold                     |          39 |         10 |
| 15 | gastroesophageal reflux disease |          39 |         10 |
| 16 | urinary tract infection         |          39 |          9 |
| 17 | typhoid                         |          38 |          9 |
| 18 | pneumonia                       |          37 |         10 |
| 19 | peptic ulcer disease            |          37 |         10 |
| 20 | jaundice                        |          33 |          7 |
| 21 | migraine                        |          32 |         10 |

## Data Splits

The data is split into 80% train (853 examples, 167 KB) and 20% test (212 examples, 42 KB).

## Dataset Creation

Data was filtered to remove unwanted categories and updated using an LLM to create language more consistent with how a patient would describe symptoms in natural language to a doctor.

## Source Data

This dataset was adapted from the [Symptom2Disease](https://www.kaggle.com/datasets/niyarrbarman/symptom2disease) dataset from Kaggle.

## Personal and Sensitive Information

The symptoms in this dataset were modified from their original format using an LLM and do not contain personal data.

## Limitations

This dataset is licensed under Apache 2.0 and is free to use.
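
As a rough illustration of the row format and split sizes described in this card, the sketch below parses JSON Lines records with the `input_text` and `output_text` fields and tallies rows per diagnosis. The helper names `read_jsonl` and `count_diagnoses` are hypothetical, and the second sample row is invented purely for illustration; only the first row and the split totals come from the card.

```python
import json
from collections import Counter
from io import StringIO


def read_jsonl(handle):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_diagnoses(records):
    """Count how many records carry each `output_text` label."""
    return Counter(record["output_text"] for record in records)


# First row: the example shown in the card.
# Second row: a made-up placeholder, not taken from the dataset.
sample_rows = [
    {
        "output_text": "drug reaction",
        "input_text": (
            "I've been having headaches and migraines, and I can't sleep. "
            "My whole body shakes and twitches. Sometimes I feel lightheaded."
        ),
    },
    {
        "output_text": "migraine",
        "input_text": "Hypothetical second row, for illustration only.",
    },
]
sample_stream = StringIO("\n".join(json.dumps(row) for row in sample_rows))
counts = count_diagnoses(read_jsonl(sample_stream))
print(counts)

# Split totals as stated in the card.
TRAIN_TOTAL, TEST_TOTAL = 853, 212
train_share = TRAIN_TOTAL / (TRAIN_TOTAL + TEST_TOTAL)
print(f"total={TRAIN_TOTAL + TEST_TOTAL}, train share={train_share:.3f}")
```

Assuming a local copy of `train.jsonl`, the same helpers could be applied with `with open("train.jsonl", encoding="utf-8") as f: count_diagnoses(read_jsonl(f))`, and the resulting counts compared against the table in the Diagnoses section.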

Provider:
maas
Created:
2025-05-20