Haodf Doctor Recommendation Dataset
Source: IEEE (indexed 2026-04-17)
Download link:
https://ieee-dataport.org/documents/haodf-doctor-recommendation-dataset
Description:
We collected patient-doctor interaction data from the Haodf online consultation platform for six common diseases, grouped by risk level. Low-risk diseases are Common Cold (Cold) and Pneumonia (Pneu.), medium-risk diseases are Diabetes (Diab.) and Depression (Depr.), and high-risk diseases are Coronary Heart Disease (CHD) and Lung Cancer (Lung.). We use only publicly accessible data, and all patients and doctors remain anonymous to protect their privacy. To evaluate how well a method identifies the doctors most relevant to a patient's symptoms, we also collected a disease tag t for each patient suffering from disease x. These tags describe the patient's condition in more detail, allowing more precise treatment matching. For instance, the tag Viral Pneumonia is a more specific category under the broader disease Pneumonia. Similarly, Malignant Tumor is a detailed tag used for patients diagnosed with Lung Cancer. Note that the disease tag is used solely for evaluation and is not involved in any training process. Detailed statistics of the dataset are provided in Table I. For the dataset split, we divided each doctor's consultation records into training, validation, and test sets in a ratio of 8:1:1.
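The per-doctor 8:1:1 split described above could be reproduced along the following lines. This is only a minimal sketch, not the authors' procedure: the record field name `doctor_id`, the seeded shuffle, and the rounding rule (floor for training and validation, remainder to test) are all assumptions, since the description does not state the file schema or how uneven counts are handled.

```python
import random
from collections import defaultdict


def split_by_doctor(records, parts=(8, 1, 1), seed=0):
    """Split consultation records into train/val/test per doctor.

    `records` is an iterable of dicts carrying a hypothetical
    "doctor_id" key (an assumed schema, not the dataset's real one).
    """
    rng = random.Random(seed)
    total = sum(parts)

    # Group every record under the doctor who handled it.
    by_doctor = defaultdict(list)
    for rec in records:
        by_doctor[rec["doctor_id"]].append(rec)

    train, val, test = [], [], []
    for doctor_id in sorted(by_doctor):
        cases = list(by_doctor[doctor_id])
        rng.shuffle(cases)  # assumed: random order before splitting
        n = len(cases)
        # Integer arithmetic avoids floating-point rounding surprises.
        n_train = n * parts[0] // total
        n_val = n * parts[1] // total
        train.extend(cases[:n_train])
        val.extend(cases[n_train:n_train + n_val])
        test.extend(cases[n_train + n_val:])  # remainder goes to test
    return train, val, test
```

Splitting within each doctor, rather than over the pooled records, keeps every doctor with enough cases represented in all three sets, which matches the description's wording that each doctor's records were divided.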
Provided by:
Chen, Ping; Shen, Zhiqi; Zhang, Yinan; Jing, Jiazheng; Tao, Zhenchao



