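Dermatological Disease Comorbidity Dataset
Source: Tianjin Data Intellectual Property Registration Platform. Updated 2024-12-27; indexed 2025-01-13.
Download link: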
https://dengji.tjippc.cn/xxgg_nr?id=ad72096f-885d-4498-bb11-44eb721bb0d9
Resource overview:
Building a Patient Master Index from Multi-dimensional Clinical Data: Each patient's feature vector is built from attributes such as gender, residential address, family genetic disorders, and allergens. The DBSCAN algorithm then clusters the patient data points by the density of these feature vectors, placing points that lie in the same dense region in the same cluster. Each cluster is treated as a patient group and serves as an identifier in the patient master index.
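The clustering step described above can be illustrated with a small, dependency-free sketch of DBSCAN. Everything specific here is an assumption made for illustration: the sample records, the field values, the mismatch-count distance, and the eps and min_samples settings are hypothetical and are not taken from the dataset or its pipeline. In practice a library implementation such as sklearn.cluster.DBSCAN would likely be used instead.

```python
from collections import deque

# Hypothetical patient records: (gender, district, family genetic disorder, allergen).
# Field names and values are illustrative, not taken from the dataset.
PATIENTS = [
    ("F", "Heping", "none", "pollen"),
    ("F", "Heping", "none", "dust mite"),
    ("F", "Heping", "psoriasis", "pollen"),
    ("M", "Nankai", "psoriasis", "penicillin"),
    ("M", "Nankai", "psoriasis", "seafood"),
    ("M", "Nankai", "none", "penicillin"),
    ("F", "Binhai", "vitiligo", "latex"),
]

NOISE = -1


def mismatch_distance(a, b):
    """Number of fields on which two categorical feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def dbscan(points, eps, min_samples, distance=mismatch_distance):
    """Return one cluster label per point; NOISE marks unclustered points."""
    n = len(points)
    # Neighbourhood of each point, including the point itself.
    neighbours = [
        [j for j in range(n) if distance(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # i is a core point: grow a new cluster from it.
        labels[i] = cluster_id
        queue = deque(neighbours[i])
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is also a core point
        cluster_id += 1
    return labels


# Each non-noise label can then serve as a patient-group identifier.
labels = dbscan(PATIENTS, eps=1, min_samples=3)
```

With these toy records, patients who share a district and differ from a central record in at most one field end up in the same group, while the single outlier is left as noise. How real addresses or allergen lists should be encoded and compared is a design choice the source does not specify.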
Electronic Medical Record Quality Control Classification Model: The model applies natural language processing to free-text fields of electronic medical records, such as the chief complaint, history of present illness, and past medical history, extracting key information and classifying each record. The data covers 7 categories with 250 samples each (1,750 samples in total). Preprocessing consists of labeling, word segmentation, and conversion to TXT files. The BERT tokenizer converts the record text into the input format BERT requires, and the quality control labels are converted to numeric labels. The training and test sets are split 9:1. Training uses the BertForSequenceClassification model, and evaluation uses the classification_report method. To update the parameters, place the data in the specified folder and run the training and update commands, making sure the model, labels, and label names stay in sync.
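The data preparation described above (numeric labels, a 9:1 split, BERT tokenization) might look roughly like the following sketch. Several details are assumptions rather than facts from the source: the label names are placeholders, the split is stratified per class, the checkpoint name bert-base-chinese and the max_length of 128 are guesses, and the corpus is dummy text that only mirrors the stated sizes. The two helper functions that touch BERT import the transformers library lazily and are not called here, since they would download model weights.

```python
import random
from collections import defaultdict

# Hypothetical quality-control label names; the source only says there are 7 categories.
LABEL_NAMES = [f"qc_class_{k}" for k in range(7)]
LABEL_TO_ID = {name: idx for idx, name in enumerate(LABEL_NAMES)}


def stratified_split(samples, test_fraction=0.1, seed=0):
    """Split (text, label_id) pairs so each class keeps the same train/test ratio."""
    by_label = defaultdict(list)
    for text, label_id in samples:
        by_label[label_id].append((text, label_id))
    rng = random.Random(seed)
    train, test = [], []
    for label_id in sorted(by_label):
        group = by_label[label_id]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def encode(texts, model_name="bert-base-chinese", max_length=128):
    """Tokenize record texts into BERT inputs (input_ids, attention_mask, ...)."""
    from transformers import BertTokenizer  # lazy import; needs transformers and torch

    tokenizer = BertTokenizer.from_pretrained(model_name)
    return tokenizer(texts, padding=True, truncation=True,
                     max_length=max_length, return_tensors="pt")


def load_model(model_name="bert-base-chinese"):
    """Load a BERT classifier with one output per quality-control category."""
    from transformers import BertForSequenceClassification

    return BertForSequenceClassification.from_pretrained(
        model_name, num_labels=len(LABEL_NAMES))


# Placeholder corpus: 250 dummy records per category, mirroring the stated sizes.
samples = [(f"record {i} of {name}", LABEL_TO_ID[name])
           for name in LABEL_NAMES for i in range(250)]
train_set, test_set = stratified_split(samples)
```

Under these assumptions the split yields 1,575 training and 175 test records, with 25 test records per category. After training, predictions on the test set could be summarized with sklearn.metrics.classification_report(y_true, y_pred, target_names=LABEL_NAMES), which reports per-class precision, recall, and F1.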
Provider:
Tianjin Health and Medical Big Data Co., Ltd. (天津健康医疗大数据有限公司)
Created:
2024-12-10
Collected summary
Dataset introduction

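Features
The Dermatological Disease Comorbidity Dataset contains 290,000 diagnosis and treatment records, is updated monthly, and covers 20 key fields spanning admission to discharge. It is suited to medical, teaching, and research use, in particular studies of diagnosis and treatment patterns and pharmacoeconomics, and it supports clinical decision-making and the design of personalized treatment plans. Data processing uses the DBSCAN algorithm and a BERT model, with the aim of ensuring data quality and analytical accuracy.
The content above was collected and summarized by 遇见数据集.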



