
Hypertension Disease-Specific Dataset

Tianjin Data Intellectual Property Registration Platform. Updated 2024-09-25; indexed 2024-10-14.
Download link:
https://dengji.tjippc.cn/xxgg_nr?id=23168be2-f82f-43e2-93c2-d16be0602a68
Resource description:
Disease-specific diagnosis name classification model: A diagnosis database is built by analyzing medical literature, clinical data and expert knowledge. The text is preprocessed by word segmentation and random shuffling, and the model is then trained with the train_supervised function (200 epochs, learning rate 0.1, word n-gram length 1, loss function "hs"). Performance is evaluated with the classification_report method and is reported to be good. To update parameters, commands are run that keep the model, the labels and the label names in sync, so that disease-specific diagnosis types can be identified quickly and accurately.

Electronic medical record (EMR) quality control classification model: This model applies natural language processing to EMR text such as the chief complaint, history of present illness and past medical history, extracting key information and classifying it. The data covers 7 categories with 250 samples each. Data processing consists of labeling, word segmentation and conversion to TXT files. The BERT tokenizer converts the record text into the input format BERT requires, and the quality control labels are converted to numeric labels. The training and test sets are split 9:1. Training uses the BertForSequenceClassification model, and evaluation uses the classification_report method. To update parameters, the data is placed in the designated folder and the training and update commands are run, which keeps the model, the labels and the label names in sync.
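The data preparation and training settings described above can be sketched roughly as follows. This is a minimal illustration, not the provider's code: the name train_supervised and the "hs" loss match the fastText Python API, so fastText is assumed here; the helper names, the fixed random seed, the sorted label-to-id mapping and the sample label "hypertension" are all hypothetical. The fastText import is kept inside the training function so that the pure-Python helpers run without it.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hyperparameters quoted in the description, using fastText's keyword names
# (assumption: train_supervised refers to fasttext.train_supervised).
FASTTEXT_PARAMS = {"epoch": 200, "lr": 0.1, "wordNgrams": 1, "loss": "hs"}


def to_fasttext_line(label: str, tokens: Sequence[str]) -> str:
    """Format one segmented sample as '__label__<label> tok1 tok2 ...'."""
    return f"__label__{label} " + " ".join(tokens)


def shuffled(items: Sequence, seed: int = 0) -> list:
    """Return a shuffled copy; the fixed seed is an assumption for repeatability."""
    out = list(items)
    random.Random(seed).shuffle(out)
    return out


def encode_labels(labels: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map text labels to integer ids, using sorted order for a stable mapping."""
    label2id = {name: i for i, name in enumerate(sorted(set(labels)))}
    return [label2id[name] for name in labels], label2id


def split_9_to_1(items: Sequence, seed: int = 0) -> Tuple[list, list]:
    """Shuffle, then keep 9/10 for training and the rest for testing."""
    data = shuffled(items, seed)
    cut = len(data) * 9 // 10  # integer arithmetic avoids float rounding
    return data[:cut], data[cut:]


def train_diagnosis_model(train_path: str):
    """Train on a file of '__label__...' lines; needs the optional fasttext package."""
    import fasttext  # lazy import so the helpers above work without it
    return fasttext.train_supervised(input=train_path, **FASTTEXT_PARAMS)


if __name__ == "__main__":
    print(to_fasttext_line("hypertension", ["primary", "hypertension", "grade", "2"]))
    train, test = split_9_to_1(range(7 * 250))  # 7 categories x 250 samples
    print(len(train), len(test))
```

With 7 categories of 250 samples each, a 9:1 split leaves 1575 samples for training and 175 for testing. The integer ids from encode_labels correspond to the numeric labels a sequence classifier such as BertForSequenceClassification expects; the tokenization step itself is not shown here.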
Provider:
Tianjin Health and Medical Big Data Co., Ltd. (天津健康医疗大数据有限公司)
Created:
2024-09-11
Collected summary
Dataset introduction
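Features
The hypertension disease-specific dataset contains 11 million hypertension-related medical records covering visits, diagnoses, medications and other information. It is updated monthly and is intended for medical, teaching and research use.
The content above was collected and summarized by 遇见数据集.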