Medical Data Governance Service
Source: Beijing International Big Data Exchange, indexed 2024-03-01
Download link:
https://webs.bjidex.com/sys-bsc-home/#/bscConsole/tradingMarket/detail?id=246
Resource description:
Clinical research data is scattered across many medical information systems that follow different standards and vary in data quality. Yidu Cloud uses machine learning and natural language processing (NLP) to help hospitals integrate data from multiple systems and sources, standardize it in a unified way, and convert unstructured long-text medical records into structured data.
1. Data Cleaning
Cleans non-standard or erroneous field values in hospital medical data, whether they arose for historical or unknown reasons, so that obviously wrong values do not lead upper-layer application services to wrong conclusions.
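As a rough illustration of field-level cleaning, the sketch below normalizes two fields of a patient record. The field names (`age`, `sex`), the accepted age range, and the sex vocabulary are assumptions made for this example and are not taken from the service itself.

```python
from typing import Any, Optional

# Hypothetical vocabulary: raw spellings mapped to one canonical code.
SEX_ALIASES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def clean_age(value: Any) -> Optional[int]:
    """Return a plausible age in whole years, or None if the value is unusable."""
    try:
        age = int(str(value).strip())
    except ValueError:
        return None
    # Assumed plausibility window; out-of-range values are treated as errors.
    return age if 0 <= age <= 130 else None


def clean_sex(value: Any) -> Optional[str]:
    """Map free-form sex values to 'M' or 'F'; unknown values become None."""
    return SEX_ALIASES.get(str(value).strip().lower())


def clean_record(record: dict) -> dict:
    """Return a copy of the record with cleaned 'age' and 'sex' fields."""
    cleaned = dict(record)
    cleaned["age"] = clean_age(record.get("age"))
    cleaned["sex"] = clean_sex(record.get("sex"))
    return cleaned
```

Values that cannot be repaired are set to `None` rather than guessed, so later steps can tell a missing value from a valid one.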
2. Data Standardization
Standardizes data against domestic, international, and medical-industry standards and international disease guidelines, such as ICD-10, ICD-9, HL7 CDA, Medical Subject Headings (MeSH), Logical Observation Identifiers Names and Codes (LOINC), the CFDA drug dictionary specification, the ATC classification, the National Health and Family Planning Commission's list of diagnosis and treatment subjects for medical institutions, international tumor database structures, and international oncology diagnosis and treatment guidelines. Using natural-language synonym tables and medical-term synonym association tables, and guided by data mining algorithms, field values that are worded differently but mean the same thing are normalized to a single form, giving downstream and upper-layer applications correct and consistent information.
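The synonym-table idea can be sketched as a two-step lookup: free-text diagnosis to canonical term, then canonical term to code. The tiny tables below are hand-written for illustration only; a real deployment would load curated terminology, and the function name `normalize_diagnosis` is hypothetical.

```python
from typing import Optional, Tuple

# Illustrative synonym table: variant wording -> canonical term.
SYNONYMS = {
    "high blood pressure": "essential hypertension",
    "htn": "essential hypertension",
    "hypertension": "essential hypertension",
    "t2dm": "type 2 diabetes mellitus",
    "type ii diabetes": "type 2 diabetes mellitus",
}

# Illustrative mapping from canonical term to ICD-10 category.
ICD10 = {
    "essential hypertension": "I10",
    "type 2 diabetes mellitus": "E11",
}


def normalize_diagnosis(text: str) -> Tuple[str, Optional[str]]:
    """Return (canonical term, ICD-10 code or None) for a free-text diagnosis."""
    # Lowercase and collapse whitespace so trivial variants share one key.
    key = " ".join(text.lower().split())
    canonical = SYNONYMS.get(key, key)
    return canonical, ICD10.get(canonical)
```

Terms missing from the table pass through unchanged with no code, which keeps unmapped values visible for manual review instead of silently dropping them.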
3. Post-hoc Data Structuring
Takes data from hospital business systems and, using natural language processing combined with the semantic structure of medical terminology, parses medical information expressed in natural language into structured key-value pairs, providing base data for later applications, data mining, and machine learning. Structuring proceeds along several independent dimensions, with data divided by subject field. The main subject fields include symptoms, signs, tobacco and alcohol use, pathological diagnosis, pathological findings, allergies, and marital and reproductive status. Depending on the semantic complexity of each field in a pathology record or report and on actual needs, the current structuring framework consists mainly of regular-expression extraction and a general-purpose framework. It supports a basic clinical field set covering structured processing of routine laboratory tests, examinations, symptoms, diseases, vital signs, family history, marital and reproductive history, surgery, infusions, and medication orders.
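The regular-expression part of such a framework can be sketched as a table of named patterns applied to a clinical note, producing key-value pairs. The patterns, key names, and sample note below are invented for this example; they work on English text and do not reproduce the service's actual rules.

```python
import re
from typing import Dict

# Hypothetical extraction rules: output key -> pattern with one capture group.
PATTERNS = {
    "temperature_c": re.compile(r"temperature\s+(\d+(?:\.\d+)?)\s*C", re.IGNORECASE),
    "smoking_years": re.compile(r"smoked\s+for\s+(\d+)\s+years", re.IGNORECASE),
    "allergy": re.compile(r"allergic\s+to\s+([a-z]+)", re.IGNORECASE),
}


def extract(note: str) -> Dict[str, str]:
    """Return the first captured value for every pattern that matches the note."""
    result = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(note)
        if match:
            result[key] = match.group(1)
    return result


note = "Temperature 38.5 C. Smoked for 20 years. Allergic to penicillin."
print(extract(note))
```

Fields with simple, regular wording suit this kind of rule; fields with more complex semantics are the ones the description assigns to the general-purpose framework.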
4. Data Quality Control Management
Applies combined quantitative and qualitative checks to the multiple data layers produced during processing, and provides multi-dimensional quality monitoring and early warning of problems, helping big data operators and hospital information departments find data quality issues in completeness, consistency, accuracy, uniqueness, and stability.
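Two of the quality dimensions named above, completeness and uniqueness, lend themselves to simple quantitative checks. The sketch below is one possible formulation; the function names, the `patient_id` field, and the rule that `None` and the empty string count as missing are assumptions.

```python
from collections import Counter
from typing import Dict, List


def completeness(records: List[Dict], field: str) -> float:
    """Fraction of records whose field is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def duplicate_ids(records: List[Dict], id_field: str = "patient_id") -> List[str]:
    """Identifiers that occur more than once, ignoring records without an id."""
    counts = Counter(r.get(id_field) for r in records)
    return sorted(k for k, n in counts.items() if k is not None and n > 1)


records = [
    {"patient_id": "p1", "sex": "F"},
    {"patient_id": "p2", "sex": ""},
    {"patient_id": "p1", "sex": "M"},
    {"patient_id": "p3", "sex": None},
]
print(completeness(records, "sex"), duplicate_ids(records))
```

A monitoring job could compare such metrics against thresholds to raise the early warnings the description mentions.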
5. Data De-identification and Encryption
De-identification and encryption transform the sensitive parts of a patient's personal information (such as name, ID number, phone number, and address) according to masking or encryption rules. This reliably protects private data while leaving the format and attributes of the remaining data unchanged, so that the data stays recognizable and usable.
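A minimal sketch of two common transformations follows: character masking that preserves length, and keyed hashing that replaces an identifier with a stable pseudonym. Note that the second is HMAC-SHA256 pseudonymization rather than reversible encryption, and that the function names and the sample key are assumptions for illustration.

```python
import hashlib
import hmac


def mask_keep_edges(value: str, head: int = 0, tail: int = 4, fill: str = "*") -> str:
    """Mask the middle of a string, keeping `head` leading and `tail` trailing characters."""
    if len(value) <= head + tail:
        # Too short to keep any edge safely: mask everything.
        return fill * len(value)
    middle = fill * (len(value) - head - tail)
    return value[:head] + middle + value[len(value) - tail:]


def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 digest (hex string)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


print(mask_keep_edges("13812345678"))               # phone: keep last 4 digits
print(mask_keep_edges("Zhang Wei", head=1, tail=0)) # name: keep first letter
print(pseudonymize("patient-0001", b"demo-key"))
```

Masking keeps the original length, so downstream format checks still pass, while the keyed digest lets records for the same patient be linked without exposing the identifier.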
Provider:
Yidu Cloud (Beijing) Technology Co., Ltd. (医渡云(北京)技术有限公司)
Collected summary
Dataset introduction

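Background and challenges
Background overview
This dataset entry describes the medical data governance service provided by Yidu Cloud. The service uses machine learning and related techniques to integrate multi-source, heterogeneous medical data and to perform data cleaning, standardization, structuring, and security management, addressing the problem of scattered data of uneven quality in medical institutions. It covers the whole governance workflow, from data integration to privacy protection, and provides a high-quality data foundation for clinical research and applications.
The above content was collected and summarized by 遇见数据集.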



