
Structured Report Data for CT Diagnosis of Common Malignant Tumors (常见恶性肿瘤CT诊断结构化报告数据)

Tianjin Data Intellectual Property Registration Platform (天津市数据知识产权登记平台) | Updated 2024-05-09 | Indexed 2024-05-25
Download link:
https://dengji.tjippc.cn/xxgg_nr?id=63410a73-2815-4ae4-a458-40e28c1cc448
Resource description:
This dataset extracts key information from free-text chest CT (computed tomography) reports written by physicians and generates standardized structured chest CT reports. The process involves the following steps and algorithmic rules:

1. Data preprocessing: Physicians clean the original text data, which includes removing irrelevant information, correcting spelling errors, and standardizing medical terminology.
2. NLP-based information extraction: Natural language processing (NLP) techniques such as tokenization, part-of-speech tagging, and named entity recognition (NER) are used to analyze the text semantically and to identify key information, including the patient's examination site and descriptions of abnormal findings. Syntactic structure is analyzed with techniques such as dependency parsing to understand how the elements of a sentence relate to one another, for example subject-predicate and attributive (modifier-head) relationships.
3. Entity recognition and relation extraction: Entity recognition identifies entities in the text, such as patient sex, examination date, examination site, and disease name. Relationships between entities are then determined, such as the association between a patient and an examination site, or the specific location of an abnormality.
4. Sensitive information removal: Sensitive information found by entity recognition, such as patient name and patient ID number, is anonymized to protect the security and privacy of patient data.
5. Text classification and semantic analysis: The text is classified into categories such as medical history, examination results, and medical orders. Each category is then analyzed semantically to capture its meaning, including descriptions of abnormalities, disease severity, and recommended treatment plans.
6. Template matching and structured report generation: Structured report templates are designed with sections such as title, patient information, examination information, abnormal findings, and diagnostic opinion. The key information extracted from the original text is filled into the corresponding template positions to generate standardized structured reports.
7. Quality control and post-processing: The generated structured reports undergo quality control to ensure that the information is accurate and complete. Where necessary, reports are reviewed, for example by correcting errors or supplementing information according to specific rules.
8. Continuous optimization and update: Algorithms and rules are optimized continuously, and user opinions and experience are collected through feedback mechanisms to keep improving the quality of report generation.
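To make the entity extraction, anonymization, and template-filling steps above more concrete, here is a minimal Python sketch. It is not the provider's actual pipeline: simple regular expressions stand in for the NER and parsing models the description mentions, the sample report is invented for illustration, and every name in it (`RAW_REPORT`, `FIELD_PATTERNS`, `extract_fields`, `anonymize`, `render`) is hypothetical.

```python
import re

# Hypothetical free-text report, invented for illustration only.
RAW_REPORT = (
    "Name: Zhang San  Patient ID: 00123456  Sex: M  Exam date: 2024-03-18\n"
    "Site: chest\n"
    "Findings: A 12 mm nodule is seen in the right upper lobe. "
    "No pleural effusion.\n"
    "Impression: Right upper lobe nodule, follow-up recommended."
)

# Assumption: labeled fields that regexes can find. A real system would
# use trained NER and dependency parsing instead of these patterns.
FIELD_PATTERNS = {
    "patient_name": r"Name:\s*(.+?)\s{2,}",
    "patient_id": r"Patient ID:\s*(\S+)",
    "sex": r"Sex:\s*(\S+)",
    "exam_date": r"Exam date:\s*(\d{4}-\d{2}-\d{2})",
    "site": r"Site:\s*(.+)",
    "findings": r"Findings:\s*(.+)",
    "impression": r"Impression:\s*(.+)",
}

# Fields treated as sensitive in this sketch.
SENSITIVE_FIELDS = {"patient_name", "patient_id"}

# Template sections follow the ones listed in the description.
TEMPLATE = (
    "CHEST CT STRUCTURED REPORT\n"
    "Patient information: sex={sex}\n"
    "Examination information: date={exam_date}, site={site}\n"
    "Abnormal findings: {findings}\n"
    "Diagnostic opinion: {impression}"
)
REQUIRED_FIELDS = ("sex", "exam_date", "site", "findings", "impression")


def extract_fields(text: str) -> dict[str, str]:
    """Pull each labeled field out of the free text, skipping absent ones."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            fields[name] = match.group(1).strip()
    return fields


def anonymize(fields: dict[str, str]) -> dict[str, str]:
    """Replace the values of sensitive fields with a fixed placeholder."""
    return {
        name: "[REDACTED]" if name in SENSITIVE_FIELDS else value
        for name, value in fields.items()
    }


def render(fields: dict[str, str]) -> str:
    """Fill the template; a basic quality check rejects incomplete input."""
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return TEMPLATE.format(**fields)


if __name__ == "__main__":
    print(render(anonymize(extract_fields(RAW_REPORT))))
```

The template deliberately omits the patient name and ID, so even an un-redacted field dictionary would not leak them into the rendered report. The `ValueError` in `render` is a small stand-in for the quality-control step.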
Provider:
天津博思特医疗科技有限责任公司
Created:
2024-05-09
Collected summary
Dataset introduction
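Features
The dataset contains 56,260 structured CT diagnostic reports for common malignant tumors and is updated once a year. It is intended mainly for developing large language models in the medical domain and supports generating CT diagnostic reports. It is built by using natural language processing to extract key information from free-text reports written by physicians and to generate standardized structured reports.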
The above content was collected and summarized by 遇见数据集.