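Structured and Unstructured Clinical Diagnosis Dataset for Liver Cancer

Name: Structured and Unstructured Clinical Diagnosis Dataset for Liver Cancer (结构与非结构化的肝癌临床诊断数据集)
Creator: Harbin Institute of Technology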
Published: 2026-01-30T15:43:13+08:00

Indexed by the National Basic Science Data Center on 2026-01-30

Liver cancer diagnosis

Medical data analysis

Data link:

https://nbsdc.cn/general/dataDetail?id=687e4a46195d263b6dc8b27c&type=1

Resource description:

This dataset targets the clinical diagnosis of liver cancer and covers both structured and unstructured medical data. It is built from real patient records from 301 Hospital in China; personal information was de-identified during data collection to ensure data security and compliance. The dataset is organized around hepatobiliary and pancreatic diseases and is intended to provide high-quality data for liver cancer diagnosis research and to support better disease prediction and diagnosis methods.

The data come in two main forms: text and tables. The text data record each patient's chief complaint, history of present illness, past medical history, personal history, family history, physical examination, and laboratory and special examinations. In the task setup, the chief complaint, history of present illness, and past medical history serve as the key feature fields for predicting whether a patient has liver cancer; features are extracted from this unstructured text with natural language processing techniques to improve diagnostic accuracy.

The tabular data cover specific examination indicators, including the complete blood count (CBC), four coagulation tests, eight preoperative serum tests, and the carcinoembryonic antigen assay. Of these, CA724 (from the carcinoembryonic antigen assay), amylase (from the pancreatic function test), hepatitis B surface antigen, HBsAg (from the eight preoperative serum tests), and the alpha-fetoprotein (AFP) assay are used as the core input variables for liver cancer prediction. Standardizing and analyzing these structured data is an important part of task modeling.

The dataset contains records from several hundred patients, which the providers describe as diverse and representative enough to train a range of models and algorithms. It can be used to build multimodal liver cancer diagnosis models, including deep learning-based classifiers and rule-driven expert systems, and is suited to work by research and medical institutions. By integrating structured and unstructured data, it aims to improve the early diagnosis of liver cancer and to provide a data foundation for diagnostic support tools for related diseases, thereby supporting the application of artificial intelligence in medicine and medical research more broadly.
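As a rough illustration of the task setup described above, the following Python sketch assembles one multimodal sample from a single patient record: the three key text fields are joined into one string, and the four core laboratory indicators become a numeric vector with a mask marking missing values. The field names (`chief_complaint`, `present_illness`, `past_history`, `CA724`, `amylase`, `HBsAg`, `AFP`) and the record layout are assumptions made for illustration; the actual schema of the dataset is not documented here and may differ.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical field names; the real dataset schema may differ.
TEXT_FIELDS = ["chief_complaint", "present_illness", "past_history"]
LAB_FIELDS = ["CA724", "amylase", "HBsAg", "AFP"]


@dataclass
class Sample:
    text: str            # concatenated free-text fields
    labs: List[float]    # lab values in LAB_FIELDS order; missing values set to 0.0
    lab_mask: List[int]  # 1 where a lab value was present and numeric, else 0


def build_sample(record: Dict[str, object]) -> Sample:
    """Turn one patient record into text plus a fixed-length lab vector."""
    parts = []
    for name in TEXT_FIELDS:
        value = record.get(name)
        if value is not None and str(value).strip():
            parts.append(str(value).strip())
    text = " ".join(parts)

    labs: List[float] = []
    mask: List[int] = []
    for name in LAB_FIELDS:
        try:
            labs.append(float(record.get(name)))
            mask.append(1)
        except (TypeError, ValueError):
            # Missing or non-numeric entry: keep the slot, flag it in the mask.
            labs.append(0.0)
            mask.append(0)
    return Sample(text=text, labs=labs, lab_mask=mask)


# Made-up record for demonstration only; not taken from the dataset.
example = {"chief_complaint": "abdominal pain", "AFP": "412.5", "amylase": 60}
sample = build_sample(example)
```

Keeping a separate mask, rather than silently imputing zeros, lets a downstream classifier or rule set distinguish "value was 0" from "test was not performed", which matters for laboratory panels that are only ordered for some patients.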

Provider:

Harbin Institute of Technology

Collected summary

Dataset introduction

Background and challenges

Background overview

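This dataset is a comprehensive resource for the clinical diagnosis of liver cancer. It contains structured and unstructured medical data drawn from real patient records at 301 Hospital in China, de-identified to ensure security and compliance. It covers several hundred patients, with text records (such as chief complaint and medical history) and tabular examination indicators (such as complete blood count and carcinoembryonic antigen), and is intended to support research on multimodal diagnosis models, improve the early diagnosis of liver cancer, and advance the application of artificial intelligence in medicine. The data are provided in json and docx formats with a total size of 5.23 MB. The dataset is described as diverse and representative and is suitable for use by research and medical institutions.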

The summary above was collected and generated by 遇见数据集.
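Since the summary states that part of the data is distributed as json, a first step in working with it would be to gather all JSON records into memory. The sketch below is a minimal loader under assumed conditions: the directory layout, the UTF-8 encoding, and the possibility that a file holds either one record or a list of records are all assumptions, not documented properties of the dataset. The docx files would need a separate third-party reader and are not handled here.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_json_records(root: str) -> List[Dict]:
    """Collect records from every .json file under root (assumed layout)."""
    records: List[Dict] = []
    for path in sorted(Path(root).rglob("*.json")):
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # A file may hold one record (dict) or several (list); accept both.
        items = data if isinstance(data, list) else [data]
        # Skip anything that is not a JSON object, e.g. stray strings.
        records.extend(item for item in items if isinstance(item, dict))
    return records
```

A call such as `load_json_records("path/to/dataset")` (a placeholder path) returns a flat list of dictionaries whose keys can then be inspected to learn the actual field names.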
