
Annotated Corpus for Cardiovascular Disease Risk Factors

Source: arXiv. Updated: 2017-03-03. Indexed: 2024-06-21.
Download link:
https://github.com/WILAB-HIT/RiskFactor
Description:
The Annotated Corpus for Cardiovascular Disease Risk Factors is the first Chinese corpus focused on cardiovascular disease (CVD) risk factors. It was developed by the Research Center for Language Technology, School of Computer Science and Technology, Harbin Institute of Technology. The corpus is built from de-identified discharge summaries and progress notes of 600 patients and contains 9,678 annotations covering 12 types of CVD risk factors. It was constructed in four steps: designing a lightweight annotation task, writing annotation guidelines, training annotators, and building the corpus. The corpus is intended as a basis for developing risk factor information extraction systems, which could in turn support a long-term monitoring platform for tracking changes in CVD risk factors, predicting their trends, managing chronic diseases, and estimating the progression of cardiovascular disease.
Provider:
Research Center for Language Technology, School of Computer Science and Technology, Harbin Institute of Technology
Created:
2016-11-28