Comprehensive Syntactic and Semantic Corpus of Chinese Clinical Texts
Source: arXiv. Updated 2016-11-08; indexed 2024-06-21.
Download link:
http://github.com/WILAB-HIT/Resources
Resource overview:
The Comprehensive Syntactic and Semantic Corpus of Chinese Clinical Texts was created by the School of Computer Science and Technology at Harbin Institute of Technology together with other institutions to provide foundational data for natural language processing (NLP) research in the clinical domain. It has two parts: 138 Chinese clinical documents with syntactic annotation, totaling 47,424 tokens and 2,553 complete parse trees, and 992 documents with semantic annotation, covering 39,511 entities with their assertions and 7,695 relations. The corpus was built with an iterative annotation method intended to keep annotation quality high. It is mainly used for syntactic and semantic analysis of clinical texts and as a resource for developing and evaluating NLP techniques.
Provided by:
School of Computer Science and Technology, Harbin Institute of Technology
Created:
2016-11-07



