Chinese Dependency Treebank 1.0

Name: Chinese Dependency Treebank 1.0
Creator: Linguistic Data Consortium
Published: 2021-07-01 16:24:22
License: No description available

DataCite Commons | Updated 2021-07-01 | Indexed 2025-04-16

Download link:

https://catalog.ldc.upenn.edu/LDC2012T05

Resource description:

<h3>Introduction</h3><br> <p>Chinese Dependency Treebank 1.0 was developed by the <a href="http://ir.hit.edu.cn/">Harbin Institute of Technology's Research Center for Social Computing and Information Retrieval</a> (HIT-SCIR). It contains 49,996 Chinese sentences (902,191 words) randomly selected from People's Daily newswire stories published between 1992 and 1996 and annotated with syntactic dependency structures.</p><br> <h3>Data</h3><br> <p>Ill-formed or short sentences were eliminated from the randomly selected sentences prior to annotation. The data was segmented and annotated for part of speech (POS), syntactic structures, verb subclasses and noun compounds. Word segmentation and POS tagging were performed automatically using statistical models trained on a larger, annotated corpus of People's Daily newswire stories. Human annotators then annotated the syntactic structures and corrected word segmentation errors. POS tags were not corrected.</p><br> <p>The data is provided in CoNLL-X format, encoded in UTF-8. Each line holds the information for one word, and an empty line marks the end of a sentence. Each line contains 10 tab-separated columns.</p><br> <h3>Samples</h3><br> <p>Please follow this <a href="desc/addenda/LDC2012T05.html" rel="nofollow">link</a> for a sample of the data.</p><br> <h3>Updates</h3><br> <p>None at this time.</p><br> Portions © 1992-1996 People's Daily, © 2012 Harbin Institute of Technology, Research Center for Social Computing and Information Retrieval, © 2012 Trustees of the University of Pennsylvania
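
The file layout described above (one word per line, 10 tab-separated columns, a blank line between sentences) can be read with a few lines of Python. The following is a minimal sketch, not the corpus's official tooling. The column names follow the standard CoNLL-X convention and are assumed to apply to this release. The function name `read_conllx` and the three-word sample sentence, including its tags and labels, are hypothetical and are not taken from the treebank.

```python
from typing import Dict, Iterable, Iterator, List

# Standard CoNLL-X column names (assumed to match this release).
CONLLX_FIELDS = [
    "id", "form", "lemma", "cpostag", "postag",
    "feats", "head", "deprel", "phead", "pdeprel",
]


def read_conllx(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of per-word dicts."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLX_FIELDS):
            raise ValueError(
                f"expected {len(CONLLX_FIELDS)} columns, got {len(columns)}"
            )
        sentence.append(dict(zip(CONLLX_FIELDS, columns)))
    # Emit the last sentence if the input lacks a trailing blank line.
    if sentence:
        yield sentence


# Hypothetical sample for illustration only (not corpus data).
SAMPLE = (
    "1\t我\t_\tr\tr\t_\t2\tSBV\t_\t_\n"
    "2\t爱\t_\tv\tv\t_\t0\tHED\t_\t_\n"
    "3\t北京\t_\tns\tns\t_\t2\tVOB\t_\t_\n"
    "\n"
)

sentences = list(read_conllx(SAMPLE.splitlines()))
# Words whose head is "0" are attached to the artificial root.
roots = [w["form"] for s in sentences for w in s if w["head"] == "0"]
```

To read an actual file, pass an open handle instead of the sample, for example `open(path, encoding="utf-8")`, since the data is distributed in UTF-8. All fields are kept as strings; convert `id` and `head` to integers only if downstream code needs numeric indices.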

Provider:

Linguistic Data Consortium

Created:

2020-11-30
