Chinese Treebank
OpenDataLab · Updated 2026-05-17 · Added 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/Chinese_Treebank
Overview:
The Chinese Treebank (CTB) project began at the Institute for Research in Cognitive Science (IRCS) of the University of Pennsylvania and later moved to the CLEAR Lab at the University of Colorado Boulder. Two older project websites, one at Penn and one at CU, are no longer actively maintained, and the information on them is badly out of date.
Development of the Chinese Treebank was supported by the U.S. Department of Defense (DOD), the National Science Foundation (NSF), and the DARPA TIDES, GALE, and BOLT programs. The latest release is CTB 9.0, which covers newswire, magazine articles, broadcast news, broadcast conversation, newsgroups, blogs, and discussion forums. The corpus is still being expanded, and future releases will include additional genres.
The Chinese Proposition Bank (CPB) project has added a layer of semantic annotation to the Chinese Treebank. The latest version, CPB 3.0, is likewise distributed through the Linguistic Data Consortium (LDC).
Provider:
OpenDataLab
Created:
2022-08-16
Dataset Introduction

Background Overview
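The Chinese Treebank is a Chinese-language treebank whose latest version, CTB 9.0, covers text types such as newswire, broadcast, and blogs. It provides word segmentation, part-of-speech tags, and phrase-structure (constituency) trees; these trees are commonly converted into dependency structures for dependency parsing, while semantic role annotation is provided separately by the Chinese Proposition Bank. The dataset, developed at the University of Pennsylvania and the University of Colorado Boulder, was released in 2016 and supports Chinese NLP tasks such as word segmentation, part-of-speech tagging, and syntactic parsing.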
The content above was collected and summarized by 遇见数据集.
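The description above does not show what the syntactic annotation looks like. As a minimal sketch, assuming trees written in Penn Treebank-style bracketed notation (the convention CTB follows) and using a hand-written sample sentence rather than real corpus data, the Python below parses one tree and recovers its word/POS-tag sequence. The helpers `parse_bracketed` and `tagged_words` are hypothetical illustrations, not part of any released toolkit.

```python
import re
from typing import List, Tuple, Union

# A parsed node is (label, children); a terminal child is a plain string (the word).
Node = Tuple[str, List[Union["Node", str]]]

# Tokens are "(", ")", or any run of characters that is neither whitespace nor a bracket.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def parse_bracketed(text: str) -> Node:
    """Parse one bracketed tree into nested (label, children) tuples (hypothetical helper)."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}, got {tokens[pos]!r}")
        pos += 1
        label = ""
        # The label is optional: treebank files often wrap each tree in an unlabeled outer pair.
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children: List[Union[Node, str]] = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])  # terminal word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def tagged_words(node: Node, drop_empty: bool = True) -> List[Tuple[str, str]]:
    """Collect (word, POS tag) pairs; a preterminal is a node whose only child is a string."""
    label, children = node
    if len(children) == 1 and isinstance(children[0], str):
        # Skip empty categories (tagged -NONE-) unless the caller asks to keep them.
        if drop_empty and label == "-NONE-":
            return []
        return [(children[0], label)]
    pairs: List[Tuple[str, str]] = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(tagged_words(child, drop_empty))
    return pairs


if __name__ == "__main__":
    # Hand-written sample tree, not taken from the corpus.
    sample = "( (IP (NP (NR 中国)) (VP (VV 发展)) (PU 。)) )"
    print(tagged_words(parse_bracketed(sample)))
    # → [('中国', 'NR'), ('发展', 'VV'), ('。', 'PU')]
```

Empty categories, which CTB marks with the `-NONE-` tag (for example a dropped subject written as `*pro*`), are skipped by default. The sketch does not handle any file-level markup that may surround the trees in the distributed files; for real work, an established reader such as NLTK's `Tree.fromstring` may be the safer choice.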