
Chinese CogBank

DataCite Commons. Updated 2021-07-01. Indexed 2025-04-16.
Download link:
https://catalog.ldc.upenn.edu/LDC2020T01
Description:

Introduction

Chinese CogBank is a database of cognitive properties of Chinese words intended for use in metaphor understanding and generation. It consists of 232,497 "word-property" pairs, built from 83,104 words and 100,195 properties. Each "word-property" type also has an associated frequency, which can serve as a functional measure of the importance of a property.

Data

The data was collected via the Chinese search engine Baidu.com (http://www.baidu.com/). The original collection consisted of 1,258,430 types (5,637,500 tokens) of "word-adjective" pairs, which were reduced in Chinese CogBank to 232,497 "word-property" pairs after a series of manual checks.

The corpus is presented as a single tab-separated value file encoded in UTF-8.

Samples

Please view this sample: desc/addenda/LDC2020T01.txt

Updates

None at this time.

Portions © 2020 Bin Li, © 2020 Trustees of the University of Pennsylvania
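Because the corpus is described as a single UTF-8 tab-separated file of "word-property" pairs with frequencies, a short loading sketch may help. The description does not document the column layout, so the order assumed here (word, property, frequency), the possible presence of a header row, and the helper names `parse_pairs` and `load_cogbank` are all assumptions for illustration; check them against the sample file before relying on this.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed column order (not confirmed by the description): word, property, frequency.
PairsByWord = Dict[str, List[Tuple[str, int]]]


def parse_pairs(lines: Iterable[str]) -> PairsByWord:
    """Group (property, frequency) pairs by word, most frequent property first."""
    by_word: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        word, prop, freq_text = row[0], row[1], row[2]
        try:
            freq = int(freq_text)
        except ValueError:
            continue  # skip a header row or a non-numeric frequency
        by_word[word].append((prop, freq))
    for props in by_word.values():
        props.sort(key=lambda item: item[1], reverse=True)
    return dict(by_word)


def load_cogbank(path: str) -> PairsByWord:
    """Read the TSV file at `path` (hypothetical local path) as UTF-8."""
    with open(path, encoding="utf-8", newline="") as handle:
        return parse_pairs(handle)
```

Sorting each word's properties by frequency follows the description's suggestion that frequency can act as a measure of a property's importance. For metaphor work, the first few entries per word would then be the most salient candidates.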
Provider:
Linguistic Data Consortium
Created:
2020-11-30