Chinese SNACS Corpus
Source: arXiv | Updated: 2020-03-19 | Indexed: 2024-06-21
Download link:
https://github.com/nert-nlp/Chinese-SNACS/
Resource overview:
The Chinese SNACS Corpus is the first dataset in which all adpositions in a Chinese text are semantically annotated. It was created by the Department of Linguistics at Georgetown University. The corpus is based on the Chinese translation of *The Little Prince* and contains 20,287 tokens, 933 of which are adpositions. The annotators applied the SNACS (Semantic Network of Adposition and Case Supersenses) framework to assign each adposition a detailed semantic class. The dataset is intended for natural language processing tasks such as machine translation and grammatical error correction, where cross-linguistic variation in adposition semantics is a known difficulty.
Provider:
Department of Linguistics, Georgetown University
Created:
2020-03-19



