
Marmara Turkish Coreference Resolution Corpus

arXiv | Updated 2018-07-31 | Indexed 2024-06-21
Download link:
https://bitbucket.org/knowlp/marmara-turkish-coreference-corpus
Resource description:
The Marmara Turkish Coreference Resolution Corpus is an annotated dataset created by the School of Engineering at Marmara University. It is built on the METU-Sabanci Turkish Treebank and contains mentions and coreference chains. More than eight independent annotations were collected for each document, which allowed adjudication to be fully automated. The corpus contains 5,170 mentions and 944 coreference chains and is intended primarily for developing and evaluating automatic coreference resolution methods for Turkish. Coreference resolution in Turkish poses several challenges: the absence of grammatical gender, null pronouns in subject and object positions, possessive pronouns expressed as suffixes, and ambiguity in possessive and number morphology. The corpus is meant to support work on these challenges and to improve the accuracy and efficiency of Turkish coreference resolution.
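
To illustrate how several independent annotations of one document could be merged without a human adjudicator, here is a minimal sketch that keeps a mention when enough annotators marked it. The voting rule, the token-span representation, and the name `adjudicate_mentions` are assumptions made for illustration; this listing does not describe the corpus's actual adjudication procedure or file format.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical representation: a mention is a token span (start, end), end exclusive.
Mention = Tuple[int, int]


def adjudicate_mentions(annotations: List[Set[Mention]], min_votes: int) -> Set[Mention]:
    """Keep every mention that at least `min_votes` annotators marked.

    This is a simple threshold vote, assumed here for illustration only.
    """
    votes = Counter(m for annotator in annotations for m in annotator)
    return {m for m, n in votes.items() if n >= min_votes}


# Three toy annotations of the same document (made-up spans, not corpus data).
annotations = [
    {(0, 2), (5, 6)},
    {(0, 2), (5, 6), (8, 9)},
    {(0, 2)},
]

kept = adjudicate_mentions(annotations, min_votes=2)
print(sorted(kept))  # [(0, 2), (5, 6)]
```

Raising `min_votes` trades recall for agreement: with `min_votes=3` only the span marked by all three toy annotators survives.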
Provider:
School of Engineering, Marmara University
Created:
2017-06-07