ThaiCoref

Name: ThaiCoref
Creator: Department of Linguistics, Chulalongkorn University
Published: 2024-06-10 11:47:24
License: not specified

Source: arXiv. Updated: 2024-06-10. Indexed: 2024-06-12.

Download link:

http://www.github.com/nlp-chula/thai-coref

Description:

ThaiCoref is a Thai coreference resolution dataset created by the Department of Linguistics, Chulalongkorn University. It contains 777,271 tokens, 44,082 mentions, and 10,429 entities across four text genres: university essays, newspapers, speeches, and Wikipedia. Its annotation scheme builds on the OntoNotes benchmark, with adjustments for Thai-specific linguistic phenomena. Strict annotation guidelines and multiple rounds of validation were used during development to ensure data quality. The dataset aims to address the challenges of Thai coreference resolution and to support research and applications in Thai natural language processing.

Provided by:

Department of Linguistics, Chulalongkorn University

Created:

2024-06-10
