KoCoNovel
Source: arXiv, 2024-04-11 · Updated: 2024-06-21
Download link:
https://github.com/storidient/KoCoNovel.git
Summary:
KoCoNovel is a large Korean character coreference resolution dataset built from literary texts, developed at Seoul National University. It comprises 178,000 tokens drawn from 50 modern and contemporary Korean novels and is the first Korean coreference resolution corpus based on literary texts. It pays particular attention to Korean address-term culture: 24% of character mentions are single common nouns (for example, terms for kinship or social relationships used in place of personal names). KoCoNovel is released in four versions to support a wide range of literary coreference analyses: annotations can follow either the omniscient author's perspective or the reader's, and mentions that refer to multiple characters can be treated either as separate entities or as overlapping entities. The dataset was produced through detailed preprocessing and annotation, and aims to improve coreference resolution models by accounting for the cultural and linguistic characteristics of Korean.
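To make the "separate vs. overlapping" distinction concrete, the sketch below shows one way a mention that refers to several characters could be handled under each policy. This is an illustration under assumptions, not KoCoNovel's file format or the authors' code: the span encoding, the entity IDs `"A"`/`"B"`, the `"A+B"` group ID, and the functions `separate_view` and `overlapping_view` are all hypothetical, and the reading that "overlapping" means the plural mention joins each member's cluster is our interpretation.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: a mention is a (start_token, end_token) span.
Span = Tuple[int, int]


def separate_view(singular: Dict[str, List[Span]],
                  plural: Dict[Span, List[str]]) -> Dict[str, List[Span]]:
    """Plural mentions form their own entity cluster, keyed by the member group."""
    clusters = {eid: list(spans) for eid, spans in singular.items()}  # copy, don't mutate input
    for span, members in plural.items():
        group_id = "+".join(sorted(members))  # hypothetical ID for the group entity
        clusters.setdefault(group_id, []).append(span)
    return clusters


def overlapping_view(singular: Dict[str, List[Span]],
                     plural: Dict[Span, List[str]]) -> Dict[str, List[Span]]:
    """Each plural mention is added to the cluster of every character it refers to."""
    clusters = {eid: list(spans) for eid, spans in singular.items()}
    for span, members in plural.items():
        for eid in members:
            clusters.setdefault(eid, []).append(span)
    return clusters


# Toy example: characters A and B, plus one mention at token 8 meaning "the two of them".
singular = {"A": [(0, 0), (5, 5)], "B": [(2, 2)]}
plural = {(8, 8): ["A", "B"]}
print(separate_view(singular, plural))
# {'A': [(0, 0), (5, 5)], 'B': [(2, 2)], 'A+B': [(8, 8)]}
print(overlapping_view(singular, plural))
# {'A': [(0, 0), (5, 5), (8, 8)], 'B': [(2, 2), (8, 8)]}
```

Under the separate policy, a group mention never makes two individual characters' clusters share a mention. Under the overlapping policy, clusters may intersect, which some standard coreference scorers and models that assume disjoint clusters may not handle without adaptation.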
Provider:
Seoul National University
Created:
2024-04-01



