ApposCorpus

Name: ApposCorpus
Creator: University of Copenhagen
Published: 2020-11-06 19:23:09
License: No description available

Source: arXiv. Updated 2020-11-06. Indexed 2024-06-21.

Download link:

https://yovakem.github.io/#ApposCorpus

Description:

ApposCorpus is a multilingual, multi-domain dataset for factual appositive generation, created by a research team at the University of Copenhagen. It covers four languages (English, Spanish, German, and Polish), two entity types (persons and organizations), and two domains (Wikipedia and news). The dataset was built by automatically collecting data from Wikipedia, combined with manual verification, with the aim of addressing cross-lingual and cross-domain appositive generation. In addition to supporting model training, ApposCorpus provides a gold-standard test set for cross-domain evaluation, making it suitable for research and applications in natural language generation.

Provided by:

University of Copenhagen

Created:

2020-11-06
