ApposCorpus
Source: arXiv. Updated: 2020-11-06. Indexed: 2024-06-21.
Download link:
https://yovakem.github.io/#ApposCorpus
Description:
ApposCorpus is a multilingual, multi-domain dataset for factual appositive generation, created by a research team at the University of Copenhagen. It covers four languages (English, Spanish, German, and Polish), two entity types (persons and organizations), and two domains (Wikipedia and news). The dataset was built by automatically collecting data from Wikipedia, combined with manual verification, with the aim of addressing cross-lingual and cross-domain appositive generation. In addition to supporting model training, ApposCorpus provides a gold-standard test set for cross-domain evaluation, making it suitable for research and applications in natural language generation.
Provided by:
University of Copenhagen
Created:
2020-11-06



