WikiAsp
Source: arXiv | Updated: 2020-11-16 | Indexed: 2024-06-21
Download link:
http://github.com/neulab/wikiasp
Overview:
WikiAsp is a large-scale, multi-domain dataset for aspect-based summarization, intended to advance research on open-domain aspect-based summarization. It is built from Wikipedia articles in 20 different domains, using each article's section titles and section boundaries as a proxy for aspect annotation. The set of aspects for the dataset was constructed through automatic extraction, screening, and filtering steps. WikiAsp targets the challenges of summarization in multi-document, multi-domain settings, such as properly handling pronouns in quoted sources and explaining time-sensitive events consistently.
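The idea of using section titles as a proxy for aspects can be illustrated with a small sketch. This is not the WikiAsp construction pipeline: the frequency-based selection, the function names (`select_aspects`, `build_targets`), and the toy articles are all assumptions made for illustration only.

```python
from collections import Counter


def normalize(title):
    """Lowercase and trim a section title so variants compare equal."""
    return title.strip().lower()


def select_aspects(articles, top_k=2):
    """Assumed heuristic: keep the top_k most frequent section titles in a domain."""
    counts = Counter(
        normalize(title) for sections in articles for title, _ in sections
    )
    return [title for title, _ in counts.most_common(top_k)]


def build_targets(sections, aspects):
    """Map each kept aspect to the text of the section carrying that title."""
    allowed = set(aspects)
    targets = {}
    for title, text in sections:
        key = normalize(title)
        if key in allowed:
            targets[key] = text.strip()
    return targets


# Toy articles (hypothetical data): each is a list of (section title, section text).
articles = [
    [("History", "Founded in 1900."), ("Discography", "Two albums."), ("Trivia", "None.")],
    [("History", "Formed in 1985."), ("Discography", "One album.")],
]

aspects = select_aspects(articles)
targets = build_targets(articles[0], aspects)
print(aspects)
print(targets)
```

Here each section's text stands in for the aspect-specific summary, and rarely used titles such as "Trivia" are dropped from the aspect set.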
Provided by:
Language Technologies Institute
Created:
2020-11-16



