WikiMulti
Source: arXiv | Updated: 2022-04-24 | Indexed: 2024-06-21
Download link:
https://github.com/tikhonovpavel/wikimulti
Description:
WikiMulti is a dataset for cross-lingual summarization created at Kazan Federal University. It is built from Wikipedia articles in 15 languages and is intended to support the development of cross-lingual summarization methods. The dataset contains more than 20,000 English articles and, on average, about 10,000 articles for each of the other 14 languages. Articles were selected from Wikipedia's "Good Articles" list, ensuring that each article has corresponding content in the other 14 languages. Its main application is cross-lingual summarization within natural language processing, where the goal is to convey information across language boundaries.
Provider:
Kazan Federal University
Created:
2022-04-24



