Wikipedia Current Events Portal (WCEP) dataset

arXiv | Updated 2020-05-20 | Indexed 2024-06-21
Download link:
https://github.com/complementizer/wcep-mds-dataset
Description:
The WCEP dataset was created by Aylien Ltd. and the Insight Centre for Data Analytics at University College Dublin to address large-scale multi-document summarization. It contains 10,200 news event clusters with an average of 235 articles each, and covers news event summaries and related articles extracted from the Wikipedia Current Events Portal. Construction involved extracting articles from Wikipedia and Common Crawl and extending each cluster to increase the number of documents per event. The listed application areas are news clustering, search result presentation, and timeline generation, where the dataset supports the training and evaluation of deep learning models.
Provided by:
Aylien Ltd. and the Insight Centre for Data Analytics, University College Dublin
Created:
2020-05-20
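
The cluster structure described above (one event summary plus many related articles per cluster) can be checked with a short script. The following is a minimal sketch, not the dataset authors' tooling: it assumes the data is stored as JSON Lines, optionally gzipped, with one cluster per line and a list of articles under an `articles` key. The file name `train.jsonl.gz` and the field name are assumptions; verify them against the repository linked above before relying on them.

```python
import gzip
import json
from statistics import mean


def read_clusters(path):
    """Yield one cluster dict per non-empty line of a JSON Lines file.

    Assumption: one JSON object per line; files ending in .gz are gzipped.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def cluster_stats(clusters, articles_key="articles"):
    """Return the number of clusters and the mean number of articles per cluster.

    `articles_key` is a hypothetical field name; adjust it to the real schema.
    """
    sizes = [len(cluster.get(articles_key, [])) for cluster in clusters]
    return {
        "clusters": len(sizes),
        "mean_articles": mean(sizes) if sizes else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical local path; download the data from the repository first.
    print(cluster_stats(read_clusters("train.jsonl.gz")))
```

Reading the file lazily, one line at a time, keeps memory use low even when clusters hold hundreds of articles. The statistics computed over a single split will not necessarily match the figures quoted for the full dataset.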