CoreSearch
Source: arXiv | Updated: 2022-10-23 | Indexed: 2024-06-21
Download link:
https://huggingface.co/datasets/Intel/CoreSearch
Description:
CoreSearch is a large-scale dataset for cross-document event coreference search, created at Bar-Ilan University. It is derived from Wikipedia and builds on the annotations of the Wikipedia Event Coreference dataset (WEC-Eng). It provides queries for the training, validation, and test splits, together with a retrieval collection of approximately 1 million documents. The dataset is intended to support research on the cross-document event coreference search task, that is, searching a large collection for mentions that corefer with a specific event and extracting them. Target applications include multi-document summarization, multi-hop question answering, and knowledge base population.
Provider:
Bar-Ilan University
Created:
2022-10-23



