WenxingZhu/search-benchmark-dataset
Hugging Face · updated 2025-12-16 · indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/WenxingZhu/search-benchmark-dataset
Description:
A benchmark dataset for evaluating full-text search engines, derived from the [search-benchmark-game](https://github.com/quickwit-oss/search-benchmark-game) project. It contains a corpus of English Wikipedia articles and a set of search queries designed to benchmark different search engine implementations. The corpus has 5,032,104 documents, each with an `id` and a `text` field. The query set has 903 queries, each with a `query` and a `tags` field. The queries are drawn from the AOL query dataset and filtered to remove personal information. The dataset card also describes the query types and tags and explains how to use the dataset for benchmarking.
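The dataset is meant for timing full-text search engines over a document corpus and a tagged query set. The sketch below illustrates the general shape of such a benchmark on records with the same fields (`id`/`text` for documents, `query`/`tags` for queries). It is a toy, not the search-benchmark-game harness. Several choices are assumptions made for illustration: the whitespace tokenizer, the AND (intersection) matching semantics, treating `tags` as a list of strings, and the hypothetical helper names `build_index`, `search_and` and `benchmark`. Loading the real corpus and queries from Hugging Face is left out because the configuration and split names are not given here.

```python
import time
from collections import defaultdict
from statistics import median


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split; a real engine uses a proper analyzer.
    return text.lower().split()


def build_index(corpus: list[dict]) -> dict[str, set[str]]:
    # Inverted index: each term maps to the set of document ids containing it.
    index: dict[str, set[str]] = defaultdict(set)
    for doc in corpus:
        for term in tokenize(doc["text"]):
            index[term].add(doc["id"])
    return index


def search_and(index: dict[str, set[str]], query: str) -> set[str]:
    # Assumed intersection semantics: a document must contain every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(t, set()) for t in terms]
    postings.sort(key=len)  # intersect starting from the shortest posting list
    result = set(postings[0])  # copy so the index itself is never mutated
    for plist in postings[1:]:
        result &= plist
    return result


def benchmark(index: dict[str, set[str]], queries: list[dict], repeats: int = 3) -> dict[str, float]:
    # For each query keep the fastest of `repeats` runs, then report the
    # median of those per-query times (in seconds) for every tag.
    per_tag: dict[str, list[float]] = defaultdict(list)
    for q in queries:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            search_and(index, q["query"])
            samples.append(time.perf_counter() - start)
        best = min(samples)
        for tag in q["tags"]:  # assumption: `tags` is a list of strings
            per_tag[tag].append(best)
    return {tag: median(times) for tag, times in per_tag.items()}


if __name__ == "__main__":
    # Tiny made-up sample in the dataset's field layout, for illustration only.
    corpus = [
        {"id": "1", "text": "The Who is an English rock band"},
        {"id": "2", "text": "The Beatles were an English rock band"},
        {"id": "3", "text": "Rock climbing is a sport"},
    ]
    queries = [
        {"query": "english rock", "tags": ["intersection"]},
        {"query": "climbing", "tags": ["term"]},
    ]
    index = build_index(corpus)
    print(sorted(search_and(index, "english rock")))  # ['1', '2']
    print(benchmark(index, queries))  # latencies vary by machine
```

Keeping the fastest of several runs per query damps noise from warm-up and scheduling, and taking the median per tag keeps a few slow queries from dominating a tag's figure. A real run would replace `search_and` with calls into the engine under test.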
Provided by:
WenxingZhu