AOL4FOLTR

Name: AOL4FOLTR
Creator: Delft University of Technology, University of Amsterdam
Published: 2025-08-17 20:57:54
License: Not specified

Source: arXiv · Updated: 2025-08-17 · Indexed: 2025-11-27

Download link:

https://github.com/mg98/aol4foltr

Description:

AOL4FOLTR is a large-scale web search dataset of approximately 2.6 million queries issued by 10,000 users. It is designed to address two challenges in Federated Online Learning to Rank (FOLTR): preserving user privacy and obtaining realistic interaction data. Built on the AOL query log released in 2006, it recovers the content of more than 420,000 websites from the Internet Archive and reconstructs the top-20 result set for each query. The dataset contains query-document pairs, user IDs, timestamps, and clicked and non-clicked documents, with query-document pairs encoded as 103 features. It provides an important benchmark for evaluating both synchronous and asynchronous FOLTR scenarios.
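The following minimal Python sketch shows how records like these might be organised for federated experiments. It assumes a hypothetical in-memory record type, `Interaction`. Its field names are chosen for illustration and do not come from the repository's actual file format. Each record holds one query's result list (at most 20 documents), with a 103-dimensional feature vector and a click label per document. A hypothetical helper, `partition_by_user`, treats each user ID as one federated client and orders that client's interactions by timestamp.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

NUM_FEATURES = 103  # feature count stated in the dataset description
TOP_K = 20          # each reconstructed result set has at most 20 documents


@dataclass
class Interaction:
    """One logged query. Hypothetical schema, not the repository's format."""
    user_id: int
    timestamp: float              # assumption: seconds since the epoch
    query_id: int
    features: List[List[float]]   # one 103-dim vector per result document
    clicks: List[bool]            # True = clicked, False = not clicked

    def __post_init__(self) -> None:
        # Validate shapes against the figures given in the description.
        if len(self.features) > TOP_K:
            raise ValueError("result set longer than top-20")
        if len(self.clicks) != len(self.features):
            raise ValueError("need exactly one click label per document")
        if any(len(vec) != NUM_FEATURES for vec in self.features):
            raise ValueError("each document needs exactly 103 features")


def partition_by_user(log: List[Interaction]) -> Dict[int, List[Interaction]]:
    """Group interactions into per-user clients, each in chronological order."""
    clients: Dict[int, List[Interaction]] = defaultdict(list)
    for record in log:
        clients[record.user_id].append(record)
    for history in clients.values():
        history.sort(key=lambda record: record.timestamp)
    return dict(clients)
```

Grouping by user ID is one natural way to simulate federated clients here, because the dataset retains real user IDs and timestamps. A synchronous simulation could step through all clients in rounds, while an asynchronous one could replay each client's history at its own pace.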

Provided by:

Delft University of Technology, University of Amsterdam

Created:

2025-08-17
