AOL4FOLTR
Source: arXiv | Updated: 2025-08-17 | Indexed: 2025-11-27
Download link:
https://github.com/mg98/aol4foltr
Description:
AOL4FOLTR is a large-scale web search dataset containing approximately 2.6 million queries from 10,000 users. It is designed to address the problems of privacy preservation and data realism in Federated Online Learning to Rank (FOLTR). Built on the AOL query log released in 2006, the dataset recovers the content of more than 420,000 websites via the Internet Archive and reconstructs the top-20 result set for each query. It includes query-document pairs, user IDs, timestamps, and clicked and non-clicked documents, with each query-document pair encoded using 103 features. It provides an important benchmark for evaluating both synchronous and asynchronous FOLTR scenarios.
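Because the dataset keeps user IDs and timestamps, one natural way to use it in a federated setting is to treat each user as a client and replay that user's interactions in time order. The sketch below illustrates that idea only. The `Interaction` record, its field names, and `partition_by_user` are hypothetical and do not reflect the actual file layout or loader in the repository; only the feature count of 103 comes from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List

NUM_FEATURES = 103  # feature count stated in the dataset description


@dataclass
class Interaction:
    """Hypothetical in-memory record for one logged query (assumed schema)."""
    user_id: str
    timestamp: int                # assumed: any sortable time value
    query: str
    features: List[List[float]]   # one NUM_FEATURES-long vector per candidate document
    clicks: List[int]             # 1 = clicked, 0 = not clicked, aligned with features


def partition_by_user(log: Iterable[Interaction]) -> Dict[str, List[Interaction]]:
    """Group interactions by user and order each user's list by timestamp."""
    clients: Dict[str, List[Interaction]] = defaultdict(list)
    for record in log:
        clients[record.user_id].append(record)
    for records in clients.values():
        records.sort(key=lambda r: r.timestamp)
    return dict(clients)
```

Each key of the returned dictionary can then stand in for one federated client, and iterating over its list replays that user's queries chronologically, which is the kind of ordering a synchronous or asynchronous FOLTR simulation would need.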
Provided by:
Delft University of Technology, University of Amsterdam
Created:
2025-08-17



