amazon/AmazonQAC
Hugging Face · Updated 2024-11-19 · Indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/amazon/AmazonQAC
Description:
AmazonQAC is a large-scale dataset for Query Autocomplete (QAC), sourced from real-world Amazon Search logs. It provides anonymized sequences of user-typed prefixes leading to final search terms, along with session metadata such as timestamps and session IDs, and supports research on context-aware query completion with realistic, natural user behavior data. The dataset is split into a training set and a test set: the training set contains 395 million samples collected from US logs between September 1, 2023 and September 30, 2023, and the test set contains 20,000 samples collected from US logs between October 1, 2023 and October 14, 2023.
QAC is a widely used search-engine feature that predicts users' full search queries as they type. Despite its importance, research progress has been limited by the lack of realistic datasets. AmazonQAC aims to address this gap and spur advances in QAC systems. It also includes a realistic test set for benchmarking different QAC approaches; each test row consists of past_search, prefix, and final search term, mimicking a real QAC service.
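To make the train/test layout described above concrete, here is a minimal sketch of a most-popular-completion baseline scored with mean reciprocal rank. The rows are invented for illustration, and the field names (`prefixes`, `final_search_term`, `past_search`, `prefix`) are assumptions based on the description rather than the dataset's confirmed schema. The helper names are hypothetical as well.

```python
from collections import Counter, defaultdict

# Invented example rows; field names are assumptions, not the confirmed schema.
train_rows = [
    {"session_id": "s1", "prefixes": ["i", "ip", "iph"], "final_search_term": "iphone case"},
    {"session_id": "s2", "prefixes": ["i", "ip"], "final_search_term": "ipad"},
    {"session_id": "s3", "prefixes": ["ip", "iph"], "final_search_term": "iphone case"},
]
test_rows = [
    {"past_search": ["phone charger"], "prefix": "iph", "final_search_term": "iphone case"},
    {"past_search": [], "prefix": "ip", "final_search_term": "ipad"},
]


def build_index(rows):
    """Count, for every typed prefix, how often each final search term followed it."""
    index = defaultdict(Counter)
    for row in rows:
        # Use a set so a prefix repeated within one row is counted once.
        for prefix in set(row["prefixes"]):
            index[prefix][row["final_search_term"]] += 1
    return index


def suggest(index, prefix, k=10):
    """Return up to k completions for a prefix, most frequent first."""
    counts = index.get(prefix, Counter())
    return [term for term, _ in counts.most_common(k)]


def mean_reciprocal_rank(index, rows, k=10):
    """Average of 1/rank of the true final search term (0 when it is not suggested)."""
    total = 0.0
    for row in rows:
        suggestions = suggest(index, row["prefix"], k)
        if row["final_search_term"] in suggestions:
            total += 1.0 / (suggestions.index(row["final_search_term"]) + 1)
    return total / len(rows)


index = build_index(train_rows)
mrr = mean_reciprocal_rank(index, test_rows)
```

This baseline ignores `past_search` entirely; a context-aware approach would use those earlier queries to rerank the suggestions for the current prefix.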
Provider:
amazon



