MSLR WEB30K (Microsoft Learning to Rank Datasets-30k)
Source: OpenDataLab. Updated 2026-05-24, indexed 2024-05-09.
Download link:
https://opendatalab.org.cn/OpenDataLab/MSLR_WEB30K
Resource overview:
This is a machine learning dataset in which queries and URLs are represented by IDs. It consists of feature vectors extracted from query-url pairs together with relevance judgment labels: (1) The relevance judgments come from a retired labeling set of the commercial web search engine Microsoft Bing and take 5 values, from 0 (irrelevant) to 4 (fully relevant). (2) The features were mostly extracted by the dataset's creators and are ones widely used in the research community. In the data files, each row corresponds to one query-url pair: the first column is the pair's relevance label, the second column is the query ID, and the remaining columns are the features. A larger label value means the query-url pair is more relevant. Each query-url pair is represented by a 136-dimensional feature vector.
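The row layout described above can be read with a short parser. The sketch below assumes the files use the LETOR/SVMlight-style text format `<label> qid:<id> <index>:<value> ...` with 1-based feature indices; that layout, the function names `parse_line` and `load_by_query`, and the sample rows are all illustrative assumptions, not part of the official release. Rows are grouped by query ID because ranking models are trained and evaluated per query.

```python
from collections import defaultdict

NUM_FEATURES = 136  # each query-url pair is a 136-dimensional feature vector

def parse_line(line, num_features=NUM_FEATURES):
    """Hypothetical helper: parse one row, assuming '<label> qid:<id> <idx>:<value> ...'."""
    tokens = line.split()
    label = int(tokens[0])              # relevance label, 0 (irrelevant) .. 4 (fully relevant)
    qid = tokens[1].split(":", 1)[1]    # query ID, e.g. 'qid:10' -> '10'
    features = [0.0] * num_features     # dense vector; indices not listed stay 0.0
    for tok in tokens[2:]:
        if tok.startswith("#"):         # stop at an optional trailing '#' comment
            break
        idx, value = tok.split(":", 1)
        features[int(idx) - 1] = float(value)  # assumed 1-based feature indices
    return label, qid, features

def load_by_query(lines, num_features=NUM_FEATURES):
    """Hypothetical helper: group (label, features) pairs by query ID, in file order."""
    groups = defaultdict(list)
    for line in lines:
        if line.strip():                # skip blank lines
            label, qid, features = parse_line(line, num_features)
            groups[qid].append((label, features))
    return dict(groups)

# Made-up sample rows for illustration only (not real MSLR data).
sample = [
    "2 qid:1 1:3 2:0 3:2 136:0.5",
    "0 qid:1 1:0 2:1 3:0 136:0.1",
    "4 qid:2 1:5 2:2 3:1 136:0.9",
]
data = load_by_query(sample)
print(len(data["1"]))   # → 2
print(data["2"][0][0])  # → 4
```

In practice an open file object can be passed directly, e.g. `with open(path) as f: load_by_query(f)`. If the files do follow the SVMlight layout, scikit-learn's `load_svmlight_file` with `query_id=True` is an alternative that returns a sparse matrix instead of dense Python lists.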
Provider:
OpenDataLab
Created:
2022-08-11
Compiled summary
Dataset introduction

Background and challenges
Background overview
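MSLR WEB30K is a learning-to-rank dataset released by Microsoft. It contains 136-dimensional feature vectors for query-URL pairs along with relevance labels from 0 to 4, and is suited to document-ranking research. The data come from a retired label set of Microsoft Bing, and the dataset has clear academic citation and release information.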
The content above was collected and summarized by 遇见数据集.



