MSLR WEB30K (Microsoft Learning to Rank Datasets-30k)

Name: MSLR WEB30K (Microsoft Learning to Rank Datasets-30k)
Creator: OpenDataLab
Published: 2026-05-24 09:30:21
License: No description available

OpenDataLab, updated 2026-05-24, indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/MSLR_WEB30K

Overview:

This is a machine learning dataset in which queries and URLs are represented by IDs. It consists of feature vectors extracted from query-url pairs together with relevance judgment labels: (1) The relevance judgments come from a retired labeling set of a commercial web search engine (Microsoft Bing) and take 5 values, from 0 (irrelevant) to 4 (perfectly relevant). (2) The features were mostly extracted by the dataset's original authors and are widely used in the research community. In the data files, each row corresponds to one query-url pair: the first column is the pair's relevance label, the second column is the query ID, and the remaining columns are the features. A larger relevance label indicates a more relevant query-url pair. Each query-url pair is represented by a 136-dimensional feature vector.
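The row layout described above can be read with a few lines of Python. This is a minimal sketch, not an official loader: it assumes the files use the common LETOR/SVMlight-style text layout, where the query ID token looks like `qid:<id>` and each feature is written as `<index>:<value>` with 1-based indices. The names `parse_line` and `NUM_FEATURES` are illustrative and do not come from the dataset page.

```python
from typing import List, Tuple

NUM_FEATURES = 136  # dimensionality stated in the dataset description


def parse_line(line: str) -> Tuple[int, str, List[float]]:
    """Parse one query-url row into (relevance label, query id, features).

    Assumed layout (hypothetical example): "2 qid:10 1:3 2:0.5 ... 136:1"
    """
    tokens = line.split()
    label = int(tokens[0])  # first column: relevance label, 0 to 4

    # Second column: query id, assumed to be written as "qid:<id>".
    key, _, qid = tokens[1].partition(":")
    if key != "qid":
        raise ValueError(f"expected 'qid:<id>' in second column, got {tokens[1]!r}")

    # Remaining columns: "<index>:<value>" pairs, assumed 1-based.
    # Any index that does not appear in the row is left at 0.0.
    features = [0.0] * NUM_FEATURES
    for token in tokens[2:]:
        index, _, value = token.partition(":")
        features[int(index) - 1] = float(value)

    return label, qid, features


# Made-up row for illustration only; not taken from the dataset.
label, qid, features = parse_line("2 qid:10 1:3 2:0.5 136:1")
print(label, qid, len(features))
```

Rows that share a query ID belong to the same ranking list, so a loader would typically group parsed rows by `qid` before training or evaluating a ranking model.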

Provider:

OpenDataLab

Created:

2022-08-11

Summary

Dataset introduction