MSLR WEB30K Dataset
Source: paperswithcode.com (indexed 2025-01-22)
Download link:
https://paperswithcode.com/dataset/mslr-web30k
Description:
The datasets are machine learning data in which queries and URLs are represented by IDs. They consist of feature vectors extracted from query-URL pairs, along with relevance judgment labels:
(1) The relevance judgments come from a retired labeling set of a commercial web search engine (Microsoft Bing) and take 5 values, from 0 (irrelevant) to 4 (perfectly relevant).
(2) The features were extracted by the dataset's authors and are those widely used in the research community.
In the data files, each row corresponds to a query-URL pair. The first column is the relevance label of the pair, the second column is the query ID, and the remaining columns are features. The larger the relevance label, the more relevant the query-URL pair. Each query-URL pair is represented by a 136-dimensional feature vector.
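The row layout described above can be read with a few lines of Python. This is a minimal sketch, not an official loader: it assumes each row is whitespace-separated, that the query ID column is written as `qid:<id>`, and that features are written as `<index>:<value>` with 1-based indices. The description does not spell out these details, so check them against the actual files. The names `parse_line`, `NUM_FEATURES`, and the sample row are made up for illustration.

```python
from typing import List, Tuple

NUM_FEATURES = 136  # dimensionality stated in the dataset description


def parse_line(line: str) -> Tuple[int, str, List[float]]:
    """Parse one row into (relevance label, query id, feature vector).

    Assumed layout (hypothetical, verify against the real files):
        <label> qid:<id> 1:<v1> 2:<v2> ... 136:<v136>
    """
    tokens = line.split()
    label = int(tokens[0])                # first column: relevance label, 0..4
    qid = tokens[1].split(":", 1)[1]      # second column: query id
    features = [0.0] * NUM_FEATURES       # indices not listed stay 0.0
    for token in tokens[2:]:
        index, value = token.split(":", 1)
        features[int(index) - 1] = float(value)  # 1-based index to 0-based
    return label, qid, features


# Hypothetical sample row, shortened for readability.
sample = "2 qid:10 1:3 2:0.5 136:1"
label, qid, features = parse_line(sample)
```

Keeping the query ID alongside each row matters for learning to rank, because rows are usually grouped by query before computing ranking losses or metrics.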
Provided by:
Papers with Code



