sentence-transformers/msmarco-scores-ms-marco-MiniLM-L6-v2

Name: sentence-transformers/msmarco-scores-ms-marco-MiniLM-L6-v2
Creator: sentence-transformers
Published: 2025-06-24 12:59:20
License: No description available

Source: Hugging Face (updated 2025-06-24, indexed 2025-07-05)

Download link:

https://hf-mirror.com/datasets/sentence-transformers/msmarco-scores-ms-marco-MiniLM-L6-v2

Description:

MS MARCO is a large-scale information retrieval corpus created from real user search queries on the Bing search engine. This dataset contains 160 million CrossEncoder scores on the MS MARCO dataset, computed with the [cross-encoder/ms-marco-MiniLM-L6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2) model. The scores are unprocessed logits, i.e. they do not fall in the range 0...1, and they can be used to finetune search models via distillation. See also the [MS MARCO Mined Triplets collection](https://huggingface.co/collections/sentence-transformers/ms-marco-mined-triplets-6644d6f1ff58c5103fe65f23) for triplets mined using 13 different embedding models, which could be filtered using this dataset to avoid false negatives.
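Since the scores are raw logits meant for distillation and for filtering false negatives, a minimal Python sketch of both uses may help. Everything here is an illustrative assumption rather than part of the dataset or of any library: the function names, the `margin=3.0` threshold, and the passage IDs and score values are made up. The margin-based loss is one common distillation objective (often called Margin-MSE); the source does not say which objective is intended.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw CrossEncoder logit into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-logit))


def margin_mse_loss(student_pos, student_neg, teacher_pos, teacher_neg):
    """Mean squared error between student and teacher score margins.

    Each argument is a list of scores, one entry per (query, positive, negative)
    triplet. The teacher scores would come from this dataset's logits.
    """
    errors = [
        ((sp - sn) - (tp - tn)) ** 2
        for sp, sn, tp, tn in zip(student_pos, student_neg, teacher_pos, teacher_neg)
    ]
    return sum(errors) / len(errors)


def drop_likely_false_negatives(teacher_pos, candidates, margin=3.0):
    """Keep negatives scored at least `margin` logits below the positive.

    `candidates` is a list of (passage_id, teacher_logit) pairs. The margin
    value is a hypothetical threshold, not one recommended by the source.
    """
    return [(pid, score) for pid, score in candidates if score <= teacher_pos - margin]


# Made-up example values, for illustration only.
kept = drop_likely_false_negatives(8.0, [("p1", 7.5), ("p2", 1.0)])
loss = margin_mse_loss([1.0], [0.0], [3.0], [0.0])
```

In this sketch, a candidate negative whose teacher logit is close to the positive's is treated as a possible unlabeled positive and dropped, while the remaining triplets supply the teacher margins that the student model is trained to reproduce.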

Provider:

sentence-transformers