
msmarco-scores-ms-marco-MiniLM-L6-v2

ModelScope community · Updated 2026-01-06 · Indexed 2025-06-28
Download link:
https://modelscope.cn/datasets/sentence-transformers/msmarco-scores-ms-marco-MiniLM-L6-v2
Resource description:
# MS MARCO query-passage scores using [cross-encoder/ms-marco-MiniLM-L6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2)

[MS MARCO](https://microsoft.github.io/msmarco/) is a large-scale information retrieval corpus created from real user search queries on the Bing search engine. This dataset contains 160 million CrossEncoder scores on the MS MARCO dataset, computed with the [cross-encoder/ms-marco-MiniLM-L6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2) model.

The scores are unprocessed logits, i.e. they do not fall in the range 0...1, and they can be used for finetuning search models using distillation.

See also the [MS MARCO Mined Triplets collection](https://huggingface.co/collections/sentence-transformers/ms-marco-mined-triplets-6644d6f1ff58c5103fe65f23) for triplets mined using 13 different embedding models, which can be filtered using this dataset to avoid false negatives.

## Dataset Subsets

### `pair` subset

* Columns: "query_id", "corpus_id", "score"
* Column types: `int`, `int`, `float`
* Examples:
  ```python
  {
    "query_id": 571018,
    "corpus_id": 7349777,
    "score": 10.257502,
  }
  ```
* Collection Strategy: Mining negatives using an embedding model, and rescoring the query/answer pairs. This subset contains all scored query-passage pairs.

### `triplet` subset

* Columns: "query_id", "positive_id", "negative_id", "score"
* Column types: `int`, `int`, `int`, `list[float]`
* Examples:
  ```python
  {
    "query_id": 571018,
    "positive_id": 1283525,
    "negative_id": 7349777,
    "score": [3.421323299407959, 10.257501602172852],
  }
  ```
* Collection Strategy: Mining negatives using an embedding model, and rescoring the query/answer pairs. For each query, all query/answer pairs are randomly subdivided into two equally sized groups, the first group representing positives and the second group negatives. The score is a two-element list holding the query-positive score followed by the query-negative score.
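As a rough illustration of how a `triplet` record might feed a distillation objective, the sketch below computes the teacher margin (query-positive score minus query-negative score) that margin-based losses such as MarginMSE regress against. The choice of MarginMSE is an assumption made here for illustration; the card only says the scores can be used for distillation. The helper name `teacher_margin` is hypothetical.

```python
def teacher_margin(record):
    """Return score(query, positive_id) - score(query, negative_id).

    Hypothetical helper: assumes `record["score"]` is the two-element list
    [query-positive score, query-negative score] described for the subset.
    """
    pos_score, neg_score = record["score"]
    return pos_score - neg_score


# The example record shown for the `triplet` subset.
record = {
    "query_id": 571018,
    "positive_id": 1283525,
    "negative_id": 7349777,
    "score": [3.421323299407959, 10.257501602172852],
}

margin = teacher_margin(record)
# The margin is negative for this record: the passage labelled "negative"
# received the higher CrossEncoder score.
print(margin)
```

Because the two groups are assigned at random, the margin can take either sign, so a student model trained on it should not assume that `positive_id` is always the more relevant passage.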
Note that `"positive_id"` is not necessarily a positive, and `"negative_id"` is not necessarily a negative. The scores indicate the similarity.

### `list` subset

* Columns: "query_id", "corpus_id", "score"
* Column types: `int`, `list[int]`, `list[float]`
* Examples:
  ```python
  {
    "query_id": 571018,
    "corpus_id": [7349777, 6948601, 5129919, 6717931, 1065943, 1626276, 981824, 6449111, 1028927, 2524942, 5810175, 6236527, 7179545, 168979, 150383, 168983, 7027047, 3559703, 8768336, 5476579, 915244, 2202253, 1743842, 7727041, 1036624, 8432142, 2236979, 724018, 7179544, 7349780, 7179539, 6072080, 7790852, 4873670, 4389296, 2305477, 1626275, 291845, 1743847, 1508485, 4298457, 1831337, 1760417, 8768340, 8432143, 1971355, 1133925, 2105819, 168975, 5132446, 1316646, 1065945, 7349776, 6717930, 2305472, 8768339, 8768341, 6717927, 7179547, 7491026, 4903324, 1516443, 1065951, 6717926, 4779313, 1381778, 7349774, 6717928, 7349778, 1692036, 168976, 7004464, 5129916, 6243357, 1682970, 2174051, 2735530, 7097201, 4316878, 1508484, 1951254, 2740235, 7790853, 6893978, 4816777, 1191538, 7027046, 165888, 7027044, 1833474, 1065944, 7027050, 7790847, 6717925, 3911285, 7862900, 1065947, 1279944, 8818003, 2174049, 7179546, 978303, 1629126, 4359059, 1891131, 7032037, 8674123, 269779, 371192, 5423524, 150253, 8768342, 8567249, 1833477, 22448, 7862904, 4298455, 7448360, 8768334, 8417201, 2305474, 1283525, 4377211, 7790851, 6243359, 1065948, 7491025, 5437, 1891129, 168974, 7491029, 5129920, 6717924, 468898, 1065952, 2305471, 4903323, 4316880, 8768338, 8184295, 1065946],
    "score": [10.257501602172852, 3.5812907218933105, 8.257364273071289, 8.866464614868164, 5.258519172668457, 4.193713188171387, 8.563857078552246, 4.907355785369873, 7.617893695831299, 1.5268436670303345, -0.6152520179748535, -2.9456772804260254, 10.018341064453125, 10.202350616455078, 1.7948371171951294, 8.693719863891602, 4.469407081604004, 0.4720204472541809, -1.00309157371521, 1.8172178268432617, 1.7467658519744873, -1.4857474565505981, 4.076294422149658, -1.777407169342041, -0.7370984554290771, 4.278080463409424, -1.0950472354888916, 2.5531094074249268, 10.004817008972168, 10.176589965820312, 8.594615936279297, 1.8897120952606201, 7.299615859985352, 8.61693000793457, 0.10016749799251556, 6.883630752563477, 10.320749282836914, 0.7852426171302795, 2.7261080741882324, 2.0838329792022705, 1.8327460289001465, -0.6380012035369873, 0.926922082901001, 4.037473201751709, 2.498434543609619, 1.148393154144287, 0.13919004797935486, 3.2860398292541504, 10.097441673278809, 2.575753688812256, 1.7576978206634521, 10.210726737976074, 9.687068939208984, 8.633060455322266, 7.698808193206787, 8.400606155395508, 6.934174060821533, 6.636131286621094, 7.153725624084473, 7.550482273101807, 6.602751731872559, 5.704792022705078, 5.986926555633545, 4.501513481140137, 5.526628017425537, 4.542888164520264, 7.835060119628906, 6.569742202758789, 4.466667175292969, -0.17466464638710022, 5.990896701812744, 4.383068084716797, 5.085425853729248, 6.489709854125977, 3.4293251037597656, 4.946746826171875, 5.910137176513672, 5.161900520324707, 1.4832103252410889, 4.817190170288086, 3.958622694015503, 1.8736721277236938, 6.366949081420898, 4.05584716796875, 4.808823585510254, 2.6205739974975586, 3.1121416091918945, 4.710823059082031, 3.8835949897766113, 7.4977803230285645, 7.494195938110352, 7.061235427856445, 7.726348876953125, 5.88195276260376, 4.692541122436523, 3.5332908630371094, 4.462759017944336, -4.2239532470703125, -2.5660009384155273, 6.035939693450928, 5.124550819396973, 7.7053680419921875, -2.0111143589019775, -0.8396145105361938, 7.110054969787598, 3.5912249088287354, -0.17257803678512573, 0.5216665267944336, 3.553079128265381, -5.091264724731445, 3.0037851333618164, 6.739882469177246, 1.2511098384857178, 5.766049385070801, -4.700324058532715, 2.5425989627838135, 1.9228131771087646, -1.8639280796051025, 3.136643886566162, 0.423944354057312, 2.2642807960510254, 3.421323299407959, -3.7783584594726562, 8.579450607299805, 9.007328987121582, 6.923714637756348, 6.49301290512085, 6.645390033721924, 3.557553291320801, 6.487471580505371, 4.421983242034912, 4.1287055015563965, 6.218915939331055, 6.673498153686523, 4.962984085083008, 4.784761428833008, 3.790182590484619, 3.781992197036743, 5.345108509063721, 4.898831367492676, 4.420433044433594],
  }
  ```
* Collection Strategy: Mining negatives using an embedding model, and rescoring the query/answer pairs. This is all data, grouped per query_id.
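To make the relationship between the `list` and `pair` layouts concrete, here is a minimal sketch that expands one `list` record into `pair`-style rows, ranks them by score, and squashes the raw logits into the range 0...1 with a sigmoid for easier reading. The helper names `list_to_pairs` and `sigmoid` are hypothetical. The card does not prescribe any post-processing, and applying a sigmoid is an assumption made here purely for display. The record is truncated to the first three entries of the example above.

```python
import math


def sigmoid(logit):
    """Map a raw logit to (0, 1), written to avoid overflow for large |logit|."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def list_to_pairs(record):
    """Expand a `list`-style record into one `pair`-style row per passage."""
    return [
        {"query_id": record["query_id"], "corpus_id": cid, "score": score}
        for cid, score in zip(record["corpus_id"], record["score"])
    ]


# First three entries of the `list` example, truncated for brevity.
record = {
    "query_id": 571018,
    "corpus_id": [7349777, 6948601, 5129919],
    "score": [10.257501602172852, 3.5812907218933105, 8.257364273071289],
}

pairs = list_to_pairs(record)
# Rank passages for this query from highest to lowest CrossEncoder score.
ranked = sorted(pairs, key=lambda row: row["score"], reverse=True)
for row in ranked:
    print(row["corpus_id"], round(sigmoid(row["score"]), 4))
```

Since the card recommends the scores for distillation as unprocessed logits, the sigmoid step is best kept for inspection or thresholding, with the raw values left untouched for training.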
Provider:
maas
Created:
2025-06-25