hotchpotch/mmarco-hard-negatives-reranker-filtered
Hugging Face · updated 2026-01-12 · indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/hotchpotch/mmarco-hard-negatives-reranker-filtered
Description:
---
dataset_info:
- config_name: arabic-hard-negatives
  features:
  - name: query
    dtype: string
  - name: pos_text
    dtype: string
  - name: negs_text
    list: string
  - name: negs_count
    dtype: int32
  - name: pos_score
    dtype: float32
  - name: negs_score
    list: float32
  splits:
  - name: train
    num_bytes: 2113494813
    num_examples: 349518
  download_size: 989078789
  dataset_size: 2113494813
- config_name: arabic-hard-negatives-7
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative_1
    dtype: string
  - name: negative_2
    dtype: string
  - name: negative_3
    dtype: string
  - name: negative_4
    dtype: string
  - name: negative_5
    dtype: string
  - name: negative_6
    dtype: string
  - name: negative_7
    dtype: string
  splits:
  - name: train
    num_bytes: 1292089603
    num_examples: 299044
  download_size: 638550242
  dataset_size: 1292089603
- config_name: arabic-triplet
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 400217378
    num_examples: 349518
  download_size: 200344021
  dataset_size: 400217378
- config_name: arabic-triplet-10
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 3464493625
    num_examples: 3031778
  download_size: 943959375
  dataset_size: 3464493625
- config_name: arabic-triplet-all
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 4047699539
    num_examples: 3546380
  download_size: 1073051129
  dataset_size: 4047699539
- config_name: chinese-hard-negatives
  features:
  - name: query
    dtype: string
  - name: pos_text
    dtype: string
  - name: negs_text
    list: string
  - name: negs_count
    dtype: int32
  - name: pos_score
    dtype: float32
  - name: negs_score
    list: float32
  splits:
  - name: train
    num_bytes: 2216454702
    num_examples: 383313
  download_size: 1359075674
  dataset_size: 2216454702
- config_name: chinese-hard-negatives-7
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative_1
    dtype: string
  - name: negative_2
    dtype: string
  - name: negative_3
    dtype: string
  - name: negative_4
    dtype: string
  - name: negative_5
    dtype: string
  - name: negative_6
    dtype: string
  - name: negative_7
    dtype: string
  splits:
  - name: train
    num_bytes: 927271103
    num_examples: 370984
  download_size: 618463240
  dataset_size: 927271103
- config_name: chinese-triplet
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 252510559
    num_examples: 383313
  download_size: 171058848
  dataset_size: 252510559
- config_name: chinese-triplet-10
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 2455032272
    num_examples: 3729432
  download_size: 863389567
  dataset_size: 2455032272
- config_name: chinese-triplet-all
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 4395417304
    num_examples: 6683870
  download_size: 1422492995
  dataset_size: 4395417304
- config_name: dutch-hard-negatives
  features:
  - name: query
    dtype: string
  - name: pos_text
    dtype: string
  - name: negs_text
    list: string
  - name: negs_count
    dtype: int32
  - name: pos_score
    dtype: float32
  - name: negs_score
    list: float32
  splits:
  - name: train
    num_bytes: 2174002796
    num_examples: 371879
  download_size: 1212093655
  dataset_size: 2174002796
- config_name: dutch-hard-negatives-7
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative_1
    dtype: string
  - name: negative_2
    dtype: string
  - name: negative_3
    dtype: string
  - name: negative_4
    dtype: string
  - name: negative_5
    dtype: string
  - name: negative_6
    dtype: string
  - name: negative_7
    dtype: string
  splits:
  - name: train
    num_bytes: 1091922772
    num_examples: 354231
  download_size: 652686790
  dataset_size: 1091922772
- config_name: dutch-triplet
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 303033800
    num_examples: 371879
  download_size: 183795674
  dataset_size: 303033800
- config_name: dutch-triplet-10
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 2891961008
    num_examples: 3551107
  download_size: 923122947
  dataset_size: 2891961008
- config_name: dutch-triplet-all
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 4274542464
    num_examples: 5258282
  download_size: 1287016546
  dataset_size: 4274542464
- config_name: english-hard-negatives
  features:
  - name: query
    dtype: string
  - name: pos_text
    dtype: string
  - name: negs_text
    list: string
  - name: negs_count
    dtype: int32
  - name: pos_score
    dtype: float32
  - name: negs_score
    list: float32
  splits:
  - name: train
    num_bytes: 2324505943
    num_examples: 399075
  download_size: 1306880603
  dataset_size: 2324505943
- config_name: english-hard-negatives-7
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative_1
    dtype: string
  - name: negative_2
    dtype: string
  - name: negative_3
    dtype: string
  - name: negative_4
    dtype: string
  - name: negative_5
    dtype: string
  - name: negative_6
    dtype: string
  - name: negative_7
    dtype: string
  splits:
  - name: train
    num_bytes: 1081053381
    num_examples: 383872
  download_size: 655650453
  dataset_size: 1081053381
- config_name: english-triplet
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 296175314
    num_examples: 399075
  download_size: 182842216
  dataset_size: 296175314
- config_name: english-triplet-10
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 2857984730
    num_examples: 3852858
  download_size: 923091822
  dataset_size: 2857984730
- config_name: english-triplet-all
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 4580123031
    num_examples: 6185133
  download_size: 1378658598
  dataset_size: 4580123031
- config_name: french-hard-negatives
  features:
  - name: query
    dtype: string
  - name: pos_text
    dtype: string
  - name: negs_text
    list: string
  - name: negs_count
    dtype: int32
  - name: pos_score
    dtype: float32
  - name: negs_score
    list: float32
  splits:
  - name: train
    num_bytes: 2190245299
    num_examples: 375562
  download_size: 1184166311
  dataset_size: 2190245299
- config_name: french-hard-negatives-7
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative_1
    dtype: string
  - name: negative_2
    dtype: string
  - name: negative_3
    dtype: string
  - name: negative_4
    dtype: string
  - name: negative_5
    dtype: string
  - name: negative_6
    dtype: string
  - name: negative_7
    dtype: string
  splits:
  - name: train
    num_bytes: 1180087427
    num_examples: 351278
  download_size: 683217441
  dataset_size: 1180087427
- config_name: french-triplet
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 333976677
    num_examples: 375562
  download_size: 196193611
  dataset_size: 333976677
- config_name: french-triplet-10
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 3127811745
    num_examples: 3521407
  download_size: 968642087
  dataset_size: 3127811745
- config_name: french-triplet-all
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 4280184877
    num_examples: 4827864
  download_size: 1264534665
  dataset_size: 4280184877
- config_name: german-hard-negatives
  features:
  - name: query
    dtype: string
  - name: pos_text
    dtype: string
  - name: negs_text
    list: string
  - name: negs_count
    dtype: int32
  - name: pos_score
    dtype: float32
  - name: negs_score
    list: float32
  splits:
  - name: train
    num_bytes: 2130821555
    num_examples: 362195
  download_size: 1191049599
  dataset_size: 2130821555
- config_name: german-hard-negatives-7
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative_1
    dtype: string
  - name: negative_2
    dtype: string
  - name: negative_3
    dtype: string
  - name: negative_4
    dtype: string
  - name: negative_5
    dtype: string
  - name: negative_6
    dtype: string
  - name: negative_7
    dtype: string
  splits:
  - name: train
    num_bytes: 1098128571
    num_examples: 343891
  download_size: 658325690
  dataset_size: 1098128571
- config_name: german-triplet
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 305643548
    num_examples: 362195
  download_size: 186057867
  dataset_size: 305643548
- config_name: german-triplet-10
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 2904631870
    num_examples: 3443904
  download_size: 929962814
  dataset_size: 2904631870
- config_name: german-triplet-all
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 4180552773
    num_examples: 4964819
  download_size: 1265443454
  dataset_size: 4180552773
- config_name: indonesian-hard-negatives
  features:
  - name: query
    dtype: string
  - name: pos_text
    dtype: string
  - name: negs_text
    list: string
  - name: negs_count
    dtype: int32
  - name: pos_score
    dtype: float32
  - name: negs_score
    list: float32
  splits:
  - name: train
    num_bytes: 2167660896
    num_examples: 373869
  download_size: 1143622995
  dataset_size: 2167660896
- config_name: indonesian-hard-negatives-7
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative_1
    dtype: string
  - name: negative_2
    dtype: string
  - name: negative_3
    dtype: string
  - name: negative_4
    dtype: string
  - name: negative_5
    dtype: string
  - name: negative_6
    dtype: string
  - name: negative_7
    dtype: string
  splits:
  - name: train
    num_bytes: 1070929143
    num_examples: 356143
  download_size: 608417256
  dataset_size: 1070929143
- config_name: indonesian-triplet
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 297191098
    num_examples: 373869
  download_size: 171494207
  dataset_size: 297191098
- config_name: indonesian-triplet-10
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 2839015603
    num_examples: 3573699
  download_size: 861180271
  dataset_size: 2839015603
- config_name: indonesian-triplet-all
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 4267409210
    num_examples: 5380840
  download_size: 1217482223
  dataset_size: 4267409210
- config_name: italian-hard-negatives
  features:
  - name: query
    dtype: string
  - name: pos_text
    dtype: string
  - name: negs_text
    list: string
  - name: negs_count
    dtype: int32
  - name: pos_score
    dtype: float32
  - name: negs_score
    list: float32
  splits:
  - name: train
    num_bytes: 2167846787
    num_examples: 373979
  download_size: 1204883037
  dataset_size: 2167846787
- config_name: italian-hard-negatives-7
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative_1
    dtype: string
  - name: negative_2
    dtype: string
  - name: negative_3
    dtype: string
  - name: negative_4
    dtype: string
  - name: negative_5
    dtype: string
  - name: negative_6
    dtype: string
  - name: negative_7
    dtype: string
  splits:
  - name: train
    num_bytes: 1117766793
    num_examples: 353540
  download_size: 666621653
  dataset_size: 1117766793
- config_name: italian-triplet
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 312908464
    num_examples: 373979
  download_size: 189495198
  dataset_size: 312908464
- config_name: italian-triplet-10
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 2964847787
    num_examples: 3545498
  download_size: 943341257
  dataset_size: 2964847787
- config_name: italian-triplet-all
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 4256926173
    num_examples: 5099619
  download_size: 1280973836
  dataset_size: 4256926173
- config_name: japanese-hard-negatives
  features:
  - name: query
    dtype: string
  - name: pos_text
    dtype: string
  - name: negs_text
    list: string
  - name: negs_count
    dtype: int32
  - name: pos_score
    dtype: float32
  - name: negs_score
    list: float32
  splits:
  - name: train
    num_bytes: 2144415650
    num_examples: 357351
  download_size: 1080761827
  dataset_size: 2144415650
- config_name: japanese-hard-negatives-7
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative_1
    dtype: string
  - name: negative_2
    dtype: string
  - name: negative_3
    dtype: string
  - name: negative_4
    dtype: string
  - name: negative_5
    dtype: string
  - name: negative_6
    dtype: string
  - name: negative_7
    dtype: string
  splits:
  - name: train
    num_bytes: 1203518881
    num_examples: 331773
  download_size: 648812107
  dataset_size: 1203518881
- config_name: japanese-triplet
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 341285758
    num_examples: 357351
  download_size: 187201236
  dataset_size: 341285758
- config_name: japanese-triplet-10
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 3168329044
    num_examples: 3317556
  download_size: 922432699
  dataset_size: 3168329044
- config_name: japanese-triplet-all
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 4156658204
    num_examples: 4354539
  download_size: 1159057487
  dataset_size: 4156658204
- config_name: spanish-hard-negatives
  features:
  - name: query
    dtype: string
  - name: pos_text
    dtype: string
  - name: negs_text
    list: string
  - name: negs_count
    dtype: int32
  - name: pos_score
    dtype: float32
  - name: negs_score
    list: float32
  splits:
  - name: train
    num_bytes: 2200508708
    num_examples: 381323
  download_size: 1188936798
  dataset_size: 2200508708
- config_name: spanish-hard-negatives-7
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative_1
    dtype: string
  - name: negative_2
    dtype: string
  - name: negative_3
    dtype: string
  - name: negative_4
    dtype: string
  - name: negative_5
    dtype: string
  - name: negative_6
    dtype: string
  - name: negative_7
    dtype: string
  splits:
  - name: train
    num_bytes: 1167774418
    num_examples: 356969
  download_size: 676069637
  dataset_size: 1167774418
- config_name: spanish-triplet
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 330337739
    num_examples: 381323
  download_size: 194281213
  dataset_size: 330337739
- config_name: spanish-triplet-10
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 3099212505
    num_examples: 3581475
  download_size: 958736780
  dataset_size: 3099212505
- config_name: spanish-triplet-all
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 4307501828
    num_examples: 4987393
  download_size: 1268895341
  dataset_size: 4307501828
configs:
- config_name: arabic-hard-negatives
  data_files:
  - split: train
    path: arabic-hard-negatives/train-*
- config_name: arabic-hard-negatives-7
  data_files:
  - split: train
    path: arabic-hard-negatives-7/train-*
- config_name: arabic-triplet
  data_files:
  - split: train
    path: arabic-triplet/train-*
- config_name: arabic-triplet-10
  data_files:
  - split: train
    path: arabic-triplet-10/train-*
- config_name: arabic-triplet-all
  data_files:
  - split: train
    path: arabic-triplet-all/train-*
- config_name: chinese-hard-negatives
  data_files:
  - split: train
    path: chinese-hard-negatives/train-*
- config_name: chinese-hard-negatives-7
  data_files:
  - split: train
    path: chinese-hard-negatives-7/train-*
- config_name: chinese-triplet
  data_files:
  - split: train
    path: chinese-triplet/train-*
- config_name: chinese-triplet-10
  data_files:
  - split: train
    path: chinese-triplet-10/train-*
- config_name: chinese-triplet-all
  data_files:
  - split: train
    path: chinese-triplet-all/train-*
- config_name: dutch-hard-negatives
  data_files:
  - split: train
    path: dutch-hard-negatives/train-*
- config_name: dutch-hard-negatives-7
  data_files:
  - split: train
    path: dutch-hard-negatives-7/train-*
- config_name: dutch-triplet
  data_files:
  - split: train
    path: dutch-triplet/train-*
- config_name: dutch-triplet-10
  data_files:
  - split: train
    path: dutch-triplet-10/train-*
- config_name: dutch-triplet-all
  data_files:
  - split: train
    path: dutch-triplet-all/train-*
- config_name: english-hard-negatives
  data_files:
  - split: train
    path: english-hard-negatives/train-*
- config_name: english-hard-negatives-7
  data_files:
  - split: train
    path: english-hard-negatives-7/train-*
- config_name: english-triplet
  data_files:
  - split: train
    path: english-triplet/train-*
- config_name: english-triplet-10
  data_files:
  - split: train
    path: english-triplet-10/train-*
- config_name: english-triplet-all
  data_files:
  - split: train
    path: english-triplet-all/train-*
- config_name: french-hard-negatives
  data_files:
  - split: train
    path: french-hard-negatives/train-*
- config_name: french-hard-negatives-7
  data_files:
  - split: train
    path: french-hard-negatives-7/train-*
- config_name: french-triplet
  data_files:
  - split: train
    path: french-triplet/train-*
- config_name: french-triplet-10
  data_files:
  - split: train
    path: french-triplet-10/train-*
- config_name: french-triplet-all
  data_files:
  - split: train
    path: french-triplet-all/train-*
- config_name: german-hard-negatives
  data_files:
  - split: train
    path: german-hard-negatives/train-*
- config_name: german-hard-negatives-7
  data_files:
  - split: train
    path: german-hard-negatives-7/train-*
- config_name: german-triplet
  data_files:
  - split: train
    path: german-triplet/train-*
- config_name: german-triplet-10
  data_files:
  - split: train
    path: german-triplet-10/train-*
- config_name: german-triplet-all
  data_files:
  - split: train
    path: german-triplet-all/train-*
- config_name: indonesian-hard-negatives
  data_files:
  - split: train
    path: indonesian-hard-negatives/train-*
- config_name: indonesian-hard-negatives-7
  data_files:
  - split: train
    path: indonesian-hard-negatives-7/train-*
- config_name: indonesian-triplet
  data_files:
  - split: train
    path: indonesian-triplet/train-*
- config_name: indonesian-triplet-10
  data_files:
  - split: train
    path: indonesian-triplet-10/train-*
- config_name: indonesian-triplet-all
  data_files:
  - split: train
    path: indonesian-triplet-all/train-*
- config_name: italian-hard-negatives
  data_files:
  - split: train
    path: italian-hard-negatives/train-*
- config_name: italian-hard-negatives-7
  data_files:
  - split: train
    path: italian-hard-negatives-7/train-*
- config_name: italian-triplet
  data_files:
  - split: train
    path: italian-triplet/train-*
- config_name: italian-triplet-10
  data_files:
  - split: train
    path: italian-triplet-10/train-*
- config_name: italian-triplet-all
  data_files:
  - split: train
    path: italian-triplet-all/train-*
- config_name: japanese-hard-negatives
  data_files:
  - split: train
    path: japanese-hard-negatives/train-*
- config_name: japanese-hard-negatives-7
  data_files:
  - split: train
    path: japanese-hard-negatives-7/train-*
- config_name: japanese-triplet
  data_files:
  - split: train
    path: japanese-triplet/train-*
- config_name: japanese-triplet-10
  data_files:
  - split: train
    path: japanese-triplet-10/train-*
- config_name: japanese-triplet-all
  data_files:
  - split: train
    path: japanese-triplet-all/train-*
- config_name: spanish-hard-negatives
  data_files:
  - split: train
    path: spanish-hard-negatives/train-*
- config_name: spanish-hard-negatives-7
  data_files:
  - split: train
    path: spanish-hard-negatives-7/train-*
- config_name: spanish-triplet
  data_files:
  - split: train
    path: spanish-triplet/train-*
- config_name: spanish-triplet-10
  data_files:
  - split: train
    path: spanish-triplet-10/train-*
- config_name: spanish-triplet-all
  data_files:
  - split: train
    path: spanish-triplet-all/train-*
---
# mMARCO Reranker-Filtered Hard Negatives (Multilingual)
## Overview
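This dataset is built from [mMARCO](https://huggingface.co/datasets/unicamp-dl/mmarco) (multilingual MS MARCO) triplets, processed separately for each language subset. For each (query, positive) pair, all of its negatives are bundled into one hard-negative list, which is then re-scored with a cross-encoder and filtered. The goal is to remove negatives that the reranker judges too relevant to the query (too strong to serve as negatives, or likely mislabeled), and to drop pairs whose positive the reranker does not support. The same procedure is applied to all language subsets.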
The dataset is published as `mmarco-hard-negatives-reranker-filtered` with config names `{lang}-{variant}`.
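`{lang}` is the language subset name, one of `arabic`, `chinese`, `dutch`, `english`, `french`, `german`, `indonesian`, `italian`, `japanese`, or `spanish`, and `{variant}` is one of the variants described below. The pair format is not included in the public release.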
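Config names are plain strings, so a small helper can guard against typos. This is a minimal sketch: the language and variant lists are copied from this card's metadata, while the helper name `config_name` is hypothetical.

```python
# Language subsets and variants, as listed in this card's YAML metadata.
LANGUAGES = (
    "arabic", "chinese", "dutch", "english", "french",
    "german", "indonesian", "italian", "japanese", "spanish",
)
VARIANTS = ("hard-negatives", "hard-negatives-7", "triplet", "triplet-10", "triplet-all")
REPO_ID = "hotchpotch/mmarco-hard-negatives-reranker-filtered"


def config_name(lang: str, variant: str) -> str:
    """Build a `{lang}-{variant}` config name, rejecting unknown parts."""
    if lang not in LANGUAGES:
        raise ValueError(f"unknown language subset: {lang!r}")
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return f"{lang}-{variant}"


print(config_name("japanese", "triplet-10"))  # japanese-triplet-10
```

If the `datasets` library is installed, the resulting string can be passed as the configuration name, e.g. `load_dataset(REPO_ID, config_name("japanese", "triplet"), split="train")`.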
### 1) `{lang}-hard-negatives`
The filtered hard negatives as-is.
Columns:
`query: str`, `pos_text: str`, `negs_text: list[str]`, `negs_count: int`, `pos_score: float`, `negs_score: list[float]`
### 2) `{lang}-triplet`
For each `(query, pos_text)`, one negative is randomly selected and converted into a `(query, positive, negative)` triplet.
Columns:
`query: str`, `positive: str`, `negative: str`
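The conversion to a single triplet can be sketched as below. This is an illustrative sketch, not the author's script: the function name is hypothetical, and the card does not say which random generator or seed was used, so the chosen negatives will not match the published rows. Because each record yields exactly one row, the row counts of `{lang}-triplet` and `{lang}-hard-negatives` match in the metadata (for example, 349,518 for Arabic).

```python
import random


def to_triplet(record: dict, rng: random.Random) -> dict:
    """Turn one hard-negatives record into a single (query, positive, negative) row."""
    return {
        "query": record["query"],
        "positive": record["pos_text"],
        "negative": rng.choice(record["negs_text"]),  # one negative, chosen at random
    }


record = {"query": "q", "pos_text": "p", "negs_text": ["n1", "n2", "n3"]}
row = to_triplet(record, random.Random(0))
print(row)
```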
### 3) `{lang}-triplet-10`
For each `(query, pos_text)`, up to 10 negatives are randomly sampled, and each is expanded into a `(query, positive, negative)` triplet.
Columns:
`query: str`, `positive: str`, `negative: str`
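Sampling up to 10 negatives per record might look like the sketch below. The function name is hypothetical, and sampling without replacement is an assumption consistent with "up to 10": records with fewer than 10 negatives contribute all of them.

```python
import random


def to_triplets_k(record: dict, k: int, rng: random.Random) -> list[dict]:
    """Sample up to k distinct negatives and emit one triplet per sampled negative."""
    negs = record["negs_text"]
    sampled = rng.sample(negs, min(k, len(negs)))  # without replacement
    return [
        {"query": record["query"], "positive": record["pos_text"], "negative": neg}
        for neg in sampled
    ]


many = {"query": "q", "pos_text": "p", "negs_text": [f"n{i}" for i in range(15)]}
few = {"query": "q", "pos_text": "p", "negs_text": ["a", "b", "c"]}
rng = random.Random(0)
print(len(to_triplets_k(many, 10, rng)), len(to_triplets_k(few, 10, rng)))  # 10 3
```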
### 4) `{lang}-triplet-all`
All negatives in `negs_text` are expanded into `(query, positive, negative)` triplets.
Columns:
`query: str`, `positive: str`, `negative: str`
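Expanding every negative is deterministic. In this sketch (the function name is hypothetical), the number of triplet rows equals the sum of `negs_count` over the bundled records, which is what one would expect the `{lang}-triplet-all` row count to be, assuming no further deduplication happens at this step.

```python
def to_triplets_all(record: dict) -> list[dict]:
    """Expand every negative in negs_text into its own triplet row."""
    return [
        {"query": record["query"], "positive": record["pos_text"], "negative": neg}
        for neg in record["negs_text"]
    ]


records = [
    {"query": "q1", "pos_text": "p1", "negs_text": ["a", "b"], "negs_count": 2},
    {"query": "q2", "pos_text": "p2", "negs_text": ["c"], "negs_count": 1},
]
rows = [row for rec in records for row in to_triplets_all(rec)]
print(len(rows), sum(rec["negs_count"] for rec in records))  # 3 3
```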
### 5) `{lang}-hard-negatives-7`
Only records with at least 7 negatives are kept. Then 7 negatives are randomly selected and stored as `negative_1..negative_7`.
Columns:
`query: str`, `positive: str`, `negative_1: str`, `negative_2: str`, `negative_3: str`, `negative_4: str`, `negative_5: str`, `negative_6: str`, `negative_7: str`
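The filter-then-sample step for this variant can be sketched as follows. The function name and seed handling are hypothetical; the threshold of 7 and the `negative_1..negative_7` column layout come from the description above, and sampling without replacement is assumed.

```python
import random
from typing import Optional


def to_hard_negatives_7(record: dict, rng: random.Random, n: int = 7) -> Optional[dict]:
    """Return a row with negative_1..negative_n, or None if the record has too few negatives."""
    negs = record["negs_text"]
    if len(negs) < n:
        return None  # records with fewer than n negatives are dropped
    row = {"query": record["query"], "positive": record["pos_text"]}
    for i, neg in enumerate(rng.sample(negs, n), start=1):  # n distinct negatives
        row[f"negative_{i}"] = neg
    return row


rec8 = {"query": "q", "pos_text": "p", "negs_text": [f"n{i}" for i in range(8)]}
rec6 = {"query": "q", "pos_text": "p", "negs_text": [f"n{i}" for i in range(6)]}
print(sorted(to_hard_negatives_7(rec8, random.Random(0))))
print(to_hard_negatives_7(rec6, random.Random(0)))  # None
```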
## Source data
- Dataset: `unicamp-dl/mmarco`
- Revision: `refs/convert/parquet` (parquet-converted version)
- Target subsets: all language subsets available under `refs/convert/parquet`
- Split: partial train Parquet for each language (`{lang}/partial/train/*.parquet` or `{lang}/partial-train/*.parquet`)
- Main columns in source: `query`, `positive`, `negative`
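Selecting the partial train files for one language from a repository file listing could look like the sketch below. The file names in the example are made up for illustration; a real listing might come from `huggingface_hub.list_repo_files("unicamp-dl/mmarco", repo_type="dataset", revision="refs/convert/parquet")`, but the exact layout of that revision is not verified here. `fnmatchcase` lets `*` match `/` as well, which is harmless for flat shard directories.

```python
from fnmatch import fnmatchcase

# The two partial-train layouts named in the source description.
PATTERNS = ("{lang}/partial/train/*.parquet", "{lang}/partial-train/*.parquet")


def partial_train_files(repo_files: list[str], lang: str) -> list[str]:
    """Return the sorted partial-train Parquet paths for one language subset."""
    patterns = [p.format(lang=lang) for p in PATTERNS]
    return sorted(f for f in repo_files if any(fnmatchcase(f, p) for p in patterns))


# Hypothetical file listing, for illustration only.
files = [
    "japanese/partial/train/0001.parquet",
    "japanese/partial/train/0000.parquet",
    "japanese/validation/0000.parquet",
    "english/partial-train/0000.parquet",
]
print(partial_train_files(files, "japanese"))
```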
## Construction procedure (reproducible processing)
The following steps reproduce the dataset. We describe the processing itself rather than local scripts or environments.
### 1. Aggregate triplets into hard-negative bundles
1. Load all partial train Parquet files for each language subset.
2. Keep only rows where `query`, `positive`, and `negative` are all present.
3. Group by `(query, positive)` and deduplicate negatives with a set.
4. For each `(query, positive)`, create a record:
   - `query`: `query`
   - `pos_text`: `positive`
   - `negs_text`: unique list of negatives for that `(query, positive)` (sorted for determinism)
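Steps 1 to 4 can be sketched in plain Python over rows that are already loaded (for example from Parquet via pandas or pyarrow). This is a minimal sketch: treating `None` as "missing" and emitting records in first-seen order are assumptions; the card only specifies the filtering of incomplete rows, grouping, set-based deduplication, and sorted negatives.

```python
from collections import defaultdict


def aggregate(rows: list[dict]) -> list[dict]:
    """Bundle (query, positive, negative) rows into one record per (query, positive)."""
    groups: dict[tuple[str, str], set[str]] = defaultdict(set)
    for row in rows:
        q, pos, neg = row.get("query"), row.get("positive"), row.get("negative")
        if q is None or pos is None or neg is None:  # step 2: skip incomplete rows
            continue
        groups[(q, pos)].add(neg)  # step 3: group and deduplicate with a set
    return [  # step 4: one record per (query, positive), negatives sorted
        {"query": q, "pos_text": pos, "negs_text": sorted(negs)}
        for (q, pos), negs in groups.items()
    ]


rows = [
    {"query": "q1", "positive": "p1", "negative": "b"},
    {"query": "q1", "positive": "p1", "negative": "a"},
    {"query": "q1", "positive": "p1", "negative": "b"},  # duplicate negative
    {"query": "q2", "positive": "p2", "negative": None},  # incomplete row
]
print(aggregate(rows))  # [{'query': 'q1', 'pos_text': 'p1', 'negs_text': ['a', 'b']}]
```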
### 2. Cross-encoder re-scoring
Score `(query, text)` pairs using:
- Model: `BAAI/bge-reranker-v2-m3` (Cross-Encoder)
- Max length: 512 tokens
- No quantization or distillation; standard inference in bf16
For each record:
1. Score `(query, pos_text)` → `pos_score`
2. Score `(query, neg)` for each negative `neg` in `negs_text` → `negs_score` (same order as `negs_text`)
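The re-scoring step can be sketched with the scorer passed in as a function, so the example runs without downloading a model. One plausible scorer (an assumption, not confirmed as the author's setup) is the `predict` method of `sentence_transformers.CrossEncoder("BAAI/bge-reranker-v2-m3", max_length=512)`, which accepts a list of text pairs; bf16 settings are not shown. The `overlap_scorer` below is a hypothetical toy stand-in used only to make the sketch runnable. The 0.3 and 0.7 thresholds used later suggest scores normalized to [0, 1], but that is an inference, not something the card states.

```python
from typing import Callable, Sequence

# A scorer maps a batch of (query, text) pairs to one relevance score per pair.
ScoreFn = Callable[[list[tuple[str, str]]], Sequence[float]]


def rescore(record: dict, score_pairs: ScoreFn) -> dict:
    """Attach pos_score and negs_score (aligned with negs_text) to a record."""
    q = record["query"]
    pairs = [(q, record["pos_text"])] + [(q, neg) for neg in record["negs_text"]]
    scores = [float(s) for s in score_pairs(pairs)]  # one batched call per record
    return {**record, "pos_score": scores[0], "negs_score": scores[1:]}


def overlap_scorer(pairs: list[tuple[str, str]]) -> list[float]:
    """Hypothetical stand-in for a cross-encoder: share of query words found in the text."""
    out = []
    for query, text in pairs:
        q_words = set(query.split())
        out.append(len(q_words & set(text.split())) / max(len(q_words), 1))
    return out


rec = {
    "query": "capital of france",
    "pos_text": "paris is the capital of france",
    "negs_text": ["france is in europe", "tokyo"],
}
scored = rescore(rec, overlap_scorer)
print(scored["pos_score"], scored["negs_score"])
```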
### 3. Filtering conditions
The reranker-score filtering here is modeled on the approach used in [ruri-v3-dataset-reranker](https://huggingface.co/datasets/cl-nagoya/ruri-v3-dataset-reranker).
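Apply the following rules to each record:
- Drop the record unless `pos_score > 0.3`.
- Keep only negatives with `neg_score < 0.7`, removing the corresponding entries from `negs_score` as well.
- Drop the record if no negative remains after this filtering.
Save the number of remaining negatives as `negs_count`.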
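The three rules can be sketched as one function. Its name and default arguments are illustrative; the strict inequalities and thresholds follow the rules above, and `negs_score` is filtered in lockstep with `negs_text` so the two stay aligned.

```python
from typing import Optional


def filter_record(record: dict, pos_min: float = 0.3, neg_max: float = 0.7) -> Optional[dict]:
    """Apply the reranker-score filters; return None if the record is dropped."""
    if not record["pos_score"] > pos_min:  # positive must score strictly above pos_min
        return None
    kept = [
        (neg, s)
        for neg, s in zip(record["negs_text"], record["negs_score"])
        if s < neg_max  # negatives scoring neg_max or higher are treated as too strong
    ]
    if not kept:  # at least one negative must survive
        return None
    return {
        **record,
        "negs_text": [neg for neg, _ in kept],
        "negs_score": [s for _, s in kept],
        "negs_count": len(kept),
    }


rec = {"query": "q", "pos_text": "p", "pos_score": 0.9,
       "negs_text": ["a", "b", "c"], "negs_score": [0.1, 0.8, 0.69]}
print(filter_record(rec))
```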
## Output columns
- `query` (string)
- `pos_text` (string)
- `negs_text` (list[string])
- `negs_count` (int)
- `pos_score` (float)
- `negs_score` (list[float])
`negs_score` follows the same order as `negs_text`.
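A quick consistency check on loaded `{lang}-hard-negatives` rows can be sketched as follows. It encodes the invariants stated in this card; the function name and the sample row are made up for illustration.

```python
def record_problems(record: dict, pos_min: float = 0.3, neg_max: float = 0.7) -> list[str]:
    """List violations of the invariants described in this card (empty list means OK)."""
    problems = []
    negs, scores = record["negs_text"], record["negs_score"]
    if not (record["negs_count"] == len(negs) == len(scores)):
        problems.append("negs_count, negs_text and negs_score lengths disagree")
    if record["negs_count"] < 1:
        problems.append("no negatives left")
    if not record["pos_score"] > pos_min:
        problems.append("pos_score not above threshold")
    if any(s >= neg_max for s in scores):
        problems.append("a negative scores at or above threshold")
    return problems


good = {"query": "q", "pos_text": "p", "negs_text": ["a"], "negs_count": 1,
        "pos_score": 0.95, "negs_score": [0.2]}
print(record_problems(good))  # []
```

With `datasets`, the same check could be applied to rows streamed via `load_dataset("hotchpotch/mmarco-hard-negatives-reranker-filtered", "japanese-hard-negatives", split="train", streaming=True)`.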
## License
Follows the original mMARCO license.
Provider:
hotchpotch



