MMARCO Japanese Hard Negatives
Source: arXiv. Indexed: 2025-09-30
Download link:
https://huggingface.co/datasets/bclavie/mmarco-japanese-hard-negatives
Description:
This dataset is a hard-negative-augmented version of the Japanese MMARCO dataset, intended for training stronger Japanese retrieval models. The hard negatives were generated with multilingual E5 embeddings and with BM25, to sharpen a model's ability to distinguish relevant passages from irrelevant ones. Although the training set is limited in size, a large number of hard negatives were generated. The target task is document retrieval.
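As a rough illustration of what BM25-based hard-negative mining looks like, the sketch below scores a toy collection against a query and keeps the highest-scoring passages that are not labelled relevant. It is not the dataset author's pipeline: the function names (`bm25_scores`, `mine_hard_negatives`), the parameter values, and the pre-tokenized English toy data are all assumptions made for the example, and real Japanese text would first need a tokenizer.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs_tokens for term in set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in set(query_tokens):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def mine_hard_negatives(query_tokens, docs_tokens, positive_ids, top_k=2):
    """Return ids of the top-scoring documents that are not labelled relevant."""
    scores = bm25_scores(query_tokens, docs_tokens)
    ranked = sorted(range(len(docs_tokens)), key=lambda i: scores[i], reverse=True)
    # Skip known positives and documents with no lexical overlap (score 0).
    return [i for i in ranked if i not in positive_ids and scores[i] > 0][:top_k]


# Hypothetical toy collection, already split into tokens.
docs = [
    ["tokyo", "is", "the", "capital", "of", "japan"],            # labelled relevant
    ["kyoto", "was", "the", "former", "capital", "of", "japan"],
    ["osaka", "is", "known", "for", "street", "food"],
    ["the", "capital", "of", "france", "is", "paris"],
]
query = ["capital", "of", "japan"]

hard_negatives = mine_hard_negatives(query, docs, positive_ids={0})
print(hard_negatives)
```

Passages that share many query terms with the relevant passage but are not labelled relevant rank highest, which is what makes them "hard". An embedding-based miner follows the same pattern with a vector similarity in place of the BM25 score.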
Provider:
bclavie



