mteb/MIRACLJaRetrievalLite
Source: Hugging Face. Updated 2025-12-13; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/mteb/MIRACLJaRetrievalLite
Overview:
MIRACLJaRetrievalLite is part of MTEB (Massive Text Embedding Benchmark). It is a lightweight Japanese version of a multilingual retrieval dataset. It contains 105,064 documents and was constructed using hard negatives from 5 high-performance models. The dataset is expert-annotated, written in Japanese (jpn), and licensed under Apache-2.0. It has three main components: corpus, qrels (query relevance judgments), and queries. The task category is text-retrieval, and the covered domains are encyclopaedic and written text. The dataset is sourced from sbintuitions/JMTEB-lite and has undergone additional processing.
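To illustrate how the three components (corpus, qrels, queries) fit together in a retrieval evaluation, here is a minimal Python sketch on toy in-memory rows. The field names (`_id`, `text`, `query-id`, `corpus-id`, `score`), the sample rows, and the helper functions `build_relevance_index` and `recall_at_k` are assumptions made for illustration; they are not taken from this listing and may differ from the actual dataset schema.

```python
from collections import defaultdict

# Toy stand-ins for the three components. Field names are assumed, not confirmed.
corpus = [
    {"_id": "d1", "text": "富士山は日本で最も高い山である。"},
    {"_id": "d2", "text": "東京は日本の首都である。"},
    {"_id": "d3", "text": "琵琶湖は日本最大の湖である。"},
]
queries = [
    {"_id": "q1", "text": "日本で一番高い山は?"},
    {"_id": "q2", "text": "日本の首都はどこ?"},
]
# Each qrels row links one query to one document with a relevance score.
qrels = [
    {"query-id": "q1", "corpus-id": "d1", "score": 1},
    {"query-id": "q2", "corpus-id": "d2", "score": 1},
    {"query-id": "q2", "corpus-id": "d3", "score": 0},  # judged non-relevant
]


def build_relevance_index(qrels_rows):
    """Map each query id to the set of document ids judged relevant (score > 0)."""
    relevant = defaultdict(set)
    for row in qrels_rows:
        if row["score"] > 0:
            relevant[row["query-id"]].add(row["corpus-id"])
    return dict(relevant)


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant documents that appear in the top-k ranked ids."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


relevance = build_relevance_index(qrels)

# Hypothetical ranking that some retrieval model might return for q2.
ranking_q2 = ["d3", "d2", "d1"]
recall_q2_at_1 = recall_at_k(ranking_q2, relevance["q2"], k=1)
recall_q2_at_2 = recall_at_k(ranking_q2, relevance["q2"], k=2)
print(recall_q2_at_1, recall_q2_at_2)
```

In this sketch the zero-score qrels row plays the role of a judged negative: it ranks first for `q2` yet does not count toward recall, which is the kind of distinction a hard-negative corpus is meant to test.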
Provider:
mteb



