lsz05/MIRACLJaRetrievalLite
Source: Hugging Face. Updated 2025-12-12. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/lsz05/MIRACLJaRetrievalLite
Resource description:
MIRACLJaRetrievalLite is the lightweight Japanese version of a multilingual retrieval dataset and is part of MTEB (Massive Text Embedding Benchmark). Its corpus is reduced to 105,064 documents, constructed using hard negatives from 5 high-performance models. The task category is t2t (text-to-text), and the domains are Encyclopaedic and Written text. The dataset includes three configurations: corpus (the documents), qrels (the relevance judgments linking queries to documents), and queries.
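
To show how the three configurations fit together in a retrieval evaluation, here is a minimal Python sketch. It does not load the real dataset: the IDs, texts, dictionary layouts, the example ranking, and the `recall_at_k` helper are all hypothetical stand-ins chosen for illustration, and the actual column names in the dataset may differ.

```python
# Toy stand-ins for the three configurations. All IDs, texts, and field
# layouts are hypothetical and NOT taken from the actual dataset.
corpus = {
    "d1": "富士山は日本で最も高い山である。",
    "d2": "東京は日本の首都である。",
    "d3": "琵琶湖は日本最大の湖である。",
}
queries = {"q1": "日本で一番高い山は?"}
# qrels: query ID -> {document ID: relevance score}
qrels = {"q1": {"d1": 1}}


def recall_at_k(ranked_doc_ids, relevant, k):
    """Fraction of the relevant documents that appear in the top-k ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked_doc_ids[:k] if doc_id in relevant)
    return hits / len(relevant)


# A hypothetical ranking that some retriever might return for q1.
ranking = ["d2", "d1", "d3"]
# Documents judged relevant for q1 (score above zero).
relevant = {doc_id for doc_id, score in qrels["q1"].items() if score > 0}

print(recall_at_k(ranking, relevant, k=1))
print(recall_at_k(ranking, relevant, k=2))
```

In this toy setup the only relevant document sits at rank 2, so recall is zero at k=1 and complete at k=2. In a corpus built from hard negatives, the non-relevant documents are ones that strong models tend to rank highly, which is why this kind of ranking error is more likely.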
Provider:
lsz05



