mteb/JQaRARerankingLite

Source: Hugging Face
Updated: 2025-12-13
Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/mteb/JQaRARerankingLite
Overview:
JQaRARerankingLite is a reranking dataset for JQaRA (Japanese Question Answering with Retrieval Augmentation) and is part of the Massive Text Embedding Benchmark (MTEB). It pairs questions from JAQKET with a corpus drawn from Japanese Wikipedia. This lightweight version uses a reduced corpus of 172,897 documents, constructed using hard negatives from 5 high-performance models. The dataset's task categories are text-ranking, multiple-choice-qa, and question-answering; the language is Japanese, and the license is cc-by-sa-4.0. The dataset has four configurations (corpus, qrels, queries, and top_ranked), each with its own features and split information. It is derived from sbintuitions/JMTEB-lite.
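
As a rough illustration of how the four configurations could fit together in a reranking run, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the in-memory dictionaries are toy stand-ins, not real records or the dataset's actual field names, the character-overlap scorer is a placeholder for an embedding model, and average precision is used only as an example metric.

```python
from typing import Dict, List, Set

# Toy stand-ins for the four configurations. All IDs and texts are invented
# for illustration and are NOT taken from the dataset.
corpus: Dict[str, str] = {
    "d1": "富士山は日本で最も高い山である。",
    "d2": "琵琶湖は日本最大の湖である。",
    "d3": "東京は日本の首都である。",
}
queries: Dict[str, str] = {"q1": "日本で一番高い山は?"}
qrels: Dict[str, Dict[str, int]] = {"q1": {"d1": 1}}          # relevance labels
top_ranked: Dict[str, List[str]] = {"q1": ["d2", "d1", "d3"]}  # candidates to rerank


def char_overlap(query: str, doc: str) -> float:
    """Placeholder scorer: fraction of distinct query characters found in doc."""
    query_chars = set(query)
    if not query_chars:
        return 0.0
    return len(query_chars & set(doc)) / len(query_chars)


def rerank(query_id: str) -> List[str]:
    """Reorder the candidate list for one query by descending score."""
    query_text = queries[query_id]
    candidates = top_ranked[query_id]
    return sorted(
        candidates,
        key=lambda doc_id: char_overlap(query_text, corpus[doc_id]),
        reverse=True,
    )


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of a ranked list against a set of relevant doc IDs."""
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


relevant_docs = {doc_id for doc_id, label in qrels["q1"].items() if label > 0}
ap_before = average_precision(top_ranked["q1"], relevant_docs)
ap_after = average_precision(rerank("q1"), relevant_docs)
```

In this sketch, queries and corpus supply the texts, top_ranked supplies the candidate list to reorder, and qrels supplies the labels used for scoring. The real configurations may use different column names and splits, so check the dataset card before adapting it.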
Provider:
mteb