sionic-ai/NanoBEIR-ja
Hugging Face · Updated 2025-12-19 · Indexed 2026-02-07
Download link:
https://hf-mirror.com/datasets/sionic-ai/NanoBEIR-ja
Description:
NanoBEIR-ja is a Japanese information retrieval benchmark with preprocessed queries, part of the NanoBEIR benchmark. It is organized into three configurations, corpus, qrels, and queries, and each configuration has one split per subset (NanoClimateFEVER, NanoDBPedia, and others). The dataset is intended for text retrieval and is tagged sentence-transformers and retrieval. The queries were preprocessed with a pipeline of format detection, conversion, and quality validation using the Gemini 2.5 Flash and GPT-4o models. The dataset card includes examples of preprocessed queries that show original statements rewritten as questions.
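The three-configuration layout maps onto the inputs of a typical retrieval evaluator: an id-to-text map for the corpus, another for the queries, and a set of relevant document ids per query built from qrels. The Python sketch below shows one way to assemble these for a single subset. It assumes BEIR-style column names (_id and text for corpus and queries, query-id and corpus-id for qrels); this page does not state the column names, so verify them against the actual dataset. The helpers to_ir_inputs and load_subset are hypothetical; load_subset calls the real datasets.load_dataset(path, name, split=...) function and needs the datasets package plus network access.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set, Tuple

# ASSUMED column names (BEIR-style); not confirmed for sionic-ai/NanoBEIR-ja.
ID, TEXT = "_id", "text"            # corpus and queries rows
QID, DID = "query-id", "corpus-id"  # qrels rows


def to_ir_inputs(
    corpus_rows: Iterable[Mapping[str, str]],
    query_rows: Iterable[Mapping[str, str]],
    qrels_rows: Iterable[Mapping[str, str]],
) -> Tuple[Dict[str, str], Dict[str, str], Dict[str, Set[str]]]:
    """Hypothetical helper: turn one subset's three configs into evaluator inputs."""
    corpus = {str(r[ID]): r[TEXT] for r in corpus_rows}
    queries = {str(r[ID]): r[TEXT] for r in query_rows}
    relevant: Dict[str, Set[str]] = defaultdict(set)
    for r in qrels_rows:
        qid, did = str(r[QID]), str(r[DID])
        # Keep only judgments whose query and document both exist in this subset.
        if qid in queries and did in corpus:
            relevant[qid].add(did)
    # Drop queries that have no relevant document, since they cannot be scored.
    queries = {q: t for q, t in queries.items() if q in relevant}
    return queries, corpus, dict(relevant)


def load_subset(subset: str = "NanoClimateFEVER"):
    """Hypothetical loader: fetch the three configs for one split from the Hub."""
    from datasets import load_dataset  # lazy import so to_ir_inputs works without it

    repo = "sionic-ai/NanoBEIR-ja"
    parts = {cfg: load_dataset(repo, cfg, split=subset) for cfg in ("corpus", "queries", "qrels")}
    # Iterating over a datasets.Dataset yields one dict per row.
    return to_ir_inputs(parts["corpus"], parts["queries"], parts["qrels"])


if __name__ == "__main__":
    # Toy rows in the assumed schema (offline demo, not real dataset content).
    corpus_rows = [{"_id": "d1", "text": "地球温暖化の主な原因は温室効果ガスである。"},
                   {"_id": "d2", "text": "DBpedia は Wikipedia から構造化データを抽出する。"}]
    query_rows = [{"_id": "q1", "text": "地球温暖化の原因は何ですか?"},
                  {"_id": "q2", "text": "判定のないクエリ"}]
    qrels_rows = [{"query-id": "q1", "corpus-id": "d1"}]
    queries, corpus, relevant = to_ir_inputs(corpus_rows, query_rows, qrels_rows)
    print(relevant)  # {'q1': {'d1'}}
```

The returned dictionaries have the shapes that sentence-transformers' InformationRetrievalEvaluator expects for its queries, corpus, and relevant_docs arguments, so they can be passed to it directly for one subset at a time.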
Provided by:
sionic-ai