GreenNode/nano-hotpotqa-vn
Source: Hugging Face. Updated 2025-12-30; indexed 2026-01-03.
Download link:
https://hf-mirror.com/datasets/GreenNode/nano-hotpotqa-vn
Overview:
NanoHotpotQA-VN is a Vietnamese translation of HotpotQA, a question answering dataset of natural, multi-hop questions with strong supervision for supporting facts, intended to enable more explainable question answering systems. It is part of MTEB (Massive Text Embedding Benchmark). The dataset was created by translating the source with a large language model (LLM), specifically Cohere's Aya model, and then using embedding models to filter the translations and score sample quality against multiple criteria. The language is Vietnamese (vie) and the license is cc-by-sa-4.0. There are three configurations: corpus, qrels (query relevance judgments), and queries, each with its own features and splits. The task categories are text retrieval, multiple-choice QA, and question answering. The dataset is sourced from GreenNode/hotpotqa-vn.
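As a minimal sketch of how the three configurations might be used together for retrieval evaluation: the snippet below loads each configuration with the Hugging Face `datasets` library and scores a ranked list against the qrels. The column names `query-id` and `corpus-id` are assumptions based on the usual MTEB retrieval layout and are not confirmed by this listing, and the helpers `load_configs`, `build_relevance_map`, and `recall_at_k` are hypothetical names introduced for illustration.

```python
from collections import defaultdict


def load_configs(repo_id="GreenNode/nano-hotpotqa-vn"):
    """Download the three configurations named in the description.

    Needs the `datasets` package and network access, so the import is
    kept local and nothing is downloaded until this function is called.
    """
    from datasets import load_dataset

    return {name: load_dataset(repo_id, name)
            for name in ("corpus", "queries", "qrels")}


def build_relevance_map(qrels_rows, query_key="query-id", doc_key="corpus-id"):
    """Group relevant document ids by query id.

    The default column names are assumed (common MTEB convention);
    pass different keys if the actual qrels features differ.
    """
    relevant = defaultdict(set)
    for row in qrels_rows:
        relevant[row[query_key]].add(row[doc_key])
    return dict(relevant)


def recall_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_doc_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)
```

Because the questions are multi-hop, a query may have more than one relevant passage, which is why the relevance map stores a set per query rather than a single document id.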
Provider:
GreenNode



