GreenNode/nano-msmarco-vn

Source: Hugging Face. Updated 2025-12-30; indexed 2026-01-03.
Download link:
https://hf-mirror.com/datasets/GreenNode/nano-msmarco-vn
Description:

NanoMSMARCO-VN is a Vietnamese dataset translated from MS MARCO, a collection focused on deep learning for search. It is part of the Vietnamese Massive Text Embedding Benchmark (VN-MTEB). The dataset was created using large language models (LLMs) for translation, advanced embedding models for filtering, and LLM-as-a-judge for quality scoring. It provides multiple configurations (corpus, qrels, queries), each with its own features and splits. It is licensed under cc-by-sa-4.0 and tagged as multilingual (translated). Its task categories are text-retrieval, multiple-choice-qa, and question-answering. The dataset is derived from GreenNode/msmarco-vn and is intended for evaluation with the MTEB framework.
Provider:
GreenNode
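
The description names three configurations (corpus, qrels, queries) but does not show how they fit together. The following is a minimal sketch of the usual retrieval-evaluation layout, using made-up toy data: the dictionary shapes, the IDs, the sample sentences, and the `recall_at_k` helper are all assumptions for illustration, not the dataset's actual schema or the MTEB implementation.

```python
from typing import Dict, List

# Toy stand-ins for the three configurations (all values are invented).
# corpus: document ID -> passage text
corpus: Dict[str, str] = {
    "d1": "Hà Nội là thủ đô của Việt Nam.",
    "d2": "Phở là một món ăn Việt Nam.",
    "d3": "Sông Mê Kông chảy qua nhiều nước.",
}
# queries: query ID -> query text
queries: Dict[str, str] = {
    "q1": "Thủ đô của Việt Nam là gì?",
    "q2": "Sông Mê Kông chảy qua đâu?",
}
# qrels: query ID -> {document ID: relevance label}
qrels: Dict[str, Dict[str, int]] = {
    "q1": {"d1": 1},
    "q2": {"d3": 1},
}


def recall_at_k(
    ranked: Dict[str, List[str]],
    qrels: Dict[str, Dict[str, int]],
    k: int,
) -> float:
    """Mean fraction of each query's relevant documents found in its top k."""
    scores = []
    for qid, judged in qrels.items():
        positives = {doc_id for doc_id, rel in judged.items() if rel > 0}
        if not positives:
            continue  # skip queries with no relevant documents
        top_k = set(ranked.get(qid, [])[:k])
        scores.append(len(positives & top_k) / len(positives))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical output of some retrieval system: query ID -> ranked document IDs.
ranked = {
    "q1": ["d1", "d2", "d3"],
    "q2": ["d2", "d3", "d1"],
}

print(recall_at_k(ranked, qrels, k=1))  # 0.5
print(recall_at_k(ranked, qrels, k=2))  # 1.0
```

With the Hugging Face `datasets` library, each configuration of the real dataset could presumably be loaded by name, for example `load_dataset("GreenNode/nano-msmarco-vn", "corpus")`. The split and column names are not stated here, so they would need to be checked on the dataset card.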