GreenNode/nano-dbpedia-vn
Hugging Face | Updated 2025-12-30 | Indexed 2026-01-03
Download link:
https://hf-mirror.com/datasets/GreenNode/nano-dbpedia-vn
Summary:
NanoDBPedia-VN is a Vietnamese translation of DBpedia-Entity, a standard test collection for entity search over the DBpedia knowledge base. It was built by translating the source data with large language models (LLMs), filtering the translations with advanced embedding models, and scoring sample quality against multiple criteria using an LLM-as-a-judge. The dataset is part of MTEB (Massive Text Embedding Benchmark) and is intended primarily for text retrieval. It has three configurations: corpus (the documents to search), qrels (query relevance judgments), and queries (the search queries). Language: Vietnamese. License: cc-by-sa-4.0. Multilinguality: translated.
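To illustrate how the three configurations relate in a retrieval evaluation, here is a minimal, self-contained sketch on toy in-memory records. It assumes BEIR/MTEB-style field names (`_id`, `text`, `query-id`, `corpus-id`, `score`); these are assumptions to verify against the actual dataset files. The document IDs, Vietnamese strings, and helper functions are hypothetical, and a naive token-overlap scorer stands in for a real embedding model.

```python
import unicodedata
from collections import defaultdict

# Hypothetical toy records in an assumed BEIR/MTEB-style layout (not real dataset rows).
corpus = [
    {"_id": "d1", "text": "Hà Nội là thủ đô của Việt Nam"},
    {"_id": "d2", "text": "Sông Mekong chảy qua nhiều quốc gia"},
    {"_id": "d3", "text": "Phở là món ăn truyền thống của Việt Nam"},
]
queries = [
    {"_id": "q1", "text": "thủ đô Việt Nam"},
    {"_id": "q2", "text": "món ăn Việt Nam"},
]
qrels = [
    {"query-id": "q1", "corpus-id": "d1", "score": 1},
    {"query-id": "q2", "corpus-id": "d3", "score": 1},
]


def build_relevance(qrels):
    """Map each query ID to the set of corpus IDs judged relevant (score > 0)."""
    relevant = defaultdict(set)
    for row in qrels:
        if row["score"] > 0:
            relevant[row["query-id"]].add(row["corpus-id"])
    return relevant


def tokens(text):
    """NFC-normalize, lowercase, and split on whitespace so Vietnamese diacritics compare reliably."""
    return set(unicodedata.normalize("NFC", text).lower().split())


def overlap_score(query, doc):
    """Placeholder scorer: number of shared tokens (stands in for an embedding model)."""
    return len(tokens(query) & tokens(doc))


def recall_at_k(corpus, queries, qrels, k):
    """Mean fraction of each query's relevant documents found in its top-k ranked results."""
    relevant = build_relevance(qrels)
    recalls = []
    for q in queries:
        rel = relevant.get(q["_id"])
        if not rel:
            continue  # skip queries that have no relevance judgments
        ranked = sorted(corpus, key=lambda d: overlap_score(q["text"], d["text"]), reverse=True)
        top_ids = {d["_id"] for d in ranked[:k]}
        recalls.append(len(top_ids & rel) / len(rel))
    return sum(recalls) / len(recalls)


print(recall_at_k(corpus, queries, qrels, k=1))  # → 1.0
```

The qrels configuration is what ties the other two together: queries and corpus documents are only linked through the relevance judgments, so any retriever can be swapped in for `overlap_score` without changing the evaluation loop.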
Provided by:
GreenNode



