GreenNode/TVPL-Retrieval-VN
Source: Hugging Face. Updated 2025-12-14. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/GreenNode/TVPL-Retrieval-VN
Description:
This dataset is part of the Vietnamese Massive Text Embedding Benchmark (VN-MTEB) and is intended primarily for text retrieval tasks. It ships three configurations, each with its own features and splits: corpus (title and text fields), queries (query ID and text fields), and default (query ID, corpus ID, and score fields, i.e. the relevance judgments linking queries to corpus entries). The data is described as multilingual, having been translated from other languages, and its source dataset is GreenNode/TVPL-Retrieval. The dataset is used to evaluate embedding models, and its README provides code examples for running the evaluation within the MTEB framework.
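To make the role of the three configurations more concrete, the following is a minimal sketch of how rows from the default configuration could be grouped into per-query relevance judgments. The column names `query-id`, `corpus-id`, and `score` are assumptions based on the common MTEB retrieval layout, the helper `build_qrels` is hypothetical, and the toy rows are invented for illustration; none of these come from the dataset itself.

```python
from collections import defaultdict


def build_qrels(rows):
    """Group relevance judgments by query: {query_id: {corpus_id: score}}.

    `rows` is any iterable of dicts. The keys "query-id", "corpus-id" and
    "score" are assumed column names, not confirmed by the dataset card.
    """
    qrels = defaultdict(dict)
    for row in rows:
        qrels[row["query-id"]][row["corpus-id"]] = int(row["score"])
    return dict(qrels)


# Invented toy rows standing in for the default configuration.
toy_rows = [
    {"query-id": "q1", "corpus-id": "d1", "score": 1},
    {"query-id": "q1", "corpus-id": "d7", "score": 1},
    {"query-id": "q2", "corpus-id": "d3", "score": 1},
]

qrels = build_qrels(toy_rows)
for query_id, judged in qrels.items():
    print(query_id, sorted(judged))
```

In a real run, the rows would come from the default configuration, while the corpus and queries configurations supply the text that the IDs refer to.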
Provider:
GreenNode



