mteb/Vidore2EconomicsReportsRetrieval

Name: mteb/Vidore2EconomicsReportsRetrieval
Creator: mteb
Published: 2025-10-21 10:43:14
License: MIT

Source: Hugging Face. Updated: 2025-10-21. Indexed: 2025-10-25.

Download link:

https://hf-mirror.com/datasets/mteb/Vidore2EconomicsReportsRetrieval

Description:

This multilingual dataset is designed for visual document retrieval, image-to-text, and text-to-image tasks. Its annotations are derived from the source dataset vidore/economics_reports_v2, and it is formatted for use with MTEB (Massive Text Embedding Benchmark). It is available in four languages: German (deu), English (eng), French (fra), and Spanish (spa), and is released under the MIT license. The data comes in three parts: a corpus, queries, and qrels (query relevance judgments), each provided as a single test split. Each language has its own configurations, with features such as image, doc-id, id, text, query-id, corpus-id, and score. The dataset is intended for evaluating embedding models and is part of the MTEB project, which benchmarks text embedding models across multiple languages and tasks.
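As a rough illustration of how the qrels part relates queries to corpus entries, the sketch below groups qrels rows by query and computes recall@k for a ranking. It assumes each qrels row carries the query-id, corpus-id, and score fields named in the description; the sample rows, the ranking, and the helper names build_qrels and recall_at_k are hypothetical and are not taken from the dataset or from MTEB.

```python
from collections import defaultdict


def build_qrels(rows):
    """Group qrels rows into {query_id: {corpus_id: score}}."""
    qrels = defaultdict(dict)
    for row in rows:
        qrels[row["query-id"]][row["corpus-id"]] = row["score"]
    return dict(qrels)


def recall_at_k(qrels, ranking, k):
    """Mean fraction of relevant corpus entries found in each query's top k."""
    per_query = []
    for query_id, judged in qrels.items():
        # Treat any positive score as relevant (an assumption for this sketch).
        relevant = {doc_id for doc_id, score in judged.items() if score > 0}
        if not relevant:
            continue
        top_k = ranking.get(query_id, [])[:k]
        hits = sum(1 for doc_id in top_k if doc_id in relevant)
        per_query.append(hits / len(relevant))
    return sum(per_query) / len(per_query) if per_query else 0.0


# Hypothetical toy rows shaped like the qrels features in the description.
sample_qrels_rows = [
    {"query-id": "q1", "corpus-id": "d1", "score": 1},
    {"query-id": "q1", "corpus-id": "d3", "score": 1},
    {"query-id": "q2", "corpus-id": "d2", "score": 1},
]

# Hypothetical model output: corpus ids ordered from best to worst per query.
sample_ranking = {"q1": ["d1", "d2", "d3"], "q2": ["d3", "d2", "d1"]}

qrels = build_qrels(sample_qrels_rows)
score = recall_at_k(qrels, sample_ranking, k=2)
print(f"recall@2 = {score:.2f}")
```

In practice the rows would come from the qrels test split of one language configuration rather than from a hand-written list, and MTEB computes its own metrics; this sketch only shows the shape of the data.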

Provider:

mteb
