
economics_reports_v2

ModelScope community, updated 2026-01-06, indexed 2025-06-07
Download link:
https://modelscope.cn/datasets/vidore/economics_reports_v2
Description:
# Vidore Benchmark 2 - World Economics report Dataset (Multilingual)

This dataset is part of the "Vidore Benchmark 2" collection, designed for evaluating visual retrieval applications. It focuses on the theme of **World economic reports from 2024**.

## Dataset Summary

The dataset contains queries in the following languages: ["english", "french", "german", "spanish"]. Each query was originally written in English (see [https://huggingface.co/datasets/vidore/synthetic_economics_macro_economy_2024_filtered_v1.0](https://huggingface.co/datasets/vidore/synthetic_economics_macro_economy_2024_filtered_v1.0)) and was translated using gpt-4o.

This dataset provides a focused benchmark for visual retrieval tasks related to World economic reports. It includes a curated set of documents, queries, relevance judgments (qrels), and page images.

* **Number of Documents:** 5
* **Number of Queries:** 232
* **Number of Pages:** 452
* **Number of Relevance Judgments (qrels):** 3628
* **Average Number of Pages per Query:** 15.6

## Dataset Structure (Hugging Face Datasets)

The dataset is structured into the following columns:

* **`docs`**: Contains document metadata, likely including a `"doc-id"` field to uniquely identify each document.
* **`corpus`**: Contains page-level information:
    * `"image"`: The image of the page (a PIL Image object).
    * `"doc-id"`: The ID of the document this page belongs to.
    * `"corpus-id"`: A unique identifier for this specific page within the corpus.
* **`queries`**: Contains query information:
    * `"query-id"`: A unique identifier for the query.
    * `"query"`: The text of the query.
    * `"language"`: The language of the query.
* **`qrels`**: Contains relevance judgments:
    * `"corpus-id"`: The ID of the relevant page.
    * `"query-id"`: The ID of the query.
    * `"answer"`: An answer relevant to both the query and the page.
    * `"score"`: The relevance score.
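To make the relationship between `queries`, `corpus`, and `qrels` concrete, here is a minimal sketch in plain Python. It does not load the real dataset: the rows below are invented toy values that only mimic the documented field names, the integer IDs and score values are assumptions, and `group_qrels` is a hypothetical helper rather than part of any library. The last lines check that the reported average of 15.6 pages per query matches the ratio of qrels to queries given in the summary.

```python
from collections import defaultdict

# Toy rows mimicking the documented fields. All values are invented for
# illustration; the real ID types and score scale may differ.
queries = [
    {"query-id": 0, "query": "What drove inflation in 2024?", "language": "english"},
    {"query-id": 1, "query": "Quels facteurs ont freiné la croissance ?", "language": "french"},
]
qrels = [
    {"query-id": 0, "corpus-id": 10, "answer": "toy answer A", "score": 2},
    {"query-id": 0, "corpus-id": 11, "answer": "toy answer B", "score": 1},
    {"query-id": 1, "corpus-id": 10, "answer": "toy answer C", "score": 1},
]


def group_qrels(rows):
    """Map each query-id to a {corpus-id: score} dict of its relevant pages."""
    relevant = defaultdict(dict)
    for row in rows:
        relevant[row["query-id"]][row["corpus-id"]] = row["score"]
    return dict(relevant)


relevant_pages = group_qrels(qrels)

# Average number of relevant pages per query in the toy data.
toy_average = len(qrels) / len(queries)

# Same ratio using the counts reported in the dataset summary.
reported_average = round(3628 / 232, 1)

print(relevant_pages)
print(toy_average, reported_average)
```

A per-query mapping like `relevant_pages` is the usual shape a retrieval metric needs: for each query, the set of page IDs that count as relevant and how strongly.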
## Usage

This dataset is designed for evaluating the performance of visual retrieval systems, particularly those focused on document image understanding.

**Example Evaluation with ColPali (CLI):**

Here's a code snippet demonstrating how to evaluate the ColPali model on this dataset using the `vidore-benchmark` command-line tool.

1. **Install the `vidore-benchmark` package:**

    ```bash
    pip install vidore-benchmark datasets
    ```

2. **Run the evaluation:**

    ```bash
    vidore-benchmark evaluate-retriever \
        --model-class colpali \
        --model-name vidore/colpali-v1.3 \
        --dataset-name vidore/economics_reports_v2 \
        --dataset-format beir \
        --split test
    ```

For more details on using `vidore-benchmark`, refer to the official documentation: [https://github.com/illuin-tech/vidore-benchmark](https://github.com/illuin-tech/vidore-benchmark)

## Citation

If you use this dataset in your research or work, please cite:

```bibtex
@misc{faysse2024colpaliefficientdocumentretrieval,
  title={ColPali: Efficient Document Retrieval with Vision Language Models},
  author={Manuel Faysse and Hugues Sibille and Tony Wu and Bilel Omrani and Gautier Viaud and Céline Hudelot and Pierre Colombo},
  year={2024},
  eprint={2407.01449},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2407.01449},
}

@misc{macé2025vidorebenchmarkv2raising,
  title={ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval},
  author={Quentin Macé and António Loison and Manuel Faysse},
  year={2025},
  eprint={2505.17166},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2505.17166},
}
```

## Acknowledgments

This work is partially supported by [ILLUIN Technology](https://www.illuin.tech/), and by a grant from ANRT France.

Provider: maas
Created: 2025-06-04