esg_reports_v2

Name: esg_reports_v2
Creator: maas
Published: 2025-12-05 16:37:17
License: No description available

Source: ModelScope community; updated 2025-12-05; indexed 2025-06-07

Download link:

https://modelscope.cn/datasets/vidore/esg_reports_v2

Description:

# Vidore Benchmark 2 - ESG Restaurant Dataset (Multilingual)

This dataset is part of the "Vidore Benchmark 2" collection, designed for evaluating visual retrieval applications. It focuses on the theme of **ESG reports in the fast food industry**.

## Dataset Summary

The dataset contains queries in the following languages: ["english", "french", "german", "spanish"]. Each query was originally in "french" (see [https://huggingface.co/datasets/vidore/synthetic_rse_restaurant_filtered_v1.0](https://huggingface.co/datasets/vidore/synthetic_rse_restaurant_filtered_v1.0)) and was translated using gpt-4o.

This dataset provides a focused benchmark for visual retrieval tasks related to ESG reports of fast food companies. It includes a curated set of documents, queries, relevance judgments (qrels), and page images.

* **Number of Documents:** 30
* **Number of Queries:** 228
* **Number of Pages:** 1538
* **Number of Relevance Judgments (qrels):** 888
* **Average Number of Pages per Query:** 3.9

## Dataset Structure (Hugging Face Datasets)

The dataset is structured into the following columns:

* **`docs`**: Contains document metadata, likely including a `"doc-id"` field to uniquely identify each document.
* **`corpus`**: Contains page-level information:
    * `"image"`: The image of the page (a PIL Image object).
    * `"doc-id"`: The ID of the document this page belongs to.
    * `"corpus-id"`: A unique identifier for this specific page within the corpus.
* **`queries`**: Contains query information:
    * `"query-id"`: A unique identifier for the query.
    * `"query"`: The text of the query.
    * `"language"`: The language of the query.
* **`qrels`**: Contains relevance judgments:
    * `"corpus-id"`: The ID of the relevant page.
    * `"query-id"`: The ID of the query.
    * `"answer"`: The answer relevant to both the query and the page.
    * `"score"`: The relevance score.

## Usage

This dataset is designed for evaluating the performance of visual retrieval systems, particularly those focused on document image understanding.
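To make the column layout above concrete, here is a minimal Python sketch of how the `qrels` rows tie queries to pages through `"query-id"` and `"corpus-id"`. It is an illustration, not part of the benchmark tooling: the helper names (`load_subsets`, `build_qrels`, `mean_relevant_pages`, `recall_at_k`) are hypothetical, the toy rows are made up, and the loader assumes that each of `docs`, `corpus`, `queries` and `qrels` is exposed as a separate configuration with a `test` split, which should be checked against the dataset page.

```python
from collections import defaultdict


def load_subsets(repo_id, split="test"):
    """Load the four subsets described above from the Hugging Face Hub.

    Assumption: each subset is a config named "docs", "corpus", "queries"
    or "qrels"; adjust the names if the dataset page says otherwise.
    """
    from datasets import load_dataset  # lazy import; needs network access

    names = ("docs", "corpus", "queries", "qrels")
    return {name: load_dataset(repo_id, name, split=split) for name in names}


def build_qrels(rows):
    """Group relevance judgments as {query-id: {corpus-id: score}}."""
    qrels = defaultdict(dict)
    for row in rows:
        qrels[row["query-id"]][row["corpus-id"]] = row["score"]
    return dict(qrels)


def mean_relevant_pages(qrels):
    """Average number of judged pages per query."""
    return sum(len(pages) for pages in qrels.values()) / len(qrels)


def recall_at_k(ranked_corpus_ids, relevant, k):
    """Fraction of one query's relevant pages found in the top-k ranking."""
    hits = sum(1 for cid in ranked_corpus_ids[:k] if cid in relevant)
    return hits / len(relevant)


# Toy rows that mimic the qrels columns; every value here is invented.
toy_rows = [
    {"query-id": 0, "corpus-id": 10, "answer": "toy answer", "score": 1},
    {"query-id": 0, "corpus-id": 11, "answer": "toy answer", "score": 1},
    {"query-id": 1, "corpus-id": 42, "answer": "toy answer", "score": 1},
]
toy_qrels = build_qrels(toy_rows)
toy_mean = mean_relevant_pages(toy_qrels)  # 3 judgments over 2 queries
toy_recall = recall_at_k([11, 99, 10], toy_qrels[0], k=2)  # 1 of 2 pages found
```

On the real data, `build_qrels(load_subsets(repo_id)["qrels"])` would produce the same mapping, since iterating a Hugging Face dataset yields one dictionary per row. The reported figures are consistent with this grouping: 888 judgments over 228 queries gives roughly 3.9 pages per query. The recall helper is only a sanity check; `vidore-benchmark` computes its own retrieval metrics.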
**Example Evaluation with ColPali (CLI):**

Here's a code snippet demonstrating how to evaluate the ColPali model on this dataset using the `vidore-benchmark` command-line tool.

1. **Install the `vidore-benchmark` package:**

    ```bash
    pip install vidore-benchmark datasets
    ```

2. **Run the evaluation:**

    ```bash
    vidore-benchmark evaluate-retriever \
        --model-class colpali \
        --model-name vidore/colpali-v1.3 \
        --dataset-name vidore/synthetic_rse_restaurant_filtered_v1.0_multilingual \
        --dataset-format beir \
        --split test
    ```

For more details on using `vidore-benchmark`, refer to the official documentation: [https://github.com/illuin-tech/vidore-benchmark](https://github.com/illuin-tech/vidore-benchmark)

## Citation

If you use this dataset in your research or work, please cite:

```bibtex
@misc{faysse2024colpaliefficientdocumentretrieval,
  title={ColPali: Efficient Document Retrieval with Vision Language Models},
  author={Manuel Faysse and Hugues Sibille and Tony Wu and Bilel Omrani and Gautier Viaud and Céline Hudelot and Pierre Colombo},
  year={2024},
  eprint={2407.01449},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2407.01449},
}

@misc{macé2025vidorebenchmarkv2raising,
  title={ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval},
  author={Quentin Macé and António Loison and Manuel Faysse},
  year={2025},
  eprint={2505.17166},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2505.17166},
}
```

## Acknowledgments

This work is partially supported by [ILLUIN Technology](https://www.illuin.tech/), and by a grant from ANRT France.

## Copyright

All rights are reserved to the original authors of the documents.


Provider:

maas

Created:

2025-06-04
