economics_reports_eng_v2
Hosted on the ModelScope community. Updated 2025-12-05; indexed 2025-06-07.
Download link:
https://modelscope.cn/datasets/vidore/economics_reports_eng_v2
# ViDoRe Benchmark 2 - World Economic Reports Dataset
This dataset is part of the "ViDoRe Benchmark 2" collection, designed for evaluating visual retrieval applications. It focuses on the theme of **world economic reports from 2024**.
## Dataset Summary
All queries are in English.
This dataset provides a focused benchmark for visual retrieval tasks related to World economic reports. It includes a curated set of documents, queries, relevance judgments (qrels), and page images.
* **Number of Documents:** 5
* **Number of Queries:** 58
* **Number of Pages:** 452
* **Number of Relevance Judgments (qrels):** 907
* **Average Number of Relevant Pages per Query:** 15.6 (907 relevance judgments across 58 queries)
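The average in the list above can be reproduced from the other two counts. The short check below is a sketch that assumes each relevance judgment links one query to one distinct page, which the card does not state explicitly.

```python
# Counts copied from the summary list above.
num_queries = 58
num_qrels = 907

# Assumption: each qrel pairs one query with one distinct page, so this
# ratio is the mean number of judged pages per query.
avg_pages_per_query = num_qrels / num_queries

print(f"{avg_pages_per_query:.1f}")  # prints 15.6
```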
## Dataset Structure (Hugging Face Datasets)
The dataset is organized into the following subsets, each with its own columns:
* **`docs`**: Contains document metadata, likely including a `"doc-id"` field to uniquely identify each document.
* **`corpus`**: Contains page-level information:
    * `"image"`: The image of the page (a PIL Image object).
    * `"doc-id"`: The ID of the document this page belongs to.
    * `"corpus-id"`: A unique identifier for this specific page within the corpus.
* **`queries`**: Contains query information:
    * `"query-id"`: A unique identifier for the query.
    * `"query"`: The text of the query.
* **`qrels`**: Contains relevance judgments:
    * `"corpus-id"`: The ID of the relevant page.
    * `"query-id"`: The ID of the query.
    * `"answer"`: Answer relevant to the query AND the page.
    * `"score"`: The relevance score.
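To illustrate how the ID fields above tie the tables together, here is a minimal sketch that joins `qrels` to `corpus` through `"corpus-id"`. The rows are hypothetical toy data, not records from the dataset. The ID types, score values, and helper names (`build_relevance_map`, `docs_per_query`) are assumptions made for illustration, and the `"image"` column is left out for brevity.

```python
from collections import defaultdict

# Hypothetical toy rows that only mimic the column names described above.
corpus = [
    {"corpus-id": 0, "doc-id": "report_a"},
    {"corpus-id": 1, "doc-id": "report_a"},
    {"corpus-id": 2, "doc-id": "report_b"},
]
queries = [
    {"query-id": 10, "query": "What drove inflation in 2024?"},
    {"query-id": 11, "query": "Which regions grew fastest?"},
]
qrels = [
    {"query-id": 10, "corpus-id": 0, "answer": "placeholder", "score": 2},
    {"query-id": 10, "corpus-id": 1, "answer": "placeholder", "score": 1},
    {"query-id": 11, "corpus-id": 2, "answer": "placeholder", "score": 1},
]


def build_relevance_map(qrel_rows):
    """Group judgments as {query-id: {corpus-id: score}}."""
    relevance = defaultdict(dict)
    for row in qrel_rows:
        relevance[row["query-id"]][row["corpus-id"]] = row["score"]
    return dict(relevance)


def docs_per_query(relevance, corpus_rows):
    """Resolve each query's relevant pages to their parent documents."""
    page_to_doc = {page["corpus-id"]: page["doc-id"] for page in corpus_rows}
    return {
        query_id: sorted({page_to_doc[cid] for cid in pages})
        for query_id, pages in relevance.items()
    }


relevance = build_relevance_map(qrels)
query_docs = docs_per_query(relevance, corpus)
```

The nested-dictionary layout is one convenient shape for retrieval evaluation, since it gives direct lookup of a page's score for a given query.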
## Usage
This dataset is designed for evaluating the performance of visual retrieval systems, particularly those focused on document image understanding.
**Example Evaluation with ColPali (CLI):**
Here's a code snippet demonstrating how to evaluate the ColPali model on this dataset using the `vidore-benchmark` command-line tool.
1. **Install the `vidore-benchmark` package:**
```bash
pip install vidore-benchmark datasets
```
2. **Run the evaluation:**
```bash
vidore-benchmark evaluate-retriever \
    --model-class colpali \
    --model-name vidore/colpali-v1.3 \
    --dataset-name vidore/economics_reports_eng_v2 \
    --dataset-format beir \
    --split test
```
For more details on using `vidore-benchmark`, refer to the official documentation: [https://github.com/illuin-tech/vidore-benchmark](https://github.com/illuin-tech/vidore-benchmark)
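This card does not say which metrics the evaluation reports. As an illustration of how graded relevance scores like those in `qrels` are typically turned into a retrieval score, the sketch below computes nDCG@k for a single query with linear gains. It is not the `vidore-benchmark` implementation. The function names and toy IDs are hypothetical, and the ranked list is assumed to contain no duplicate page IDs.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k gains (linear gain)."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, relevance, k=5):
    """nDCG@k for one query.

    ranked_ids: page IDs ordered by the retriever, best first (no duplicates).
    relevance:  {corpus-id: score} for the pages judged relevant to the query.
    """
    gains = [relevance.get(corpus_id, 0) for corpus_id in ranked_ids]
    ideal_gains = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical judgments and rankings for a single query.
toy_relevance = {0: 2, 1: 1}
perfect = ndcg_at_k([0, 1, 2], toy_relevance, k=5)
swapped = ndcg_at_k([1, 0, 2], toy_relevance, k=5)
missed = ndcg_at_k([2, 3, 4], toy_relevance, k=5)
```

Here `perfect` ranks the pages in ideal order, `swapped` places the less relevant page first and so scores lower, and `missed` retrieves no relevant page at all.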
## Citation
If you use this dataset in your research or work, please cite:
```bibtex
@misc{faysse2024colpaliefficientdocumentretrieval,
title={ColPali: Efficient Document Retrieval with Vision Language Models},
author={Manuel Faysse and Hugues Sibille and Tony Wu and Bilel Omrani and Gautier Viaud and Céline Hudelot and Pierre Colombo},
year={2024},
eprint={2407.01449},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2407.01449},
}
@misc{macé2025vidorebenchmarkv2raising,
title={ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval},
author={Quentin Macé and António Loison and Manuel Faysse},
year={2025},
eprint={2505.17166},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2505.17166},
}
```
## Acknowledgments
This work is partially supported by [ILLUIN Technology](https://www.illuin.tech/), and by a grant from ANRT France.
Provider: maas
Created: 2025-06-04



