Download link:

https://modelscope.cn/datasets/MBZUAI/EXAMS-V

Resource description:

# EXAMS-V: ImageCLEF 2025 – Multimodal Reasoning

Dimitar Iliyanov Dimitrov, Hee Ming Shan, Zhuohan Xie, [Rocktim Jyoti Das](https://rocktimjyotidas.github.io/), Momina Ahsan, Sarfraz Ahmad, Nikolay Paev, Ali Mekky, Omar El Herraoui, Rania Hossam, Nurdaulet Mukhituly, Akhmed Sakip, [Ivan Koychev](https://scholar.google.com/citations?user=o5YAI9wAAAAJ&hl=en), [Preslav Nakov](https://mbzuai.ac.ae/study/faculty/preslav-nakov/)

## Introduction

EXAMS-V is a multilingual, multimodal dataset created to evaluate and benchmark the visual reasoning abilities of AI systems, especially Vision-Language Models (VLMs). The dataset contains 24,856 multiple-choice questions (MCQs) collected from real school exams and other educational sources.

All questions are presented as images. These images include not just text but also tables, graphs, and mathematical content, which makes EXAMS-V a demanding benchmark for testing how well models handle visual and structured information.

The statistics table below covers 17 languages, among them English, Arabic, Chinese, German, Bulgarian, Italian, Spanish, Urdu, Polish, Hungarian, Serbian, and Croatian, and the questions span multiple subject domains. Because the dataset is curated from real school exams from different countries and education systems, it combines region-specific knowledge, varied question formats, and multilingual content.

Answering the questions in EXAMS-V is not just about reading. Models also need to understand the visual layout, interpret diagrams and symbols, and reason over both text and visuals.

## Dataset Statistics

The following table shows the distribution of samples in the dataset across all languages. Each row reports the total number of questions for a language, how many of them are text-only, and how many contain visual elements such as tables, figures, graphs, or scientific symbols.

| Language | Grade | Subjects | Total Samples | Visual Qs. | Text Only | Table | Figure | Graph |
|:-----------|:--------|-----------:|----------------:|-------------:|------------:|--------:|---------:|--------:|
| Arabic | 4-12 | 7 | 1045 | 288 | 757 | 17 | 206 | 51 |
| Bulgarian | 4, 12 | 7 | 2332 | 503 | 1829 | 45 | 373 | 80 |
| Chinese | 4, 12 | 8 | 3042 | 2186 | 856 | 281 | 1503 | 435 |
| Croatian | 12 | 15 | 4172 | 758 | 3414 | 66 | 555 | 121 |
| English | 10-12 | 4 | 1236 | 215 | 1021 | 2 | 120 | 54 |
| French | 12 | 3 | 439 | 50 | 389 | 0 | 43 | 7 |
| German | 12 | 7 | 1077 | 211 | 866 | 5 | 163 | 45 |
| Hungarian | 8, 12 | 14 | 4048 | 525 | 3523 | 7 | 421 | 97 |
| Italian | 12 | 13 | 1848 | 351 | 1497 | 33 | 234 | 69 |
| Kazakh | 11 | 4 | 243 | 243 | 0 | 4 | 47 | 192 |
| Polish | 12 | 8 | 2770 | 526 | 2244 | 142 | 384 | 13 |
| Romanian | 12 | 1 | 5 | 0 | 5 | 0 | 0 | 0 |
| Russian | 12 | 1 | 9 | 0 | 9 | 0 | 0 | 0 |
| Serbian | 12 | 13 | 1637 | 319 | 1318 | 26 | 224 | 48 |
| Slovakian | 12 | 1 | 46 | 0 | 46 | 0 | 0 | 0 |
| Spanish | 12 | 10 | 638 | 285 | 353 | 66 | 149 | 54 |
| Urdu | 9-10 | 5 | 269 | 0 | 269 | 0 | 0 | 0 |
| Total | - | 121 | 24856 | 6460 | 18396 | 694 | 4422 | 1266 |

The following histogram shows the distribution of languages in the EXAMS-V dataset. The chart reflects how many samples exist for each language across the full dataset (train, validation, and test).

<div style="text-align: center;"> <img src="./Screenshot 2025-05-24 at 3.28.05 am.png" alt="Language Distribution Histogram" width="600"/> </div>

The following sunburst chart shows the distribution of subjects across languages in the EXAMS-V dataset. The inner ring represents languages, while the outer ring shows the subjects present within each language. This visualization highlights the multilingual and multi-domain nature of the dataset.

<div style="text-align: center;"> <img src="./newplot.png" alt="Subject-Language Sunburst" width="600"/> </div>
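The per-language counts in the statistics table can be cross-checked with a few lines of Python. The sketch below is illustrative only: the names `STATS`, `REPORTED_TOTALS`, `column_sums`, and `visual_share` are made up for this example and are not part of the dataset or any official tooling. It confirms that each column sums to the reported "Total" row and that Total Samples equals Visual Qs. plus Text Only for every language. The Table, Figure, and Graph columns do not sum to Visual Qs. (for Arabic they give 274 against 288), so the sketch does not check that.

```python
"""Sanity-check the EXAMS-V statistics table (numbers copied from the table)."""

# Columns: total samples, visual questions, text-only, table, figure, graph.
STATS = {
    "Arabic": (1045, 288, 757, 17, 206, 51),
    "Bulgarian": (2332, 503, 1829, 45, 373, 80),
    "Chinese": (3042, 2186, 856, 281, 1503, 435),
    "Croatian": (4172, 758, 3414, 66, 555, 121),
    "English": (1236, 215, 1021, 2, 120, 54),
    "French": (439, 50, 389, 0, 43, 7),
    "German": (1077, 211, 866, 5, 163, 45),
    "Hungarian": (4048, 525, 3523, 7, 421, 97),
    "Italian": (1848, 351, 1497, 33, 234, 69),
    "Kazakh": (243, 243, 0, 4, 47, 192),
    "Polish": (2770, 526, 2244, 142, 384, 13),
    "Romanian": (5, 0, 5, 0, 0, 0),
    "Russian": (9, 0, 9, 0, 0, 0),
    "Serbian": (1637, 319, 1318, 26, 224, 48),
    "Slovakian": (46, 0, 46, 0, 0, 0),
    "Spanish": (638, 285, 353, 66, 149, 54),
    "Urdu": (269, 0, 269, 0, 0, 0),
}

# The "Total" row as reported in the table.
REPORTED_TOTALS = (24856, 6460, 18396, 694, 4422, 1266)


def column_sums(stats):
    """Sum each numeric column across all languages."""
    return tuple(sum(column) for column in zip(*stats.values()))


def visual_share(stats):
    """Fraction of each language's questions that contain visual elements."""
    return {lang: row[1] / row[0] for lang, row in stats.items()}


if __name__ == "__main__":
    # Column sums must match the reported "Total" row.
    assert column_sums(STATS) == REPORTED_TOTALS
    # Every question is either visual or text-only.
    assert all(total == visual + text for total, visual, text, *_ in STATS.values())
    # Print languages from most to least visual.
    for lang, share in sorted(visual_share(STATS).items(), key=lambda kv: -kv[1]):
        print(f"{lang:<10} {share:6.1%}")
```

Ranking by visual share makes the imbalance in the table easy to see: Kazakh is entirely visual, while Romanian, Russian, Slovakian, and Urdu contain no visual questions at all.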

Application scenarios: