PEARL
PEARL Benchmark Dataset Overview
Basic Dataset Information
- Dataset ID: UBC-NLP/PEARL
- License: cc-by-nc-nd-4.0
- Task category: Visual Question Answering
- Language: Arabic (ar)
- Tags: Culture, Arabic, VQA
- Size category: 1K<n<10K
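Dataset Description
PEARL Benchmark is a carefully curated subset of 6,867 high-quality question-answer pairs drawn from the larger PEARL dataset. It is designed to evaluate how well vision-language models (VLMs) understand Arabic cultural content, and it spans ten cultural domains (such as architecture, clothing, and cuisine) and thirteen distinct question types.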
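Key Features
- Size: 6,867 question-answer pairs (5,310 closed-form, 1,557 open-ended)
- Cultural domains: 10 (such as architecture, food, clothing, and festivals)
- Question types: 13 (such as causal reasoning, comparative analysis, and hypothesis formation)
- Language: Arabic (both questions and answers)
- Modality: image-text pairs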
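The counts listed above can be checked once the records are loaded. The following is a minimal sketch, not the authors' tooling. It assumes that a closed-form question is one with a non-empty `choices` list and that the `category` field holds the cultural domain; the card states neither explicitly. `summarize` is a hypothetical helper name.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def summarize(records: Iterable[Dict[str, Any]]) -> Dict[str, Counter]:
    """Tally records by answer format, cultural domain, and question type."""
    formats: Counter = Counter()
    domains: Counter = Counter()
    question_types: Counter = Counter()
    for rec in records:
        # Assumption: a non-empty `choices` list marks a closed-form question.
        is_closed = bool(rec.get("choices"))
        formats["closed" if is_closed else "open"] += 1
        # Assumption: `category` stores the cultural domain.
        domains[rec.get("category")] += 1
        question_types[rec.get("question_type")] += 1
    return {
        "formats": formats,
        "domains": domains,
        "question_types": question_types,
    }
```

If the assumptions hold, running `summarize` over the full benchmark should give 5,310 closed-form and 1,557 open-ended records, with 10 distinct domains and 13 distinct question types. A mismatch would indicate that one of the assumptions above is wrong.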
Dataset Structure
Features
- category: category (string)
- country: country (string)
- image: image (image)
- image_id: image ID (string)
- augmented_caption: augmented caption (string)
- question: question (string)
- answer: answer (string)
- answer_letter: answer letter (string)
- choices: answer choices (sequence of string)
- question_type: question type (string)
- annotation_id: annotation ID (string)
- qa_index: question-answer index (int32)
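To illustrate how the `question`, `choices`, and `answer_letter` fields might be used together, here is a minimal sketch of a multiple-choice prompt builder and an accuracy scorer. It is an illustration under assumptions, not the benchmark's official evaluation protocol: it assumes that `choices` holds bare option texts that need labels, that labeling them with Latin letters is acceptable, and that open-ended records have an empty `answer_letter`. `format_prompt` and `closed_form_accuracy` are hypothetical names.

```python
from typing import Any, Dict, Optional, Sequence

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def format_prompt(record: Dict[str, Any]) -> str:
    """Render the question followed by one labeled line per choice."""
    lines = [record["question"]]
    # Assumption: choices are unlabeled, so a Latin letter is prepended.
    for letter, choice in zip(LETTERS, record.get("choices") or []):
        lines.append(f"{letter}. {choice}")
    return "\n".join(lines)


def closed_form_accuracy(
    records: Sequence[Dict[str, Any]],
    predicted_letters: Sequence[Optional[str]],
) -> float:
    """Fraction of closed-form records whose predicted letter matches the gold one.

    `predicted_letters` is aligned with `records` by position. Records with
    an empty `answer_letter` are treated as open-ended and skipped.
    """
    correct = 0
    total = 0
    for rec, pred in zip(records, predicted_letters):
        gold = (rec.get("answer_letter") or "").strip().upper()
        if not gold:
            continue  # assumed open-ended: not scored by letter matching
        total += 1
        if (pred or "").strip().upper() == gold:
            correct += 1
    return correct / total if total else 0.0
```

Open-ended answers need a different scoring method (for example, comparison against the `answer` text), which this sketch deliberately leaves out.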
Data Splits
- test:
  - Number of examples: 6,867
  - Size: 3,607,317,256.405 bytes
  - Download size: 1,432,676,863 bytes
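The split can be loaded with the Hugging Face `datasets` library. The sketch below uses the dataset ID, split name, and sizes from this card; everything else (the helper names `to_gib` and `load_pearl_test`, and the sanity check on the example count) is an illustrative assumption. Calling `load_pearl_test` needs the `datasets` package and network access, and any authentication the repository may require is not shown.

```python
EXPECTED_TEST_EXAMPLES = 6867
DATASET_BYTES = 3_607_317_256.405  # size reported on this card
DOWNLOAD_BYTES = 1_432_676_863     # download size reported on this card


def to_gib(num_bytes: float) -> float:
    """Convert a byte count to gibibytes (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


def load_pearl_test():
    """Load the `test` split and check its length against the card."""
    # Imported lazily so the rest of this module works without `datasets`.
    from datasets import load_dataset

    ds = load_dataset("UBC-NLP/PEARL", split="test")
    if len(ds) != EXPECTED_TEST_EXAMPLES:
        raise ValueError(
            f"expected {EXPECTED_TEST_EXAMPLES} examples, got {len(ds)}"
        )
    return ds
```

By this conversion the download is about 1.33 GiB and the prepared dataset about 3.36 GiB, so it is worth checking free disk space before loading.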
Related Resources
- Paper: Pearl: A Multimodal Culturally-Aware Arabic Instruction Dataset
- arXiv link: http://arxiv.org/abs/2505.21979
- GitHub repository: https://github.com/UBC-NLP/pearl
Citation
@article{Alwajih2025pearl,
  title={Pearl: A Multimodal Culturally-Aware {A}rabic Instruction Dataset},
  author={Fakhraddin Alwajih and Samar M. Magdy and Abdellah El Mekki and Omer Nacar and Youssef Nafea and Safaa Taher Abdelfadil and Abdulfattah Mohammed Yahya and Hamzah Luqman and Nada Almarwani and Samah Aloufi and Baraah Qawasmeh and Houdaifa Atou and Serry Sibaee and Hamzah A. Alsayadi and Walid Al-Dhabyani and Maged S. Al-shaibani and Aya El aatar and Nour Qandos and Rahaf Alhamouri and Samar Ahmad and Razan Khassib and Lina Hamad and Mohammed Anwar AL-Ghrawi and Fatimah Alshamari and Cheikh Malainine and Doaa Qawasmeh and Aminetou Yacoub and Tfeil moilid and Ruwa AbuHweidi and Ahmed Aboeitta and Vatimetou Mohamed Lemin and Reem Abdel-Salam and Ahlam Bashiti and Adel Ammar and Aisha Alansari and Ahmed Ashraf and Nora Alturayeif and Sara Shatnawi and Alcides Alcoba Inciarte and AbdelRahim A. Elmadany and Mohamedou cheikh tourad and Ismail Berrada and Mustafa Jarrar and Shady Shehata and Muhammad Abdul-Mageed},
  journal={arXiv preprint arXiv:2505.21979},
  year={2025}
}




