PEARL-X
PEARL-X Dataset Overview
Basic Information
- Dataset ID: UBC-NLP/PEARL-X
- License: cc-by-nc-nd-4.0
- Language: Arabic (ar)
- Tags: Culture, Arabic
- Task categories:
  - Visual question answering (visual-question-answering)
  - Image-to-text (image-to-text)
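Dataset Description
PEARL-X (PEARL eXtension) is a benchmark designed to evaluate understanding of nuanced differences within Arab culture. It focuses on cultural concepts that are shared across the Arab world but manifest differently in different contexts. It includes single-image and multi-image question-answering tasks that challenge models to compare, contrast, and synthesize information about items that are culturally shared yet visually distinct (for example, different types of coffee or traditional dress).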
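Key Features
- Focus: nuanced cultural variation across 61 shared cultural concepts
- Tasks: single-image and multi-image question answering
- Scale: 367 questions, 347 images
- Purpose: evaluating complex reasoning and comparative understanding of cultural nuances
- Language: Arabic (questions and answers)
- Modality: image-text pairs (each question includes one or more images)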
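Since the benchmark is framed as multiple-choice question answering, one plausible way to score a model is accuracy broken down by question type. The sketch below is not the authors' evaluation script; it assumes each row carries the `answer_letter` and `question_type` fields listed under the dataset structure, and that a model's prediction has already been reduced to a single letter in the same script as `answer_letter`. The function name `accuracy_by_type` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def accuracy_by_type(rows: Iterable[Mapping[str, str]],
                     predictions: Iterable[str]) -> Dict[str, float]:
    """Compute accuracy per `question_type` (hypothetical helper).

    Assumes `predictions` holds one predicted letter per row, in the
    same order as `rows`.
    """
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for row, pred in zip(rows, predictions):
        key = row["question_type"]
        totals[key] += 1
        # Compare letters after trimming whitespace and normalizing case.
        if pred.strip().upper() == row["answer_letter"].strip().upper():
            correct[key] += 1
    return {key: correct[key] / totals[key] for key in totals}
```

Grouping by `question_type` keeps single-image and multi-image results separate, provided that field distinguishes them; the actual values stored in `question_type` and `sub_type` should be checked against the data.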
Dataset Structure
- Features:
  - idx: integer (int32)
  - concept_name: string
  - question: string
  - answer: string
  - answer_letter: string
  - choices: sequence of strings
  - question_type: string
  - sub_type: string
  - image1 to image6: image
- Splits:
  - test:
    - Bytes: 227,731,749
    - Examples: 367
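The feature list above suggests that each row carries up to six image slots alongside a question and its choices. The following is a minimal sketch of how such a row might be handled. It assumes that unused image slots are `None` and that `choices` are stored in letter order without their own letter prefixes; neither is confirmed by this card. `collect_images` and `build_prompt` are hypothetical helpers, while `load_dataset` is the standard entry point of the Hugging Face `datasets` library.

```python
from typing import Any, Dict, List

# image1 .. image6, following the feature list
IMAGE_KEYS = [f"image{i}" for i in range(1, 7)]


def load_pearl_x():
    """Load the test split from the Hugging Face Hub.

    Requires the `datasets` package and network access.
    """
    from datasets import load_dataset  # lazy import: helpers below work offline
    return load_dataset("UBC-NLP/PEARL-X", split="test")


def collect_images(row: Dict[str, Any]) -> List[Any]:
    """Return the images present in a row, in slot order.

    Assumption: empty image slots are None or missing.
    """
    return [row[key] for key in IMAGE_KEYS if row.get(key) is not None]


def build_prompt(row: Dict[str, Any], letters: str = "ABCDEFGH") -> str:
    """Format the question and its choices as lettered multiple-choice text.

    Assumption: choices are in letter order and carry no letter prefix.
    """
    lines = [row["question"]]
    for letter, choice in zip(letters, row["choices"]):
        lines.append(f"{letter}. {choice}")
    return "\n".join(lines)
```

A row with a single image yields a one-element list from `collect_images`, so single-image and multi-image questions can share the same code path.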
Download Information
- Download size: 115,652,865 bytes
- Dataset size: 227,731,749 bytes
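For a quick sense of scale, the byte counts above can be converted to mebibytes. This small sketch uses only the two figures reported here; the constant and function names are illustrative.

```python
DOWNLOAD_BYTES = 115_652_865  # download size reported above
DATASET_BYTES = 227_731_749   # dataset size reported above


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 2**20 bytes)."""
    return n_bytes / 2**20


download_mib = to_mib(DOWNLOAD_BYTES)
dataset_mib = to_mib(DATASET_BYTES)
ratio = DOWNLOAD_BYTES / DATASET_BYTES  # download size relative to dataset size

print(f"download: {download_mib:.1f} MiB, dataset: {dataset_mib:.1f} MiB, ratio: {ratio:.2f}")
```

The download is roughly 110 MiB against about 217 MiB for the prepared dataset, so the download is about half the dataset size.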
Related Resources
- Paper: Pearl: A Multimodal Culturally-Aware Arabic Instruction Dataset
- arXiv link: http://arxiv.org/abs/2505.21979
- Hugging Face link: https://huggingface.co/datasets/UBC-NLP/PEARL-X
Citation
BibTeX:
@article{Alwajih2025pearl,
  title={Pearl: A Multimodal Culturally-Aware {A}rabic Instruction Dataset},
  author={Fakhraddin Alwajih and Samar M. Magdy and Abdellah El Mekki and Omer Nacar and Youssef Nafea and Safaa Taher Abdelfadil and Abdulfattah Mohammed Yahya and Hamzah Luqman and Nada Almarwani and Samah Aloufi and Baraah Qawasmeh and Houdaifa Atou and Serry Sibaee and Hamzah A. Alsayadi and Walid Al-Dhabyani and Maged S. Al-shaibani and Aya El aatar and Nour Qandos and Rahaf Alhamouri and Samar Ahmad and Razan Khassib and Lina Hamad and Mohammed Anwar AL-Ghrawi and Fatimah Alshamari and Cheikh Malainine and Doaa Qawasmeh and Aminetou Yacoub and Tfeil moilid and Ruwa AbuHweidi and Ahmed Aboeitta and Vatimetou Mohamed Lemin and Reem Abdel-Salam and Ahlam Bashiti and Adel Ammar and Aisha Alansari and Ahmed Ashraf and Nora Alturayeif and Sara Shatnawi and Alcides Alcoba Inciarte and AbdelRahim A. Elmadany and Mohamedou cheikh tourad and Ismail Berrada and Mustafa Jarrar and Shady Shehata and Muhammad Abdul-Mageed},
  journal={arXiv preprint arXiv:2505.21979},
  year={2025}
}




