ALM-Bench
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
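Dataset Overview
- Name: All Languages Matter Benchmark (ALM-Bench)
- Description: A dataset for evaluating multilingual multimodal models on 100 culturally diverse languages.
- Number of languages: 100
- Number of question-answer pairs: 22,763
- Number of categories: 19
- Question types: multiple choice, true/false, short answer, and long answer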
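Dataset Features
- Multilingual and multimodal: Covers 100 languages to evaluate model performance in multilingual settings.
- Cultural diversity: Includes content on 13 cultural aspects, such as heritage, customs, architecture, literature, music, and sports.
- Low-resource languages: Places particular emphasis on low-resource languages, so that models are evaluated across languages with different levels of available resources.
- Broad geographic coverage: Spans 73 countries across five continents and 24 distinct scripts.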
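Dataset Structure
- File structure:
    ALM-Bench/
    |–– Swedish/
    |   |–– Religion
    |   |–– Culture
    |   |–– Heritage
    |   |–– ...    # remaining categories
    ...            # remaining languages
- Data fields:
    file_name: file name
    ID: unique ID, in the format language#_cat#_img#
    Language: language
    Category: category
    Question_Type: question type
    English_Question: question in English
    English_Answer: answer in English
    Translated_Question: question translated into the native language
    Translated_Answer: answer translated into the native language
    Image_Url: image URL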
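As an illustration of how the data fields listed above might be handled in code, the following is a minimal Python sketch, not part of the dataset's own tooling. It assumes each record arrives as a dictionary keyed by the field names above and that every field is a string. The names `ALMBenchRecord`, `from_row`, `count_by`, and `make_row` are hypothetical, and the sample rows are invented placeholders rather than real dataset entries.

```python
from collections import Counter
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ALMBenchRecord:
    """One question-answer pair, mirroring the listed data fields.

    Assumption: every field is stored as a string.
    """
    file_name: str
    ID: str
    Language: str
    Category: str
    Question_Type: str
    English_Question: str
    English_Answer: str
    Translated_Question: str
    Translated_Answer: str
    Image_Url: str

    @classmethod
    def from_row(cls, row: dict) -> "ALMBenchRecord":
        # Fail loudly if a listed field is missing; ignore any extra keys.
        names = [f.name for f in fields(cls)]
        missing = [n for n in names if n not in row]
        if missing:
            raise KeyError(f"row is missing fields: {missing}")
        return cls(**{n: row[n] for n in names})


def count_by(records, field_name: str) -> Counter:
    """Tally records by one field, e.g. 'Language' or 'Question_Type'."""
    return Counter(getattr(r, field_name) for r in records)


def make_row(**overrides) -> dict:
    """Build a placeholder row with empty strings for unspecified fields."""
    row = {f.name: "" for f in fields(ALMBenchRecord)}
    row.update(overrides)
    return row


# Invented rows for illustration only; the ID and Question_Type values
# are placeholders and do not reflect the dataset's actual encoding.
rows = [
    make_row(ID="example-1", Language="Swedish", Category="Religion",
             Question_Type="short answer"),
    make_row(ID="example-2", Language="Swedish", Category="Heritage",
             Question_Type="multiple choice"),
    make_row(ID="example-3", Language="Hypothetical-Language",
             Category="Heritage", Question_Type="multiple choice"),
]

records = [ALMBenchRecord.from_row(r) for r in rows]
by_language = count_by(records, "Language")
by_type = count_by(records, "Question_Type")
print(by_language)
print(by_type)
```

Tallying by `Language` or `Question_Type` in this way is one simple route to per-language or per-question-type breakdowns; the actual string values used in those fields should be checked against the downloaded data.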
Dataset Download
- Download link: Hugging Face
Citation
@misc{vayani2024alm,
  title={All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages},
  author={Ashmal Vayani and Dinura Dissanayake and Hasindri Watawana and Noor Ahsan and Nevasini Sasikumar and Omkar Thawakar and Henok Biadglign Ademtew and Yahya Hmaiti and Amandeep Kumar and Kartik Kuckreja and Mykola Maslych and Wafa Al Ghallabi and Mihail Mihaylov and Chao Qin and Abdelrahman M Shaker and Mike Zhang and Mahardika Krisna Ihsani and Amiel Esplana and Monil Gokani and Shachar Mirkin and Harsh Singh and Ashay Srivastava and Endre Hamerlik and Fathinah Asma Izzati and Fadillah Adamsyah Maani and Sebastian Cavada and Jenny Chim and Rohit Gupta and Sanjay Manjunath and Kamila Zhumakhanova and Feno Heriniaina Rabevohitra and Azril Amirudin and Muhammad Ridzuan and Daniya Kareem and Ketan More and Kunyang Li and Pramesh Shakya and Muhammad Saad and Amirpouya Ghasemaghaei and Amirbek Djanibekov and Dilshod Azizov and Branislava Jankovic and Naman Bhatia and Alvaro Cabrera and Johan Obando-Ceron and Olympiah Otieno and Fabian Farestam and Muztoba Rabbani and Sanoojan Baliah and Santosh Sanjeev and Abduragim Shtanchaev and Maheen Fatima and Thao Nguyen and Amrin Kareem and Toluwani Aremu and Nathan Xavier and Amit Bhatkal and Hawau Toyin and Aman Chadha and Hisham Cholakkal and Rao Muhammad Anwer and Michael Felsberg and Jorma Laaksonen and Thamar Solorio and Monojit Choudhury and Ivan Laptev and Mubarak Shah and Salman Khan and Fahad Khan},
  year={2024},
  eprint={2411.16508},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2411.16508}
}




