gagan3012/multilingual-llava-bench-in-the-wild

Name: gagan3012/multilingual-llava-bench-in-the-wild
Creator: gagan3012
Published: 2024-04-12 21:08:28
License: No description available

Source: Hugging Face. Updated: 2024-04-12. Indexed: 2024-06-12.

Download link:

https://hf-mirror.com/datasets/gagan3012/multilingual-llava-bench-in-the-wild

Resource summary:

---
dataset_info:
- config_name: ar
  features:
  - name: question_id
    dtype: int64
  - name: image
    dtype: image
  - name: question
    dtype: string
  - name: caption
    dtype: string
  - name: image_id
    dtype: string
  - name: gpt_answer
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: test
    num_bytes: 22342774.0
    num_examples: 60
  download_size: 9778993
  dataset_size: 22342774.0
- config_name: arabic
  features:
  - name: question_id
    dtype: int64
  - name: image
    dtype: image
  - name: question
    dtype: string
  - name: caption
    dtype: string
  - name: image_id
    dtype: string
  - name: gpt_answer
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: test
    num_bytes: 22342774.0
    num_examples: 60
  download_size: 9778993
  dataset_size: 22342774.0
- config_name: bengali
  features:
  - name: question_id
    dtype: int64
  - name: image
    dtype: image
  - name: question
    dtype: string
  - name: caption
    dtype: string
  - name: image_id
    dtype: string
  - name: gpt_answer
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: test
    num_bytes: 22378020.0
    num_examples: 60
  download_size: 9783130
  dataset_size: 22378020.0
- config_name: chinese
  features:
  - name: question_id
    dtype: int64
  - name: image
    dtype: image
  - name: question
    dtype: string
  - name: caption
    dtype: string
  - name: image_id
    dtype: string
  - name: gpt_answer
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: test
    num_bytes: 22317502.0
    num_examples: 60
  download_size: 9772605
  dataset_size: 22317502.0
- config_name: french
  features:
  - name: question_id
    dtype: int64
  - name: image
    dtype: image
  - name: question
    dtype: string
  - name: caption
    dtype: string
  - name: image_id
    dtype: string
  - name: gpt_answer
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: test
    num_bytes: 22327391.0
    num_examples: 60
  download_size: 9773783
  dataset_size: 22327391.0
- config_name: hindi
  features:
  - name: question_id
    dtype: int64
  - name: image
    dtype: image
  - name: question
    dtype: string
  - name: caption
    dtype: string
  - name: image_id
    dtype: string
  - name: gpt_answer
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: test
    num_bytes: 22385129.0
    num_examples: 60
  download_size: 9799590
  dataset_size: 22385129.0
- config_name: japanese
  features:
  - name: question_id
    dtype: int64
  - name: image
    dtype: image
  - name: question
    dtype: string
  - name: caption
    dtype: string
  - name: image_id
    dtype: string
  - name: gpt_answer
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: test
    num_bytes: 22333016.0
    num_examples: 60
  download_size: 9782382
  dataset_size: 22333016.0
- config_name: russian
  features:
  - name: question_id
    dtype: int64
  - name: image
    dtype: image
  - name: question
    dtype: string
  - name: caption
    dtype: string
  - name: image_id
    dtype: string
  - name: gpt_answer
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: test
    num_bytes: 22355236.0
    num_examples: 60
  download_size: 9792575
  dataset_size: 22355236.0
- config_name: spanish
  features:
  - name: question_id
    dtype: int64
  - name: image
    dtype: image
  - name: question
    dtype: string
  - name: caption
    dtype: string
  - name: image_id
    dtype: string
  - name: gpt_answer
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: test
    num_bytes: 22326471.0
    num_examples: 60
  download_size: 9781970
  dataset_size: 22326471.0
- config_name: urdu
  features:
  - name: question_id
    dtype: int64
  - name: image
    dtype: image
  - name: question
    dtype: string
  - name: caption
    dtype: string
  - name: image_id
    dtype: string
  - name: gpt_answer
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: test
    num_bytes: 22349409.0
    num_examples: 60
  download_size: 9784751
  dataset_size: 22349409.0
configs:
- config_name: ar
  data_files:
  - split: test
    path: ar/test-*
- config_name: arabic
  data_files:
  - split: test
    path: arabic/test-*
- config_name: bengali
  data_files:
  - split: test
    path: bengali/test-*
- config_name: chinese
  data_files:
  - split: test
    path: chinese/test-*
- config_name: french
  data_files:
  - split: test
    path: french/test-*
- config_name: hindi
  data_files:
  - split: test
    path: hindi/test-*
- config_name: japanese
  data_files:
  - split: test
    path: japanese/test-*
- config_name: russian
  data_files:
  - split: test
    path: russian/test-*
- config_name: spanish
  data_files:
  - split: test
    path: spanish/test-*
- config_name: urdu
  data_files:
  - split: test
    path: urdu/test-*
---

Provider:

gagan3012

Summary of original information

Dataset overview

Dataset configurations and features

Configuration names: ar, arabic, bengali, chinese, french, hindi, japanese, russian, spanish, urdu
Features:
- question_id: data type int64
- image: data type image
- question: data type string
- caption: data type string
- image_id: data type string
- gpt_answer: data type string
- category: data type string
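The feature list above can be written down as a typed record, which is handy for checking that rows from any configuration carry the expected fields. This is a minimal sketch, not part of the dataset itself: the names BenchRecord, EXPECTED_FIELDS, and missing_fields are hypothetical, and the image field is typed loosely because its concrete type depends on how the data is loaded.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List


@dataclass
class BenchRecord:
    """One row of any configuration, mirroring the feature list in the card."""
    question_id: int   # int64 in the card
    image: Any         # decoded image object; concrete type depends on the loader
    question: str
    caption: str
    image_id: str
    gpt_answer: str
    category: str


# Field names in the order the card lists them.
EXPECTED_FIELDS: List[str] = [f.name for f in fields(BenchRecord)]


def missing_fields(row: Dict[str, Any]) -> List[str]:
    """Return the expected field names that are absent from a row dict."""
    return [name for name in EXPECTED_FIELDS if name not in row]
```

Calling missing_fields on each loaded row is a cheap guard before an evaluation run; an empty list means the row has every field the card declares.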

Dataset splits

Split name: test
Number of examples: 60 (per configuration)
Size in bytes (num_bytes):
- ar, arabic: 22342774.0
- bengali: 22378020.0
- chinese: 22317502.0
- french: 22327391.0
- hindi: 22385129.0
- japanese: 22333016.0
- russian: 22355236.0
- spanish: 22326471.0
- urdu: 22349409.0

Dataset size and download size

Download size (bytes):
- ar, arabic: 9778993
- bengali: 9783130
- chinese: 9772605
- french: 9773783
- hindi: 9799590
- japanese: 9782382
- russian: 9792575
- spanish: 9781970
- urdu: 9784751
Dataset size: identical to the num_bytes values listed under Dataset splits
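The size figures in the card support a few quick sanity checks: average bytes per example, the ratio of download size to dataset size, and whether two configurations report identical numbers. The sketch below copies the values from the card; the helper names (SIZES, bytes_per_example, download_ratio) are illustrative only.

```python
# (dataset_size, download_size) in bytes, copied from the dataset card.
SIZES = {
    "ar":       (22342774, 9778993),
    "arabic":   (22342774, 9778993),
    "bengali":  (22378020, 9783130),
    "chinese":  (22317502, 9772605),
    "french":   (22327391, 9773783),
    "hindi":    (22385129, 9799590),
    "japanese": (22333016, 9782382),
    "russian":  (22355236, 9792575),
    "spanish":  (22326471, 9781970),
    "urdu":     (22349409, 9784751),
}

NUM_EXAMPLES = 60  # every configuration has 60 test examples


def bytes_per_example(config: str) -> float:
    """Average dataset bytes per example for one configuration."""
    dataset_size, _ = SIZES[config]
    return dataset_size / NUM_EXAMPLES


def download_ratio(config: str) -> float:
    """Download size as a fraction of the dataset size."""
    dataset_size, download_size = SIZES[config]
    return download_size / dataset_size


# Configurations with the largest and smallest dataset size.
largest = max(SIZES, key=lambda name: SIZES[name][0])
smallest = min(SIZES, key=lambda name: SIZES[name][0])
```

The sizes differ by well under one percent across configurations, which is consistent with the images being shared and only the text fields varying by language. The 'ar' and 'arabic' entries are byte-for-byte equal in size, which suggests (but does not prove) that they hold the same data.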

Data file paths

Configuration names: ar, arabic, bengali, chinese, french, hindi, japanese, russian, spanish, urdu
Split: test
Paths:
- ar: ar/test-*
- arabic: arabic/test-*
- bengali: bengali/test-*
- chinese: chinese/test-*
- french: french/test-*
- hindi: hindi/test-*
- japanese: japanese/test-*
- russian: russian/test-*
- spanish: spanish/test-*
- urdu: urdu/test-*
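The path entries are glob patterns, one per configuration. As a rough illustration of how a file path maps back to its configuration, the sketch below matches paths with Python's fnmatch module. This is an approximation: the Hugging Face tooling resolves these patterns with its own glob logic, and the shard file name used in the usage note is hypothetical, since the card only gives the pattern prefix.

```python
from fnmatch import fnmatchcase
from typing import Optional

CONFIG_NAMES = (
    "ar", "arabic", "bengali", "chinese", "french",
    "hindi", "japanese", "russian", "spanish", "urdu",
)

# One glob pattern per configuration, as listed in the card.
PATTERNS = {name: f"{name}/test-*" for name in CONFIG_NAMES}


def config_for_path(path: str) -> Optional[str]:
    """Return the configuration whose pattern matches the path, or None."""
    for name, pattern in PATTERNS.items():
        if fnmatchcase(path, pattern):
            return name
    return None
```

For a hypothetical shard named chinese/test-00000-of-00001.parquet, config_for_path returns "chinese". Because each pattern includes the directory separator, a path under arabic/ is not captured by the shorter 'ar' pattern.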

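Compiled summary

Dataset introduction

Construction

This dataset is a benchmark for multilingual evaluation of vision-language models. Images are selected from diverse sources, and each image is paired with a descriptive caption. Open-ended questions are then written about the image content, and a language model generates the reference answers. The data are aligned in parallel across languages. Nine languages are covered (Arabic, Bengali, Chinese, French, Hindi, Japanese, Russian, Spanish, and Urdu), published as ten configurations: 'ar' and 'arabic' report identical sizes, which suggests both hold the Arabic data. Each configuration contains 60 test examples.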

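Characteristics

The dataset's most notable characteristics are its broad multilingual coverage and its uniform multimodal structure. Each example combines an image, a question, an automatically generated caption, a category label, and a reference answer produced by a large language model, forming a self-contained evaluation unit. All configurations share the same feature fields, so the format is consistent across language versions. This lets researchers evaluate a model's visual understanding and reasoning across languages and cultural contexts within a single framework.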

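Usage

The dataset is intended for evaluating and benchmarking multilingual vision-language models. Users can load a specific language configuration, such as 'chinese' or 'french', through the Hugging Face datasets library to obtain its test split. The image, question, and GPT-generated reference answer in each example can be used to compute automatic metrics between a model's answer and the reference, or to support human quality assessment. The structured format is straightforward to integrate into existing evaluation pipelines and gives a standardized, reproducible basis for measuring performance on cross-lingual visual question answering.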
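A minimal sketch of this workflow follows. It assumes the third-party datasets package is installed and that network access is available when load_config is called. The token_f1 function is a deliberately simple stand-in metric chosen for illustration, not a scoring method prescribed by the dataset; its whitespace tokenization is a poor fit for languages written without spaces, such as Chinese and Japanese.

```python
from collections import Counter

REPO_ID = "gagan3012/multilingual-llava-bench-in-the-wild"
CONFIGS = (
    "ar", "arabic", "bengali", "chinese", "french",
    "hindi", "japanese", "russian", "spanish", "urdu",
)


def load_config(name: str):
    """Load the test split of one language configuration."""
    if name not in CONFIGS:
        raise ValueError(f"unknown configuration: {name!r}")
    # Imported lazily so the helpers below work without the package installed.
    from datasets import load_dataset
    return load_dataset(REPO_ID, name, split="test")


def token_f1(prediction: str, reference: str) -> float:
    """Whitespace-token F1 between a model answer and a reference answer."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A typical loop would call load_config("french"), pass each row's image and question to the model under test, and compare the output with the row's gpt_answer field using token_f1 or a stronger metric.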

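Background and challenges

Background overview

As artificial intelligence moves toward combined multimodal and multilingual capability, the gagan3012/multilingual-llava-bench-in-the-wild dataset was created to evaluate the cross-lingual understanding and generation abilities of vision-language models in diverse real-world scenarios. It was built by gagan3012, and its central research question is the performance limits of conventional visual question answering models in non-English settings. By combining images, multilingual questions, and generated answers, it aims to make multimodal AI more broadly applicable across languages. The dataset extends multilingual visual understanding research from a single language to multiple cultural contexts and offers a benchmark for later work on cross-lingual generalization and cultural adaptability.

Current challenges

The dataset addresses two kinds of challenges in multilingual visual question answering. At the task level, models must cope with the semantic ambiguity and cultural differences that come with linguistic diversity, so that their understanding of image content stays consistent across languages. Evaluating generated answers also requires models not only to recognize visual elements but also to produce coherent, accurate text in each language. At the construction level, the main difficulty is high-quality annotation and alignment of multilingual data: accurate translation and localization of captions, questions, and answers across the ten configurations, and keeping the language branches balanced in size and complexity so that uneven language resources do not bias the evaluation.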

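Common scenarios

Typical use cases

Evaluation of vision-language models is often restricted to a single language. By combining images, questions, captions, and GPT-generated answers across ten configurations, including Arabic, Chinese, and French, this dataset gives researchers a standardized benchmark for multilingual visual question answering. Its typical use is systematic evaluation of a model's understanding and generation across languages and cultural contexts, and it is particularly suited to testing how well a model generalizes on cross-lingual visual reasoning tasks.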
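For the systematic cross-language evaluation described here, per-example scores are usually rolled up by configuration and by category. The sketch below shows one way to do that with the standard library. The input format (dicts with 'config', 'category', and 'score' keys) and the function name aggregate_scores are assumptions made for illustration, and the category values in the usage note are placeholders rather than the dataset's actual labels.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Mapping, Tuple


def aggregate_scores(
    results: Iterable[Mapping[str, object]],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Average per-example scores by configuration and by category.

    Each result is expected to carry 'config', 'category', and 'score' keys.
    """
    by_config = defaultdict(list)
    by_category = defaultdict(list)
    for result in results:
        score = float(result["score"])
        by_config[str(result["config"])].append(score)
        by_category[str(result["category"])].append(score)
    config_means = {name: mean(scores) for name, scores in by_config.items()}
    category_means = {name: mean(scores) for name, scores in by_category.items()}
    return config_means, category_means
```

As a usage example with made-up scores, passing two 'french' results scored 1.0 and 0.5 and one 'hindi' result scored 0.25 yields configuration means of 0.75 for 'french' and 0.25 for 'hindi'. Comparing the per-configuration means side by side is a direct way to see how far a model's performance drops outside its strongest language.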

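Practical applications

Multilingual visual understanding has broad application in global digital services. The dataset can support the development of cross-lingual image retrieval systems, culturally adapted intelligent assistants, and accessible interfaces. In educational technology, for example, it can help build visual learning tools that answer questions in several languages; in cross-border e-commerce, it can help improve multilingual description generation for product images.

Derived and related work

According to this summary, work derived from the dataset focuses mainly on fine-tuning multilingual vision-language models and on new evaluation frameworks. Researchers have built cross-lingual prompt learning strategies on it and developed adapter modules for low-resource languages. The dataset has also prompted studies on detecting and mitigating cultural bias in visual question answering, contributing to ethical evaluation standards for multimodal systems.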

The content above was collected and summarized by 遇见数据集.
