floschne/multilingual-llava-bench-in-the-wild
Source: Hugging Face. Updated 2024-05-16; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/floschne/multilingual-llava-bench-in-the-wild
Resource description:
---
language:
- ar
- bn
- zh
- en
- fr
- ru
- es
- ur
- hi
- ja
license: cc-by-4.0
size_categories:
- n<1K
pretty_name: Multilingual LLaVA Bench in the Wild
dataset_info:
  features:
  - name: image_id
    dtype: string
  - name: image
    struct:
    - name: bytes
      dtype: binary
    - name: path
      dtype: 'null'
  - name: image_caption
    dtype: string
  - name: question_id
    dtype: int64
  - name: question
    dtype: string
  - name: question_category
    dtype: string
  - name: gpt4_answer
    dtype: string
  - name: gpt4_model_id
    dtype: string
  splits:
  - name: english
    num_bytes: 131853762
    num_examples: 60
  - name: russian
    num_bytes: 131895540
    num_examples: 60
  - name: hindi
    num_bytes: 131932797
    num_examples: 60
  - name: bengali
    num_bytes: 131926779
    num_examples: 60
  - name: chinese
    num_bytes: 131847250
    num_examples: 60
  - name: spanish
    num_bytes: 131858886
    num_examples: 60
  - name: japanese
    num_bytes: 131867258
    num_examples: 60
  - name: arabic
    num_bytes: 131880090
    num_examples: 60
  - name: french
    num_bytes: 131860194
    num_examples: 60
  - name: urdu
    num_bytes: 131888639
    num_examples: 60
  download_size: 515733256
  dataset_size: 1318811195
configs:
- config_name: default
  data_files:
  - split: english
    path: data/english-*
  - split: russian
    path: data/russian-*
  - split: hindi
    path: data/hindi-*
  - split: bengali
    path: data/bengali-*
  - split: chinese
    path: data/chinese-*
  - split: spanish
    path: data/spanish-*
  - split: japanese
    path: data/japanese-*
  - split: arabic
    path: data/arabic-*
  - split: french
    path: data/french-*
  - split: urdu
    path: data/urdu-*
---
# Multilingual LLaVA Bench in the Wild
### Note that this is a copy from https://huggingface.co/datasets/MBZUAI/multilingual-llava-bench-in-the-wild
This copy was created because of issues in the original repository. It also includes the images as a feature and stores the data in a uniform, joined structure.
If you use this dataset, please cite the original authors:
```bibtex
@article{PALO2024,
title={Palo: A Large Multilingual Multimodal Language Model},
author={Maaz, Muhammad and Rasheed, Hanoona and Shaker, Abdelrahman and Khan, Salman and Cholakal, Hisham and Anwer, Rao M. and Baldwin, Tim and Felsberg, Michael and Khan, Fahad S.},
journal={arXiv 2402.14818},
year={2024},
url={https://arxiv.org/abs/2402.14818}
}
```
### How to load the image features
Due to a [bug](https://github.com/huggingface/datasets/issues/4796), the images could not be stored as `PIL.Image.Image`s directly and had to be converted to `datasets.Image`s. Hence, loading them requires this additional step:
```python
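from datasets import Image, load_dataset

# Each split holds one language; the raw "image" column is a struct of bytes and path.
ds = load_dataset("floschne/multilingual-llava-bench-in-the-wild", split="english")
# Decode the struct into a PIL image, then restore the original "image" column name.
ds = ds.map(lambda sample: {"image_t": Image().decode_example(sample["image"])}, remove_columns=["image"]).rename_column("image_t", "image")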
```
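The same decoding step applies to every language split. The following is a minimal sketch of how all ten splits might be loaded in one pass. The names `SPLITS` and `load_all_languages` are hypothetical helpers introduced here for illustration and are not part of the dataset or the `datasets` library. The import is done lazily inside the function so that defining the helper does not itself require the library or trigger a download.

```python
REPO_ID = "floschne/multilingual-llava-bench-in-the-wild"

# Split names as listed in the dataset card metadata.
SPLITS = [
    "english", "russian", "hindi", "bengali", "chinese",
    "spanish", "japanese", "arabic", "french", "urdu",
]


def load_all_languages(repo_id=REPO_ID, splits=SPLITS):
    """Return a dict mapping split name to a dataset with decoded images.

    Hypothetical helper: it repeats the decoding step from the dataset card
    for each split. Calling it downloads the data from the Hub.
    """
    # Imported here so the module can be read without `datasets` installed.
    from datasets import Image, load_dataset

    decoder = Image()

    def decode(sample):
        # sample["image"] is a struct with "bytes" and "path" keys.
        return {"image_t": decoder.decode_example(sample["image"])}

    loaded = {}
    for split in splits:
        ds = load_dataset(repo_id, split=split)
        ds = ds.map(decode, remove_columns=["image"])
        loaded[split] = ds.rename_column("image_t", "image")
    return loaded
```

Calling `load_all_languages()` should return one dataset per language. Passing a shorter `splits` list keeps the download and decoding work limited to the languages that are actually needed.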
Provider:
floschne
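Summary of original information
Dataset overview
Basic information
- Name: Multilingual LLaVA Bench in the Wild
- Languages: Arabic (ar), Bengali (bn), Chinese (zh), English (en), French (fr), Russian (ru), Spanish (es), Urdu (ur), Hindi (hi), Japanese (ja)
- License: cc-by-4.0
- Size category: n<1K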
Dataset features
- image_id: string
- image: struct containing bytes (binary) and path (null)
- image_caption: string
- question_id: integer (int64)
- question: string
- question_category: string
- gpt4_answer: string
- gpt4_model_id: string
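As a reading aid, the feature list can be mirrored as typed records. This is only a sketch of the raw (undecoded) row layout implied by the metadata. The class names `RawImage` and `BenchSample` are hypothetical and do not exist in the dataset or in any library.

```python
from typing import TypedDict


class RawImage(TypedDict):
    """The undecoded image struct as stored in the dataset."""
    bytes: bytes  # encoded image file contents
    path: None    # the metadata declares this field with dtype 'null'


class BenchSample(TypedDict):
    """One row of a language split, before the image is decoded."""
    image_id: str
    image: RawImage
    image_caption: str
    question_id: int
    question: str
    question_category: str
    gpt4_answer: str
    gpt4_model_id: str
```

After the decoding step described in the dataset card, the `image` field holds a decoded image object rather than a `RawImage` struct.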
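Dataset splits
- english: 60 examples, 131853762 bytes in total
- russian: 60 examples, 131895540 bytes in total
- hindi: 60 examples, 131932797 bytes in total
- bengali: 60 examples, 131926779 bytes in total
- chinese: 60 examples, 131847250 bytes in total
- spanish: 60 examples, 131858886 bytes in total
- japanese: 60 examples, 131867258 bytes in total
- arabic: 60 examples, 131880090 bytes in total
- french: 60 examples, 131860194 bytes in total
- urdu: 60 examples, 131888639 bytes in total
Dataset size
- Download size: 515733256 bytes
- Dataset size: 1318811195 bytes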
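The figures above can be cross-checked with a little arithmetic. The sketch below copies the per-split byte counts from the metadata and confirms that they add up to the reported dataset size and that ten splits of 60 examples give 600 examples in total. The variable names are illustrative only.

```python
# Per-split sizes in bytes, copied from the dataset card metadata.
SPLIT_NUM_BYTES = {
    "english": 131853762,
    "russian": 131895540,
    "hindi": 131932797,
    "bengali": 131926779,
    "chinese": 131847250,
    "spanish": 131858886,
    "japanese": 131867258,
    "arabic": 131880090,
    "french": 131860194,
    "urdu": 131888639,
}
EXAMPLES_PER_SPLIT = 60
DOWNLOAD_SIZE = 515733256
DATASET_SIZE = 1318811195

total_bytes = sum(SPLIT_NUM_BYTES.values())
total_examples = EXAMPLES_PER_SPLIT * len(SPLIT_NUM_BYTES)
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(total_bytes == DATASET_SIZE)  # True
print(total_examples)               # 600
print(round(download_ratio, 2))     # 0.39
```

The download is therefore roughly 39% of the reported dataset size.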
Configuration
- config_name: default
- data_files:
  - split: the data split for each language
  - path: the path pattern for that language's data, e.g. data/english-*
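The metadata lists languages by two-letter code but names splits by English language name. A small lookup can bridge the two. The pairing below is an assumption based on the standard meaning of the codes, and both `LANGUAGE_TO_SPLIT` and `data_file_pattern` are hypothetical names introduced for this sketch.

```python
# Assumed pairing of the language codes in the card with its split names.
LANGUAGE_TO_SPLIT = {
    "ar": "arabic",
    "bn": "bengali",
    "zh": "chinese",
    "en": "english",
    "fr": "french",
    "ru": "russian",
    "es": "spanish",
    "ur": "urdu",
    "hi": "hindi",
    "ja": "japanese",
}


def data_file_pattern(language_code: str) -> str:
    """Return the data file pattern for a language code, e.g. 'zh'."""
    split = LANGUAGE_TO_SPLIT[language_code]
    # Mirrors the "data/<split>-*" patterns in the default config.
    return f"data/{split}-*"


print(data_file_pattern("zh"))  # data/chinese-*
```

The split name from `LANGUAGE_TO_SPLIT` is also the value to pass as `split=` when loading a single language.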



