
floschne/multilingual-llava-bench-in-the-wild

Hugging Face · updated 2024-05-16 · indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/floschne/multilingual-llava-bench-in-the-wild
Description:
---
language:
- ar
- bn
- zh
- en
- fr
- ru
- es
- ur
- hi
- ja
license: cc-by-4.0
size_categories:
- n<1K
pretty_name: Multilingual LLaVA Bench in the Wild
dataset_info:
  features:
  - name: image_id
    dtype: string
  - name: image
    struct:
    - name: bytes
      dtype: binary
    - name: path
      dtype: 'null'
  - name: image_caption
    dtype: string
  - name: question_id
    dtype: int64
  - name: question
    dtype: string
  - name: question_category
    dtype: string
  - name: gpt4_answer
    dtype: string
  - name: gpt4_model_id
    dtype: string
  splits:
  - name: english
    num_bytes: 131853762
    num_examples: 60
  - name: russian
    num_bytes: 131895540
    num_examples: 60
  - name: hindi
    num_bytes: 131932797
    num_examples: 60
  - name: bengali
    num_bytes: 131926779
    num_examples: 60
  - name: chinese
    num_bytes: 131847250
    num_examples: 60
  - name: spanish
    num_bytes: 131858886
    num_examples: 60
  - name: japanese
    num_bytes: 131867258
    num_examples: 60
  - name: arabic
    num_bytes: 131880090
    num_examples: 60
  - name: french
    num_bytes: 131860194
    num_examples: 60
  - name: urdu
    num_bytes: 131888639
    num_examples: 60
  download_size: 515733256
  dataset_size: 1318811195
configs:
- config_name: default
  data_files:
  - split: english
    path: data/english-*
  - split: russian
    path: data/russian-*
  - split: hindi
    path: data/hindi-*
  - split: bengali
    path: data/bengali-*
  - split: chinese
    path: data/chinese-*
  - split: spanish
    path: data/spanish-*
  - split: japanese
    path: data/japanese-*
  - split: arabic
    path: data/arabic-*
  - split: french
    path: data/french-*
  - split: urdu
    path: data/urdu-*
---

# Multilingual LLaVA Bench in the Wild

### Note that this is a copy from https://huggingface.co/datasets/MBZUAI/multilingual-llava-bench-in-the-wild

It was created because of issues with the original repository. It also includes the image features and has a uniform structure, with all ten languages joined into a single dataset (one split per language).
If you use this dataset, please cite the original authors:

```bibtex
@article{PALO2024,
  title={Palo: A Large Multilingual Multimodal Language Model},
  author={Maaz, Muhammad and Rasheed, Hanoona and Shaker, Abdelrahman and Khan, Salman and Cholakal, Hisham and Anwer, Rao M. and Baldwin, Tim and Felsberg, Michael and Khan, Fahad S.},
  journal={arXiv 2402.14818},
  year={2024},
  url={https://arxiv.org/abs/2402.14818}
}
```

### How to load the image features

Due to a [bug](https://github.com/huggingface/datasets/issues/4796), the images could not be stored directly as `PIL.Image.Image` objects and had to be converted to the `datasets.Image` storage format (a struct of `bytes` and `path`). Hence, loading them as images requires this additional step:

```python
from datasets import Image, load_dataset

ds = load_dataset("floschne/multilingual-llava-bench-in-the-wild", split="english")
# Decode the raw {"bytes", "path"} struct into a PIL image, then replace the original "image" column with it
ds = ds.map(lambda sample: {"image_t": Image().decode_example(sample["image"])}, remove_columns=["image"]).rename_column("image_t", "image")
```
Provided by:
floschne
Summary of the original information

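Dataset overview

Basic information

  • Name: Multilingual LLaVA Bench in the Wild
  • Languages: Arabic (ar), Bengali (bn), Chinese (zh), English (en), French (fr), Russian (ru), Spanish (es), Urdu (ur), Hindi (hi), Japanese (ja)
  • License: cc-by-4.0
  • Size category: n<1K (fewer than 1,000 examples)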
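The ISO 639-1 codes in the language list line up one-to-one with the split names used in the configuration. A minimal sketch of that mapping as a Python dict, useful for choosing the `split=` argument from a language code; `SPLIT_BY_LANG_CODE` and `split_for` are hypothetical names, not part of the dataset card or of any library.

```python
# Hypothetical helper: maps the ISO 639-1 codes listed on the dataset card
# to the split names defined in the card's `configs` section.
SPLIT_BY_LANG_CODE = {
    "ar": "arabic",
    "bn": "bengali",
    "zh": "chinese",
    "en": "english",
    "fr": "french",
    "ru": "russian",
    "es": "spanish",
    "ur": "urdu",
    "hi": "hindi",
    "ja": "japanese",
}


def split_for(lang_code: str) -> str:
    """Return the split name for an ISO 639-1 code, raising ValueError if there is none."""
    try:
        return SPLIT_BY_LANG_CODE[lang_code]
    except KeyError:
        raise ValueError(f"no split for language code {lang_code!r}") from None


print(split_for("zh"))  # chinese
```

The returned name can be passed directly as `split=` to `load_dataset`.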

Dataset features

  • image_id: string
  • image: struct containing bytes (binary) and path (null)
  • image_caption: string
  • question_id: integer (int64)
  • question: string
  • question_category: string
  • gpt4_answer: string
  • gpt4_model_id: string
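The feature list above can be written down as a Python `TypedDict`, together with a small checker for rows read back from a split. This is a sketch that assumes rows arrive as plain dicts (as they do when indexing or iterating a `datasets.Dataset`) and that `image` is still the raw struct, i.e. before the decoding step from the dataset card has been applied; `ImageStruct`, `Sample`, `SCALAR_FIELDS`, and `check_sample` are hypothetical names.

```python
from typing import List, Optional, TypedDict

# Storage form of the `image` column: encoded image bytes plus a path that is always null here.
ImageStruct = TypedDict("ImageStruct", {"bytes": bytes, "path": Optional[str]})


class Sample(TypedDict):
    """One row of any split, mirroring the dataset features (a sketch, not an official type)."""
    image_id: str
    image: ImageStruct
    image_caption: str
    question_id: int
    question: str
    question_category: str
    gpt4_answer: str
    gpt4_model_id: str


# Expected Python type for each non-image column.
SCALAR_FIELDS = {
    "image_id": str,
    "image_caption": str,
    "question_id": int,
    "question": str,
    "question_category": str,
    "gpt4_answer": str,
    "gpt4_model_id": str,
}


def check_sample(row: dict) -> List[str]:
    """Return the names of columns that are missing or have an unexpected type."""
    problems = [name for name, typ in SCALAR_FIELDS.items() if not isinstance(row.get(name), typ)]
    image = row.get("image")
    # The raw image struct must be a dict whose "bytes" entry holds the encoded image.
    if not isinstance(image, dict) or not isinstance(image.get("bytes"), bytes):
        problems.append("image")
    return problems
```

An empty list means the row matches the schema; any returned names point at the offending columns.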

Dataset splits

  • english: 60 examples, 131853762 bytes
  • russian: 60 examples, 131895540 bytes
  • hindi: 60 examples, 131932797 bytes
  • bengali: 60 examples, 131926779 bytes
  • chinese: 60 examples, 131847250 bytes
  • spanish: 60 examples, 131858886 bytes
  • japanese: 60 examples, 131867258 bytes
  • arabic: 60 examples, 131880090 bytes
  • french: 60 examples, 131860194 bytes
  • urdu: 60 examples, 131888639 bytes

Dataset size

  • Download size: 515733256 bytes (≈516 MB)
  • Dataset size: 1318811195 bytes (≈1.32 GB)
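The split figures are consistent with the totals: the ten per-split byte counts sum exactly to the dataset size, and 10 splits × 60 examples gives 600 rows. A quick check that uses only the numbers listed on this page (the constant names are illustrative):

```python
# Per-split num_bytes, copied from the dataset card's `splits` metadata.
SPLIT_NUM_BYTES = {
    "english": 131853762,
    "russian": 131895540,
    "hindi": 131932797,
    "bengali": 131926779,
    "chinese": 131847250,
    "spanish": 131858886,
    "japanese": 131867258,
    "arabic": 131880090,
    "french": 131860194,
    "urdu": 131888639,
}
EXAMPLES_PER_SPLIT = 60
DATASET_SIZE = 1318811195  # dataset_size from the card
DOWNLOAD_SIZE = 515733256  # download_size from the card

total_bytes = sum(SPLIT_NUM_BYTES.values())
total_examples = EXAMPLES_PER_SPLIT * len(SPLIT_NUM_BYTES)

print(total_bytes == DATASET_SIZE)                    # True
print(total_examples)                                 # 600
print(f"{DATASET_SIZE / DOWNLOAD_SIZE:.2f}")          # 2.56
print(f"{total_bytes / total_examples / 1e6:.2f} MB per example")  # 2.20 MB per example
```

So the download is roughly 2.6 times smaller than the dataset size, and each example accounts for about 2.2 MB on average.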

Configuration

  • config_name: default
  • data_files:
    • split: one split per language
    • path: glob pattern for that language's data files, e.g. data/english-*
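The `data_files` entries follow one rule: split `<name>` reads every file matching `data/<name>-*`. A minimal sketch that rebuilds this mapping and reports which split a given repository file would fall under; `SPLITS`, `DATA_FILES`, and `split_of` are hypothetical names, and the shard filename in the example is made up for illustration.

```python
from fnmatch import fnmatch
from typing import Optional

SPLITS = ["english", "russian", "hindi", "bengali", "chinese",
          "spanish", "japanese", "arabic", "french", "urdu"]

# Rebuild the default config's data_files mapping: one glob pattern per split.
DATA_FILES = {split: f"data/{split}-*" for split in SPLITS}


def split_of(file_path: str) -> Optional[str]:
    """Return the split whose glob pattern matches file_path, or None if none matches."""
    for split, pattern in DATA_FILES.items():
        if fnmatch(file_path, pattern):
            return split
    return None


# Hypothetical shard name, for illustration only.
print(split_of("data/english-00000-of-00001.parquet"))  # english
```

`DATA_FILES` has the split-to-pattern shape that the `data_files` argument of `datasets.load_dataset` accepts.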