
FreedomIntelligence/ALLaVA-4V-Arabic

Source: Hugging Face | Updated: 2024-04-29 | Indexed: 2024-06-22
Download link:
https://hf-mirror.com/datasets/FreedomIntelligence/ALLaVA-4V-Arabic
Resource description:
---
license: apache-2.0
task_categories:
  - question-answering
  - text-generation
language:
  - ar
tags:
  - GPT-4V
  - LVLM
  - Vision
  - Language
size_categories:
  - 1M<n<10M
configs:
  - config_name: allava_laion
    data_files:
      - split: caption
        path: "allava_laion/ALLaVA-Caption-LAION-4V_Arabic.json"
      # - split: instruct
      #   path: "allava_laion/ALLaVA-Instruct-LAION-4V_Chinese.json"
  - config_name: allava_vflan
    data_files:
      - split: caption
        path: "allava_vflan/ALLaVA-Caption-VFLAN-4V_Arabic.json"
      # - split: instruct
      #   path: "allava_vflan/ALLaVA-Instruct-VFLAN-4V_Chinese.json"
# - config_name: allava_laion_instruction
#   data_files: "allava_laion/ALLaVA-Instruct-LAION-4V.json"
# configs:
#   - config_name: default
#     data_files:
#       - split: allava_laion_caption
#         path: "allava_laion/ALLaVA-Caption-LAION-4V.json"
#       - split: allava_laion_instruction
#         path: "allava_laion/ALLaVA-Instruction-LAION-4V.json"
# configs:
#   - config_name: default
#   - data_files:
#     - split: allava_laion_caption
#     - path:
#       - "allava_laion/ALLaVA-Caption-LAION-4V.json"
#     - split: allava_laion_instruction
#     - path:
#       - "allava_laion/ALLaVA-Instruction-LAION-4V.json"
---

## ALLaVA-4V for Arabic

This is the Arabic version of the ALLaVA-4V data. We have translated the ALLaVA-4V data into Arabic through ChatGPT and instructed ChatGPT not to translate content related to OCR.

The original dataset can be found [here](https://huggingface.co/datasets/FreedomIntelligence/ALLaVA-4V), and the image data can be downloaded from [ALLaVA-4V](https://huggingface.co/datasets/FreedomIntelligence/ALLaVA-4V).

#### Citation

If you find our data useful, please consider citing our work! We are FreedomIntelligence from Shenzhen Research Institute of Big Data and The Chinese University of Hong Kong, Shenzhen.

```
@misc{chen2024allava,
  title={ALLaVA: Harnessing GPT4V-synthesized Data for A Lite Vision-Language Model},
  author={Guiming Hardy Chen and Shunian Chen and Ruifei Zhang and Junying Chen and Xiangbo Wu and Zhiyi Zhang and Zhihong Chen and Jianquan Li and Xiang Wan and Benyou Wang},
  year={2024},
  eprint={2402.11684},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
Provider:
FreedomIntelligence
Summary of original information

Dataset overview

Basic information

  • License: Apache-2.0
  • Task categories:
    • Question answering
    • Text generation
  • Language: Arabic
  • Tags:
    • GPT-4V
    • LVLM
    • Vision
    • Language
  • Dataset size: 1M<n<10M

Configuration details

  • Config name: allava_laion

    • Data files:
      • Split: caption
      • Path: "allava_laion/ALLaVA-Caption-LAION-4V_Arabic.json"
  • Config name: allava_vflan

    • Data files:
      • Split: caption
      • Path: "allava_vflan/ALLaVA-Caption-VFLAN-4V_Arabic.json"
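
The two configurations above can be addressed by name when loading the data. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository is reachable. The helper names `data_file_path` and `load_split` are hypothetical and introduced only for illustration; the config names, split name, and file paths are copied from the card. The record schema is not stated on this page, so the sketch does not assume any field names.

```python
from typing import Dict

REPO_ID = "FreedomIntelligence/ALLaVA-4V-Arabic"

# Config name -> split name -> repository-relative file path,
# copied from the `configs` block of the dataset card.
DATA_FILES: Dict[str, Dict[str, str]] = {
    "allava_laion": {"caption": "allava_laion/ALLaVA-Caption-LAION-4V_Arabic.json"},
    "allava_vflan": {"caption": "allava_vflan/ALLaVA-Caption-VFLAN-4V_Arabic.json"},
}


def data_file_path(config: str, split: str = "caption") -> str:
    """Return the JSON path declared for a config/split pair (hypothetical helper)."""
    try:
        return DATA_FILES[config][split]
    except KeyError:
        raise KeyError(
            f"No data file declared for config={config!r}, split={split!r}"
        ) from None


def load_split(config: str, split: str = "caption"):
    """Load one split via `datasets.load_dataset` (hypothetical helper; needs network access)."""
    data_file_path(config, split)  # fail early on an undeclared config or split
    # Imported lazily so the lookup helper above works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config, split=split)
```

Calling `load_split("allava_laion")` would fetch the LAION caption split. Only the `caption` split is active in each config; the `instruct` entries are commented out in the card, so requesting them raises a `KeyError` here. The images themselves are not part of this repository and, per the card, come from the original ALLaVA-4V dataset.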

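Dataset description

  • Version: Arabic version
  • Translation note: The ALLaVA-4V dataset was translated with ChatGPT, which was instructed not to translate OCR-related content.

Citation information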

@misc{chen2024allava,
  title={ALLaVA: Harnessing GPT4V-synthesized Data for A Lite Vision-Language Model},
  author={Guiming Hardy Chen and Shunian Chen and Ruifei Zhang and Junying Chen and Xiangbo Wu and Zhiyi Zhang and Zhihong Chen and Jianquan Li and Xiang Wan and Benyou Wang},
  year={2024},
  eprint={2402.11684},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
