FreedomIntelligence/ALLaVA-4V-Arabic
Source: Hugging Face. Updated 2024-04-29. Indexed 2024-06-22.
Download link:
https://hf-mirror.com/datasets/FreedomIntelligence/ALLaVA-4V-Arabic
Resource description:
---
license: apache-2.0
task_categories:
- question-answering
- text-generation
language:
- ar
tags:
- GPT-4V
- LVLM
- Vision
- Language
size_categories:
- 1M<n<10M
configs:
- config_name: allava_laion
  data_files:
  - split: caption
    path: "allava_laion/ALLaVA-Caption-LAION-4V_Arabic.json"
  # - split: instruct
  #   path: "allava_laion/ALLaVA-Instruct-LAION-4V_Chinese.json"
- config_name: allava_vflan
  data_files:
  - split: caption
    path: "allava_vflan/ALLaVA-Caption-VFLAN-4V_Arabic.json"
  # - split: instruct
  #   path: "allava_vflan/ALLaVA-Instruct-VFLAN-4V_Chinese.json"
# - config_name: allava_laion_instruction
# data_files: "allava_laion/ALLaVA-Instruct-LAION-4V.json"
# configs:
# - config_name: default
# data_files:
# - split: allava_laion_caption
# path: "allava_laion/ALLaVA-Caption-LAION-4V.json"
# - split: allava_laion_instruction
# path: "allava_laion/ALLaVA-Instruction-LAION-4V.json"
# configs:
# - config_name: default
# - data_files:
# - split: allava_laion_caption
# - path:
# - "allava_laion/ALLaVA-Caption-LAION-4V.json"
# - split: allava_laion_instruction
# - path:
# - "allava_laion/ALLaVA-Instruction-LAION-4V.json"
---
## ALLaVA-4V for Arabic
This is the Arabic version of the ALLaVA-4V data. We translated the ALLaVA-4V data into Arabic using ChatGPT, instructing it to leave OCR-related content untranslated.
The original dataset can be found [here](https://huggingface.co/datasets/FreedomIntelligence/ALLaVA-4V), and the image data can be downloaded from [ALLaVA-4V](https://huggingface.co/datasets/FreedomIntelligence/ALLaVA-4V).
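
The card metadata declares two configurations, `allava_laion` and `allava_vflan`, each with a single `caption` split. The following is a minimal sketch of how those could be loaded with the Hugging Face `datasets` library. The helper names `load_captions` and `resolve_url` are hypothetical, and the direct-file URL assumes the Hub's usual `/resolve/main/` layout with a `main` branch, which this card does not state.

```python
# Config names, split name, and file paths are copied from the card's YAML.
# Helper names below are hypothetical and exist only for illustration.
REPO_ID = "FreedomIntelligence/ALLaVA-4V-Arabic"
CAPTION_FILES = {
    "allava_laion": "allava_laion/ALLaVA-Caption-LAION-4V_Arabic.json",
    "allava_vflan": "allava_vflan/ALLaVA-Caption-VFLAN-4V_Arabic.json",
}


def load_captions(config_name):
    """Load the `caption` split of one config.

    Needs `pip install datasets` and network access to the Hub.
    """
    if config_name not in CAPTION_FILES:
        raise ValueError(f"unknown config: {config_name!r}")
    # Imported lazily so the rest of this sketch works without the library.
    from datasets import load_dataset
    return load_dataset(REPO_ID, config_name, split="caption")


def resolve_url(config_name, endpoint="https://huggingface.co"):
    """Build a direct URL to a caption file.

    Assumption: the repository follows the Hub's /resolve/main/ convention.
    """
    path = CAPTION_FILES[config_name]
    return f"{endpoint}/datasets/{REPO_ID}/resolve/main/{path}"


if __name__ == "__main__":
    for name in CAPTION_FILES:
        print(name, resolve_url(name))
```

Calling `load_captions("allava_laion")` would download the Arabic captions only; the images themselves still have to be fetched from the original ALLaVA-4V repository linked above.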
#### Citation
If you find our data useful, please consider citing our work! We are FreedomIntelligence from Shenzhen Research Institute of Big Data and The Chinese University of Hong Kong, Shenzhen.
```
@misc{chen2024allava,
      title={ALLaVA: Harnessing GPT4V-synthesized Data for A Lite Vision-Language Model},
      author={Guiming Hardy Chen and Shunian Chen and Ruifei Zhang and Junying Chen and Xiangbo Wu and Zhiyi Zhang and Zhihong Chen and Jianquan Li and Xiang Wan and Benyou Wang},
      year={2024},
      eprint={2402.11684},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
Provider:
FreedomIntelligence
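Summary of original information
Dataset overview
Basic information
- License: Apache-2.0
- Task categories:
  - Question answering
  - Text generation
- Language: Arabic
- Tags:
  - GPT-4V
  - LVLM
  - Vision
  - Language
- Dataset size: 1M<n<10M
Configuration details
- Config name: allava_laion
  - Data files:
    - Split: caption
    - Path: "allava_laion/ALLaVA-Caption-LAION-4V_Arabic.json"
- Config name: allava_vflan
  - Data files:
    - Split: caption
    - Path: "allava_vflan/ALLaVA-Caption-VFLAN-4V_Arabic.json"
Dataset description
- Version: Arabic version
- Translation note: The ALLaVA-4V dataset was translated into Arabic via ChatGPT, which was instructed not to translate OCR-related content.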



