Viet-ViTextVQA-gemini-VQA
Source: ModelScope community (魔搭社区). Updated 2025-11-07; indexed 2025-01-11.
Download link:
https://modelscope.cn/datasets/5CD-AI/Viet-ViTextVQA-gemini-VQA
# Dataset Overview
This dataset was created from **9,594** Vietnamese 🇻🇳 images in the train split of the [ViTextVQA](https://arxiv.org/abs/2404.10652?fbclid=IwZXh0bgNhZW0CMTAAAR3PpvukmV1HQiByBngZmjfx-vfhgt0OGzKhHbvBPS8LmBzy3G-Wav3xDjg_aem_AXMs29kzZ0aAbsCvMwGfFE9torfSkfdW1FwE0Rv5cXgf1nlC9w4Q5xkTw5n8suGdz0faCfh8nQnauNrrr3-X9X0m) dataset [1]. Each image has been analyzed and annotated using Visual Question Answering (VQA) techniques.
It contains over **50,000** detailed descriptions and query-based question-answer pairs generated by the Gemini 1.5 Flash model, which at the time of writing was Google's leading model on the [WildVision Arena Leaderboard](https://huggingface.co/spaces/WildVision/vision-arena). The result is a richly annotated dataset suited to a range of educational and research applications.
Please give us a like ❤️ if you find it useful!
<div align="center">
<img src="https://i.pinimg.com/736x/fa/87/81/fa8781ab467deed28b368d0ec7b32674.jpg" width="400"/>
</div>
Description
```
Bức ảnh chụp một tảng đá lớn, được khắc chữ vàng ghi dòng chữ “Sở chỉ huy chiến dịch Điện Biên Phủ 31/1/1954 - 15/5/1954”.
Phía dưới dòng chữ tiếng Việt là dòng chữ tiếng Anh “HEADQUATER OF DIEN BIEN PHU CAMPAIGN 31/1/1954 - 15/5/1954”.
Hình ảnh được chụp từ phía trước, tập trung vào tảng đá, nền là cây xanh.
```
QnA
```
[
{ "role": "user", "content": "Bức ảnh là di tích gì ?" },
{ "role": "assistant", "content": "Bức ảnh là di tích Sở chỉ huy chiến dịch Điện Biên Phủ." },
{ "role": "user", "content": "Thời gian của chiến dịch Điện Biên Phủ ?" },
{ "role": "assistant", "content": "Thời gian của chiến dịch Điện Biên Phủ là từ 31 / 1 / 1954 - 15 / 5 / 1954." }
]
```
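The QnA sample above is a flat list of alternating `user` and `assistant` messages. A minimal sketch of how such a list could be turned into (question, answer) pairs is shown below. The helper name `to_qa_pairs` is hypothetical and not part of the dataset or any library, and the sketch assumes only the `role` and `content` keys visible in the sample.

```python
from typing import Dict, List, Tuple


def to_qa_pairs(messages: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Pair each user turn with the assistant turn that follows it.

    Hypothetical helper: assumes messages carry "role" and "content" keys,
    as in the sample above. Assistant turns with no preceding user turn
    are skipped.
    """
    pairs: List[Tuple[str, str]] = []
    pending_question = None
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "user":
            pending_question = content
        elif role == "assistant" and pending_question is not None:
            pairs.append((pending_question, content))
            pending_question = None
    return pairs


# The conversation from the sample above, written as a Python literal.
sample = [
    {"role": "user", "content": "Bức ảnh là di tích gì ?"},
    {"role": "assistant", "content": "Bức ảnh là di tích Sở chỉ huy chiến dịch Điện Biên Phủ."},
    {"role": "user", "content": "Thời gian của chiến dịch Điện Biên Phủ ?"},
    {"role": "assistant", "content": "Thời gian của chiến dịch Điện Biên Phủ là từ 31 / 1 / 1954 - 15 / 5 / 1954."},
]

qa_pairs = to_qa_pairs(sample)
for question, answer in qa_pairs:
    print("Q:", question)
    print("A:", answer)
```

Pairing turns explicitly, rather than slicing the list by even and odd indices, keeps the helper tolerant of a record that starts with an assistant message or ends with an unanswered question.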
<div style="display: grid; grid-template-columns: repeat(8, 150px); grid-gap: 10px;">
<img src="https://i.pinimg.com/736x/21/e8/ef/21e8efd779e8025d29fb430e435273c8.jpg" style="width: 150px;">
<img src="https://i.pinimg.com/736x/6f/63/e9/6f63e91e8b5751d5df21641fec8f8f15.jpg" style="width: 150px;">
<img src="https://i.pinimg.com/736x/18/f9/2b/18f92b6ae750e4b8eaa8a74df8cbe92a.jpg" style="width: 150px;">
<img src="https://i.pinimg.com/736x/8f/21/d5/8f21d587621c52e925c32af565dca5d1.jpg" style="width: 150px;">
<img src="https://i.pinimg.com/736x/83/be/b2/83beb2b9799001f78e1beea0381417e4.jpg" style="width: 150px;">
</div>
# Cite
```
@misc{doan2024vintern1befficientmultimodallarge,
title={Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese},
author={Khang T. Doan and Bao G. Huynh and Dung T. Hoang and Thuc D. Pham and Nhat H. Pham and Quan T. M. Nguyen and Bang Q. Vo and Suong N. Hoang},
year={2024},
eprint={2408.12480},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2408.12480},
}
```
# References
[1] Van Nguyen, Q., Tran, D.Q., Pham, H.Q., Nguyen, T.K.B., Nguyen, N.H., Van Nguyen, K. and Nguyen, N.L.T., 2024. ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images. arXiv preprint arXiv:2404.10652.
Provider: maas
Created: 2025-01-08



