floschne/marvl
Source: Hugging Face. Updated 2024-05-16; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/floschne/marvl
Resource description:
---
language:
- id
- sw
- ta
- tr
- zh
- en
license: cc-by-4.0
size_categories:
- 1K<n<10K
task_categories:
- visual-question-answering
pretty_name: MaRVL
dataset_info:
  features:
  - name: id
    dtype: string
  - name: hypothesis
    dtype: string
  - name: hypo_en
    dtype: string
  - name: language
    dtype: string
  - name: label
    dtype: bool
  - name: chapter
    dtype: string
  - name: concept
    dtype: string
  - name: annotator_info
    struct:
    - name: age
      dtype: int64
    - name: annotator_id
      dtype: string
    - name: country_of_birth
      dtype: string
    - name: country_of_residence
      dtype: string
    - name: gender
      dtype: string
  - name: left_img_id
    dtype: string
  - name: right_img_id
    dtype: string
  - name: left_img
    struct:
    - name: bytes
      dtype: binary
    - name: path
      dtype: 'null'
  - name: right_img
    struct:
    - name: bytes
      dtype: binary
    - name: path
      dtype: 'null'
  - name: resized_left_img
    struct:
    - name: bytes
      dtype: binary
    - name: path
      dtype: 'null'
  - name: resized_right_img
    struct:
    - name: bytes
      dtype: binary
    - name: path
      dtype: 'null'
  - name: vertically_stacked_img
    struct:
    - name: bytes
      dtype: binary
    - name: path
      dtype: 'null'
  - name: horizontally_stacked_img
    struct:
    - name: bytes
      dtype: binary
    - name: path
      dtype: 'null'
  splits:
  - name: id
    num_bytes: 2079196646
    num_examples: 1128
  - name: sw
    num_bytes: 899838181
    num_examples: 1108
  - name: ta
    num_bytes: 801784098
    num_examples: 1242
  - name: tr
    num_bytes: 1373652829
    num_examples: 1180
  - name: zh
    num_bytes: 1193602152
    num_examples: 1012
  download_size: 6234764237
  dataset_size: 6348073906
configs:
- config_name: default
  data_files:
  - split: id
    path: data/id-*
  - split: sw
    path: data/sw-*
  - split: ta
    path: data/ta-*
  - split: tr
    path: data/tr-*
  - split: zh
    path: data/zh-*
---
# MaRVL
### This is a copy from the original repo: https://github.com/marvl-challenge/marvl-code
If you use this dataset, please cite the original authors:
```bibtex
@inproceedings{liu-etal-2021-visually,
title = "Visually Grounded Reasoning across Languages and Cultures",
author = "Liu, Fangyu and
Bugliarello, Emanuele and
Ponti, Edoardo Maria and
Reddy, Siva and
Collier, Nigel and
Elliott, Desmond",
booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2021",
address = "Online and Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.emnlp-main.818",
pages = "10467--10485",
}
```
### Additional data
In addition to the data available in the original repo, this dataset contains the following columns:
* `en_translation` --> English translation of the `hypothesis`, created using Bing Translate (the feature schema lists this column as `hypo_en`)
* `left_img` --> PIL Image
* `right_img` --> PIL Image
* `resized_left_img` --> PIL Image, resized
* `resized_right_img` --> PIL Image, resized
* `vertically_stacked_img` --> PIL Image containing the resized left and right images stacked vertically, separated by a black gutter of `10px`
* `horizontally_stacked_img` --> PIL Image containing the resized left and right images stacked horizontally, separated by a black gutter of `10px`
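The stacked columns can be pictured as a single black canvas holding both resized images with a `10px` strip left between them. The sketch below only computes the canvas size and paste offsets for such a layout. The helper name `stacked_layout`, the top-left alignment, and sizing the canvas to the larger of the two images are assumptions for illustration; the card does not say how images of different sizes are aligned.

```python
from typing import Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def stacked_layout(left: Size, right: Size, vertical: bool, gutter: int = 10):
    """Return (canvas_size, left_offset, right_offset) for two stacked images.

    Assumption: each image is pasted at the top-left of its slot on a black
    canvas, so the gutter shows up as a black strip between the images.
    """
    lw, lh = left
    rw, rh = right
    if vertical:
        # Left image on top, right image below it, gutter in between.
        canvas = (max(lw, rw), lh + gutter + rh)
        right_offset = (0, lh + gutter)
    else:
        # Left image first, right image beside it, gutter in between.
        canvas = (lw + gutter + rw, max(lh, rh))
        right_offset = (lw + gutter, 0)
    return canvas, (0, 0), right_offset


print(stacked_layout((640, 480), (640, 853), vertical=True))
print(stacked_layout((640, 480), (640, 853), vertical=False))
```

With Pillow, these values would map onto `Image.new("RGB", canvas, "black")` followed by one `paste` call per image at the returned offsets.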
The images were resized using [`img2dataset`](https://github.com/rom1504/img2dataset/blob/main/img2dataset/resizer.py):
<details>
<summary>Show code snippet</summary>
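```python
# Import path follows the linked img2dataset/resizer.py module.
from img2dataset.resizer import Resizer, ResizeMode

resizer = Resizer(
    image_size=640,  # target side length in pixels
    resize_mode=ResizeMode.keep_ratio,  # preserve the aspect ratio
    resize_only_if_bigger=True,  # leave smaller images untouched
)
```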
</details>
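To make the effect of these settings concrete, the sketch below computes the output dimensions they would be expected to produce. It is an illustration, not img2dataset's own code: it assumes that `keep_ratio` scales an image so that its shorter side equals `image_size`, that `resize_only_if_bigger` skips images whose shorter side is already at most `image_size`, and it uses plain rounding, which may differ from the library by a pixel. The function name `keep_ratio_size` is hypothetical.

```python
def keep_ratio_size(width: int, height: int, image_size: int = 640,
                    only_if_bigger: bool = True) -> tuple:
    """Approximate output size for an aspect-preserving resize (assumed semantics)."""
    shorter = min(width, height)
    if only_if_bigger and shorter <= image_size:
        # Image is already small enough: keep the original dimensions.
        return (width, height)
    # Scale both sides by the same factor so the shorter side hits image_size.
    scale = image_size / shorter
    return (round(width * scale), round(height * scale))


print(keep_ratio_size(1920, 1080))  # landscape image, downscaled
print(keep_ratio_size(500, 400))    # smaller than 640, left unchanged
```

Under these assumptions, only images whose shorter side exceeds 640 pixels differ between the `left_img`/`right_img` columns and their `resized_*` counterparts.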
### How to read the images
Due to a [bug](https://github.com/huggingface/datasets/issues/4796), the images cannot be stored as `PIL.Image.Image` objects directly but had to be converted to the `datasets.Image` representation (a struct of `bytes` and `path`). Hence, to load them, this additional decoding step is required:
```python
from datasets import Image, load_dataset

ds = load_dataset("floschne/marvl", split="sw")

# Decode the raw {"bytes", "path"} structs into PIL images.
# `batched=True` is needed because the lambda iterates over a list of values
# per column, and the result is assigned because `map` returns a new dataset.
ds = ds.map(
    lambda sample: {
        "left_img_t": [Image().decode_example(img) for img in sample["left_img"]],
        "right_img_t": [Image().decode_example(img) for img in sample["right_img"]],
        "resized_left_img_t": [
            Image().decode_example(img) for img in sample["resized_left_img"]
        ],
        "resized_right_img_t": [
            Image().decode_example(img) for img in sample["resized_right_img"]
        ],
        "vertically_stacked_img_t": [
            Image().decode_example(img) for img in sample["vertically_stacked_img"]
        ],
        "horizontally_stacked_img_t": [
            Image().decode_example(img) for img in sample["horizontally_stacked_img"]
        ],
    },
    batched=True,
    # Drop the undecoded columns so the decoded ones can take their names.
    remove_columns=[
        "left_img",
        "right_img",
        "resized_left_img",
        "resized_right_img",
        "vertically_stacked_img",
        "horizontally_stacked_img",
    ],
).rename_columns(
    {
        "left_img_t": "left_img",
        "right_img_t": "right_img",
        "resized_left_img_t": "resized_left_img",
        "resized_right_img_t": "resized_right_img",
        "vertically_stacked_img_t": "vertically_stacked_img",
        "horizontally_stacked_img_t": "horizontally_stacked_img",
    }
)
```
Provider:
floschne
Summary of original information
Dataset overview
Dataset name
- MaRVL
Dataset languages
- Supported languages: Indonesian (id), Swahili (sw), Tamil (ta), Turkish (tr), Chinese (zh), and English (en).
License
- CC-BY-4.0
Dataset size
- Download size: 6234764237 bytes
- Dataset size: 6348073906 bytes
Task categories
- Visual question answering (visual-question-answering)
Dataset features
- Basic features:
  - id: string
  - hypothesis: string
  - hypo_en: string
  - language: string
  - label: boolean
  - chapter: string
  - concept: string
- Annotator information:
  - age: integer
  - annotator_id: string
  - country_of_birth: string
  - country_of_residence: string
  - gender: string
- Image-related features:
  - left_img_id: string
  - right_img_id: string
  - left_img: contains bytes (binary) and path (null)
  - right_img: contains bytes (binary) and path (null)
  - resized_left_img: contains bytes (binary) and path (null)
  - resized_right_img: contains bytes (binary) and path (null)
  - vertically_stacked_img: contains bytes (binary) and path (null)
  - horizontally_stacked_img: contains bytes (binary) and path (null)
Dataset splits
- Split details:
  - id: 1128 examples, 2079196646 bytes
  - sw: 1108 examples, 899838181 bytes
  - ta: 1242 examples, 801784098 bytes
  - tr: 1180 examples, 1373652829 bytes
  - zh: 1012 examples, 1193602152 bytes
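The per-split figures can be cross-checked against the overall dataset size reported in the card. The short sketch below sums the numbers as listed; the variable names are illustrative only.

```python
# (num_examples, num_bytes) per language split, copied from the card.
splits = {
    "id": (1128, 2079196646),
    "sw": (1108, 899838181),
    "ta": (1242, 801784098),
    "tr": (1180, 1373652829),
    "zh": (1012, 1193602152),
}
DATASET_SIZE = 6348073906  # dataset_size from the card, in bytes

total_examples = sum(n for n, _ in splits.values())
total_bytes = sum(b for _, b in splits.values())

print(total_examples)               # 5670
print(total_bytes == DATASET_SIZE)  # True
print(1_000 < total_examples < 10_000)  # True, matching 1K<n<10K
```

The five splits together hold 5670 examples, and their byte counts add up exactly to the reported dataset size.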
Configuration
- Default configuration:
  - Data file paths are split by language, e.g. data/id-*, data/sw-*, and so on.
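As a rough illustration of how these glob patterns partition files by language, the sketch below matches a file path against each split's pattern with Python's standard `fnmatch`. This only demonstrates the pattern semantics; the actual resolution is done by the `datasets` library, and both the helper `split_for` and the shard file name are hypothetical.

```python
from fnmatch import fnmatch

# One glob pattern per language split, as listed in the default configuration.
SPLIT_PATTERNS = {lang: f"data/{lang}-*" for lang in ("id", "sw", "ta", "tr", "zh")}


def split_for(path: str):
    """Return the split whose pattern matches `path`, or None if none does."""
    for split, pattern in SPLIT_PATTERNS.items():
        if fnmatch(path, pattern):
            return split
    return None


# Hypothetical shard name, for illustration only.
print(split_for("data/sw-00000-of-00002.parquet"))  # sw
```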
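Image processing
- Images were processed with the img2dataset tool, which preserves the aspect ratio and resizes only images larger than the target size.
Image loading
- Because of a technical limitation, the images must be converted with dedicated code before they can be loaded; see the code example in the README.
Background overview
This is a multilingual visual question answering dataset with text and image data in Indonesian, Swahili, Tamil, Turkish, and Chinese. It provides several image variants (the original images, resized images, and stacked images) to support multimodal research.
The content above was collected and summarized by 遇见数据集.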



