mlpc-lab/YTTB-VQA
Source: Hugging Face. Updated 2023-11-20; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/mlpc-lab/YTTB-VQA
Resource description:
---
task_categories:
- visual-question-answering
language:
- en
pretty_name: YTTB-VQA
size_categories:
- n<1K
license: cc-by-nc-4.0
---
# Dataset Card for YTTB-VQA
## Dataset Description
- **Homepage:** https://gordonhu608.github.io/bliva/
- **Repository:** https://github.com/mlpc-ucsd/BLIVA.git
- **Paper:**
- **Point of Contact:** w1hu@ucsd.edu
### Dataset Summary
The YTTB-VQA dataset is a collection of 400 YouTube thumbnail question-answer pairs for evaluating visual perception on images that contain embedded text. It covers 11 categories, including technology, sports, entertainment, food, news, history, music, nature, cars, and education.
### Supported Tasks and Leaderboards
This dataset supports tasks such as visual question answering and image captioning.
### License
CC-BY-NC-4.0
### Languages
The language of the data is primarily English.
## Getting Started
### Creating the dataset
Run the following command to download the images and create the dataset:
```
python3 create_dataset.py
```
You will find the images in `images_new` and the dataset in `youtube_new.json`.
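Once the script has finished, the output can be inspected with a few lines of Python. The sketch below assumes that `youtube_new.json` holds a top-level JSON array with one object per question-answer pair; this card does not state that layout, so treat it as an assumption. The helper names `parse_records` and `load_records` are hypothetical.

```python
import json
from pathlib import Path


def parse_records(text):
    """Parse JSON text and return a list of record dicts.

    Assumption: the top-level JSON value is an array of objects.
    """
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array of records")
    return data


def load_records(path="youtube_new.json"):
    """Read the annotation file written by create_dataset.py."""
    return parse_records(Path(path).read_text(encoding="utf-8"))


# Example usage (requires the file to exist in the working directory):
#   records = load_records()
#   print(len(records), "records loaded")
```

If the file turns out to use a different top-level structure, `parse_records` raises a `ValueError` rather than silently returning something unexpected.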
## Dataset Structure
### Data Instances
Each data instance is an entry from the thumbnail collection, augmented with a human-generated question that was submitted to BLIVA. The answer is then recorded in the `answers` field.
### Data Fields
**video_id:** a unique string identifying a specific YouTube thumbnail image.<br>
**question:** a human-generated question about the thumbnail.<br>
**video_classes:** the category of the YouTube thumbnail image.<br>
**answers:** the ground-truth answer to the question about the YouTube thumbnail image.<br>
**video link:** the URL of the YouTube video.
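The field list above can be turned into a quick presence check, which is useful before running an evaluation. This is a minimal sketch: the key names are copied from this card (including `video link` with a space), while the helper `missing_fields` and every value in the example record are hypothetical placeholders, not real data. The value types are also assumed, since the card does not specify them.

```python
# Field names as listed in this card; "video link" is written with a space.
REQUIRED_FIELDS = ("video_id", "question", "video_classes", "answers", "video link")


def missing_fields(record):
    """Return the documented field names that are absent from `record`."""
    return [name for name in REQUIRED_FIELDS if name not in record]


# Hypothetical record: all values are placeholders for illustration only.
example = {
    "video_id": "VIDEO_ID",
    "question": "What word appears on the thumbnail?",
    "video_classes": "education",
    "answers": "PLACEHOLDER",
    "video link": "https://www.youtube.com/watch?v=VIDEO_ID",
}

print(missing_fields(example))
```

The check only looks at key presence, so it keeps working whether a field holds a string or a list.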
### Data Splits
The data are unsplit.
## Dataset Creation
### Source Data
#### Initial Data Collection and Normalization
We randomly selected YouTube videos with text-rich thumbnails from different categories during data collection.
We recorded the unique video ID for each YouTube video and obtained the high-resolution thumbnail from the
URL `http://img.youtube.com/vi/YouTube-Video-ID/maxresdefault.jpg`, where `YouTube-Video-ID` stands for the recorded video ID.
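The URL pattern above maps directly to a small helper. The sketch below only builds the URL string and does not download anything; the function name `thumbnail_url` and its basic input check are illustrative assumptions and are not taken from `create_dataset.py`.

```python
# URL pattern quoted in this card, with the video ID as the only variable part.
THUMBNAIL_TEMPLATE = "http://img.youtube.com/vi/{video_id}/maxresdefault.jpg"


def thumbnail_url(video_id):
    """Build the high-resolution thumbnail URL for a YouTube video ID."""
    if not video_id or "/" in video_id:
        raise ValueError(f"not a plausible video ID: {video_id!r}")
    return THUMBNAIL_TEMPLATE.format(video_id=video_id)


# "VIDEO_ID" is a placeholder, not a real video.
print(thumbnail_url("VIDEO_ID"))
```

The resulting URL could then be passed to a standard downloader such as `urllib.request.urlretrieve` to fetch the image.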
### Annotations
#### Annotation process
We created the annotation file in JSON format with the following fields: `video_id`, `question`, `video_classes`, `answers`, and `video link`.
## Considerations for Using the Data
### Discussion of Biases
Although our dataset spans 11 categories, they are not equally represented. For example, 18% of the dataset pertains to education, while only 2% is dedicated to news.
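The category imbalance can be checked directly from the annotation records. The sketch below assumes each record stores a single category string under `video_classes`; if the field holds a list instead, the counting step would need to be adapted. The helper name `category_shares` and the toy records are hypothetical and do not reproduce the real distribution.

```python
from collections import Counter


def category_shares(records, key="video_classes"):
    """Return each category's fraction of the records, most frequent first."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.most_common()}


# Toy records for illustration only; not taken from the dataset.
toy = [{"video_classes": "education"}] * 3 + [{"video_classes": "news"}]
print(category_shares(toy))
```

Reporting accuracy per category alongside the overall score is one way to keep the larger categories from dominating the result.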
### Acknowledgments
The YouTube thumbnails dataset is intended purely for academic research and not for any monetary use. If you are the author of a thumbnail image and find it used inappropriately in our dataset, please contact us directly at w1hu@ucsd.edu and we will remove the image immediately.
Provided by:
mlpc-lab



