nablasinc/Japanese_Video-QA
Hugging Face · updated 2026-02-24 · indexed 2026-04-05
---
license: cc-by-4.0
task_categories:
- video-text-to-text
- visual-question-answering
language:
- ja
tags:
- video
- vqa
- japanese
- benchmark
- multimodal
size_categories:
- n<1K
dataset_info:
  features:
  - name: video_id
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: category
    dtype: string
  - name: choices
    sequence: string
  - name: type
    dtype: string
  - name: duration
    dtype: string
  - name: domain
    dtype: string
  - name: subdomain
    dtype: string
  - name: start_time
    dtype: string
  - name: end_time
    dtype: string
  - name: video_title
    dtype: string
  splits:
  - name: test
    num_bytes: 288904
    num_examples: 800
  download_size: 119051
  dataset_size: 288904
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
---
# Japanese Video-QA
## Overview
**Japanese Video-QA** is a video question-answering benchmark focused on Japanese cultural content, designed to evaluate multimodal large language models (MLLMs) on Japanese-specific videos.
- **428 YouTube videos** → **800 QA pairs**
- **6 domains, 100 sub-domains** covering Japanese culture
- Questions generated by Gemini 2.5 Flash and manually verified
- Evaluated with LLM-as-a-Judge (GPT-4o), scoring 1–3
**Authors:** Yudai Mine (峯 悠大), Takuya Shintate (新立 拓也), and Kazuya Takahashi (髙橋 和也), NABLAS Inc.
## Dataset Fields
| Field | Type | Description |
|---|---|---|
| `video_id` | string | YouTube video ID |
| `video_title` | string | Title of the video |
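| `domain` | string | One of 6 domains: `四季・行事` (seasons & annual events) / `観光名所` (tourist attractions) / `伝統文化` (traditional culture) / `食文化` (food culture) / `自然・風景` (nature & scenery) / `ポップカルチャー` (pop culture) |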
| `subdomain` | string | Sub-domain (100 categories) |
| `duration` | string | `short` (< 4 min) or `medium` (4–20 min) |
| `question` | string | Question in Japanese |
| `answer` | string | Ground-truth answer (choice label for `multi_choice`) |
| `type` | string | `open` / `multi_choice` / `yes_no` |
| `category` | string | `spatial` / `count` / `action` / `temporal` / `causal` |
| `choices` | list[string] | Answer options (only for `multi_choice`) |
| `start_time` | string | Segment start time (**reference only**) |
| `end_time` | string | Segment end time (**reference only**) |
> `start_time` and `end_time` are approximate annotations for reference. Models should take the full video as input.
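The allowed values in the table above can be checked mechanically before running an evaluation. The sketch below is a hypothetical helper of our own (`check_record` and `duration_bucket` are not part of the dataset or the `datasets` library). It assumes rows arrive as plain dicts, that rows other than `multi_choice` carry an empty or missing `choices` list, and that the 20-minute upper bound of `medium` is inclusive; none of these points is stated explicitly on this card.

```python
from typing import Any, List, Mapping, Optional

# Allowed values, copied from the Dataset Fields table
DOMAINS = {"四季・行事", "観光名所", "伝統文化", "食文化", "自然・風景", "ポップカルチャー"}
TYPES = {"open", "multi_choice", "yes_no"}
CATEGORIES = {"spatial", "count", "action", "temporal", "causal"}
DURATIONS = {"short", "medium"}


def duration_bucket(seconds: float) -> Optional[str]:
    """Map a video length in seconds to the card's `duration` buckets."""
    if seconds < 4 * 60:
        return "short"  # < 4 min
    if seconds <= 20 * 60:  # assumption: the 20 min upper bound is inclusive
        return "medium"
    return None  # > 20 min falls outside both documented buckets


def check_record(row: Mapping[str, Any]) -> List[str]:
    """Return the schema problems found in one row; an empty list means none."""
    problems = []
    for field, allowed in (
        ("domain", DOMAINS),
        ("type", TYPES),
        ("category", CATEGORIES),
        ("duration", DURATIONS),
    ):
        if row.get(field) not in allowed:
            problems.append(f"unexpected {field}: {row.get(field)!r}")
    # Assumption: non-multi_choice rows have an empty or missing `choices` list
    choices = row.get("choices") or []
    is_mc = row.get("type") == "multi_choice"
    if is_mc and not choices:
        problems.append("multi_choice row without choices")
    if not is_mc and choices:
        problems.append("choices present on a non-multi_choice row")
    return problems
```

Iterating over the `test` split and keeping the rows where `check_record` returns a non-empty list flags anything that deviates from the documented schema.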
## Benchmark Results
Scoring: 1 = incorrect, 2 = partial, 3 = correct
LLM judge: GPT-4o ([sample code](https://github.com/nablas-inc/Japanese_Video-QA/blob/main/llm_judge_sample.py))
| Model | Avg score | # scored 1 (incorrect) | # scored 2 (partial) | # scored 3 (correct) |
|---|---|---|---|---|
| Gemini 3 Pro | **2.61** | 122 | 68 | 610 |
| Gemini 2.5 Flash | 2.57 | 139 | 68 | 593 |
| Qwen3-VL-8B-Instruct | 2.24 | 260 | 89 | 451 |
| Qwen3-VL-8B-Thinking | 2.20 | 285 | 70 | 445 |
| Qwen3-VL-4B-Instruct | 2.19 | 287 | 77 | 436 |
| Qwen3-VL-4B-Thinking | 2.13 | 308 | 77 | 415 |
| Phi-4-multimodal-instruct | 1.74 | 465 | 76 | 259 |
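Each Avg value is the count-weighted mean of the three scores over all 800 questions. For Gemini 3 Pro, for example, (1 × 122 + 2 × 68 + 3 × 610) / 800 = 2088 / 800 = 2.61. The following is a minimal sketch of that aggregation. It assumes you already have one integer judge score per question. The function name `summarize_scores` is ours and does not come from the linked judge script.

```python
from collections import Counter
from typing import Dict, Iterable, Union


def summarize_scores(scores: Iterable[int]) -> Dict[str, Union[float, int]]:
    """Aggregate per-question judge scores (1 = incorrect, 2 = partial, 3 = correct)."""
    counts = Counter(scores)
    if set(counts) - {1, 2, 3}:
        raise ValueError(f"scores must be 1, 2 or 3, got {sorted(counts)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no scores given")
    # Count-weighted mean of the scores, rounded like the table
    avg = sum(score * n for score, n in counts.items()) / total
    return {"avg": round(avg, 2), "score_1": counts[1], "score_2": counts[2], "score_3": counts[3]}


# Rebuild the Gemini 3 Pro row from its score counts
gemini_3_pro = [1] * 122 + [2] * 68 + [3] * 610
print(summarize_scores(gemini_3_pro))
# {'avg': 2.61, 'score_1': 122, 'score_2': 68, 'score_3': 610}
```

Applying the same function to the other rows reproduces every Avg in the table, which is a quick sanity check when adding a new model's results.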
## Usage
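```python
from datasets import load_dataset

# Load the annotations; only a "test" split is published
dataset = load_dataset("nablasinc/Japanese_Video-QA")
test = dataset["test"]  # 800 QA pairs; the videos themselves are not included
```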
Use `video_id` to retrieve the YouTube video and provide the full video as model input.
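A rough sketch of turning a row into model input follows. `youtube_url` uses the standard YouTube watch-URL form. `build_prompt` is purely hypothetical: the card does not specify a prompt format. It assumes each `choices` string is listed as-is, one per line, and it ignores `start_time` / `end_time` because the card marks them as reference-only.

```python
from typing import Any, Mapping


def youtube_url(video_id: str) -> str:
    """Build the standard YouTube watch URL from a `video_id` value."""
    return f"https://www.youtube.com/watch?v={video_id}"


def build_prompt(row: Mapping[str, Any]) -> str:
    """Hypothetical text part of the model input; the full video is passed separately.

    start_time / end_time are deliberately not used: they are reference-only.
    """
    lines = [row["question"]]
    if row["type"] == "multi_choice":
        # Assumption: choices are shown verbatim, one per line; the card does
        # not say how choice labels are formatted.
        lines.extend(row["choices"])
    return "\n".join(lines)
```

With the `datasets` objects from the snippet above, `test.filter(lambda ex: ex["type"] == "multi_choice")` restricts the split to one question type before building prompts.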
## License and Copyright
The annotations and metadata in this dataset are released under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/).
The videos referenced in this dataset are YouTube videos, and **all copyrights are retained by the respective video uploaders**. No video content is redistributed — only YouTube video IDs and metadata are provided. **This dataset is intended for research purposes only.**
## Citation
```bibtex
@dataset{japanese_video_qa_2026,
title = {Japanese Video-QA: A Benchmark for Evaluating Video Understanding of Japanese Culture},
author = {Mine, Yudai and Shintate, Takuya and Takahashi, Kazuya},
year = {2026},
url = {https://huggingface.co/datasets/nablasinc/Japanese_Video-QA}
}
```
Provided by: nablasinc



