
nablasinc/Japanese_Video-QA

Source: Hugging Face. Updated 2026-02-24; indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/nablasinc/Japanese_Video-QA
Description:
---
license: cc-by-4.0
task_categories:
- video-text-to-text
- visual-question-answering
language:
- ja
tags:
- video
- vqa
- japanese
- benchmark
- multimodal
size_categories:
- n<1K
dataset_info:
  features:
  - name: video_id
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: category
    dtype: string
  - name: choices
    sequence: string
  - name: type
    dtype: string
  - name: duration
    dtype: string
  - name: domain
    dtype: string
  - name: subdomain
    dtype: string
  - name: start_time
    dtype: string
  - name: end_time
    dtype: string
  - name: video_title
    dtype: string
  splits:
  - name: test
    num_bytes: 288904
    num_examples: 800
  download_size: 119051
  dataset_size: 288904
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
---

# Japanese Video-QA

## Overview

**Japanese Video-QA** is a video question-answering benchmark focused on Japanese cultural content, designed to evaluate multimodal large language models (MLLMs) on Japanese-specific videos.

- **428 YouTube videos** → **800 QA pairs**
- **6 domains, 100 sub-domains** covering Japanese culture
- Questions generated by Gemini 2.5 Flash and manually verified
- Evaluated with LLM-as-a-Judge (GPT-4o), scoring 1–3

**Authors:** 峯 悠大, 新立 拓也, 髙橋 和也 (NABLAS Inc.)
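The headline counts in the Overview (videos, QA pairs, domains, sub-domains) can be checked against the loaded rows. The following is a minimal sketch, assuming each row is a mapping with the `video_id`, `domain`, `subdomain`, and `type` features listed in the card metadata. `summarize` is a hypothetical helper, and the toy rows (including their IDs and sub-domain labels) are invented for illustration.

```python
from collections import Counter
from typing import Iterable, Mapping


def summarize(rows: Iterable[Mapping[str, object]]) -> dict:
    """Count QA pairs, distinct videos, domains, and sub-domains."""
    rows = list(rows)
    return {
        "qa_pairs": len(rows),
        "videos": len({r["video_id"] for r in rows}),
        "domains": len({r["domain"] for r in rows}),
        "subdomains": len({r["subdomain"] for r in rows}),
        "by_type": dict(Counter(r["type"] for r in rows)),
    }


# Invented rows that only mimic the schema; they are not taken from the dataset.
toy_rows = [
    {"video_id": "vid_a", "domain": "食文化", "subdomain": "sub_1", "type": "open"},
    {"video_id": "vid_a", "domain": "食文化", "subdomain": "sub_1", "type": "yes_no"},
    {"video_id": "vid_b", "domain": "観光名所", "subdomain": "sub_2", "type": "multi_choice"},
]

print(summarize(toy_rows))
```

Passing `load_dataset("nablasinc/Japanese_Video-QA")["test"]` to `summarize` should, if the card's figures hold, report 800 QA pairs across 428 videos, 6 domains, and 100 sub-domains.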
## Dataset Fields

| Field | Type | Description |
|---|---|---|
| `video_id` | string | YouTube video ID |
| `video_title` | string | Title of the video |
| `domain` | string | Domain: 四季・行事 (seasons and events) / 観光名所 (tourist attractions) / 伝統文化 (traditional culture) / 食文化 (food culture) / 自然・風景 (nature and scenery) / ポップカルチャー (pop culture) |
| `subdomain` | string | Sub-domain (100 categories) |
| `duration` | string | `short` (< 4 min) or `medium` (4–20 min) |
| `question` | string | Question in Japanese |
| `answer` | string | Ground-truth answer (choice label for `multi_choice`) |
| `type` | string | `open` / `multi_choice` / `yes_no` |
| `category` | string | `spatial` / `count` / `action` / `temporal` / `causal` |
| `choices` | list[string] | Answer options (only for `multi_choice`) |
| `start_time` | string | Segment start time (**reference only**) |
| `end_time` | string | Segment end time (**reference only**) |

> `start_time` and `end_time` are approximate annotations for reference. Models should take the full video as input.

## Benchmark Results

Scoring: 1 = incorrect, 2 = partial, 3 = correct

LLM judge: GPT-4o ([sample code](https://github.com/nablas-inc/Japanese_Video-QA/blob/main/llm_judge_sample.py))

| Model | Avg | Score 1 | Score 2 | Score 3 |
|---|---|---|---|---|
| Gemini 3 Pro | **2.61** | 122 | 68 | 610 |
| Gemini 2.5 Flash | 2.57 | 139 | 68 | 593 |
| Qwen3-VL-8B-Instruct | 2.24 | 260 | 89 | 451 |
| Qwen3-VL-8B-Thinking | 2.20 | 285 | 70 | 445 |
| Qwen3-VL-4B-Instruct | 2.19 | 287 | 77 | 436 |
| Qwen3-VL-4B-Thinking | 2.13 | 308 | 77 | 415 |
| Phi-4-multimodal-instruct | 1.74 | 465 | 76 | 259 |

## Usage

```python
from datasets import load_dataset

# Download the dataset card's default config from the Hugging Face Hub.
dataset = load_dataset("nablasinc/Japanese_Video-QA")
# The benchmark ships a single split, "test", with 800 QA pairs.
test = dataset["test"]
```

Use `video_id` to retrieve the YouTube video and provide the full video as model input.

## License and Copyright

The annotations and metadata in this dataset are released under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/).
The videos referenced in this dataset are YouTube videos, and **all copyrights are retained by the respective video uploaders**. No video content is redistributed; only YouTube video IDs and metadata are provided.

**This dataset is intended for research purposes only.**

## Citation

```bibtex
@dataset{japanese_video_qa_2026,
  title  = {Japanese Video-QA: A Benchmark for Evaluating Video Understanding of Japanese Culture},
  author = {Mine, Yudai and Shintate, Takuya and Takahashi, Kazuya},
  year   = {2026},
  url    = {https://huggingface.co/datasets/nablasinc/Japanese_Video-QA}
}
```
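The Avg column in the Benchmark Results table is consistent with a count-weighted mean of the 1–3 judge scores. The card does not state the formula, so the sketch below is an assumption that happens to reproduce the published figures; `mean_score` is a hypothetical helper, not part of the benchmark's code.

```python
def mean_score(n1: int, n2: int, n3: int) -> float:
    """Count-weighted mean of judge scores 1 (incorrect), 2 (partial), 3 (correct)."""
    total = n1 + n2 + n3
    if total == 0:
        raise ValueError("no scored answers")
    return (1 * n1 + 2 * n2 + 3 * n3) / total


# Score counts copied from the Gemini 3 Pro row of the results table.
gemini_3_pro = mean_score(122, 68, 610)
print(round(gemini_3_pro, 2))  # 2.61, matching the table
```

Each row's three counts sum to 800, the number of QA pairs in the test split, so the denominator is the same for every model.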
Provider:
nablasinc