okib/Kvasir-VQA-x1

Name: okib/Kvasir-VQA-x1
Creator: okib
Published: 2026-04-18 06:35:30
License: CC BY-NC 4.0 (per the dataset card below)

Source: Hugging Face. Updated 2026-04-18. Indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/okib/Kvasir-VQA-x1


Resource description:

---
license: cc-by-nc-4.0
task_categories:
- visual-question-answering
- image-text-to-text
language:
- en
tags:
- medical
- kvasir
dataset_info:
  features:
  - name: image
    dtype: string
  - name: complexity
    dtype: int64
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: original
    dtype: string
  - name: question_class
    list: string
  - name: img_id
    dtype: string
  splits:
  - name: train
    num_bytes: 71692651
    num_examples: 143594
  - name: test
    num_bytes: 7944880
    num_examples: 15955
  download_size: 16580777
  dataset_size: 79637531
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---

# Kvasir-VQA-x1

A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy

[Kvasir-VQA-x1 on GitHub](https://github.com/simula/Kvasir-VQA-x1) | [Original Image from Kvasir-VQA(Simula Datasets)](https://datasets.simula.no/kvasir-vqa/) | [Paper](https://huggingface.co/papers/2506.09958)

> 🔗 [MediaEval Medico 2025 Challenge](https://github.com/simula/MediaEval-Medico-2025) uses this dataset. We encourage you to check out and participate!

## Overview

**Kvasir-VQA-x1** is a large-scale dataset designed to benchmark medical visual question answering (MedVQA) in gastrointestinal (GI) endoscopy. It introduces 159,549 new QA pairs stratified by clinical complexity, along with support for visual robustness testing via augmentations.

## Features

Each dataset entry includes:

- `img_id`: Unique reference to an image from Kvasir-VQA
- `complexity`: Question complexity level (1–3)
- `question`: Complex, natural language question
- `answer`: Clinically validated answer
- `original`: List of atomic QA pairs merged into the complex question
- `question_class`: Associated clinical category labels

## Splits

- `train`: Training samples only
- `test`: Held-out samples for final evaluation

To ensure generalization, no image or QA from the test set appears in the training set.
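As a rough sanity check on the complexity levels and split description above, the sketch below counts questions per complexity level and looks for `img_id` values shared between two splits. The toy rows are invented for illustration, and `complexity_histogram` and `shared_image_ids` are hypothetical helper names, not part of the dataset tooling. With the real data, the rows would presumably come from the loaded `train` and `test` splits instead.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def complexity_histogram(rows: Iterable[Dict]) -> Dict[int, int]:
    """Count how many rows fall into each complexity level."""
    return dict(Counter(int(r["complexity"]) for r in rows))


def shared_image_ids(train_rows: Iterable[Dict], test_rows: Iterable[Dict]) -> Set[str]:
    """Return the img_id values present in both splits (expected to be empty)."""
    train_ids = {r["img_id"] for r in train_rows}
    test_ids = {r["img_id"] for r in test_rows}
    return train_ids & test_ids


# Toy rows, invented for illustration only (not real dataset entries).
toy_train = [
    {"img_id": "img_a", "complexity": 1, "question": "q1", "answer": "a1"},
    {"img_id": "img_a", "complexity": 2, "question": "q2", "answer": "a2"},
    {"img_id": "img_b", "complexity": 3, "question": "q3", "answer": "a3"},
]
toy_test = [
    {"img_id": "img_c", "complexity": 2, "question": "q4", "answer": "a4"},
]

train_hist = complexity_histogram(toy_train)
overlap = shared_image_ids(toy_train, toy_test)
print(train_hist)
print(overlap)
```

An empty overlap set is what the split description leads one to expect; a non-empty set would point to image leakage between the splits being compared.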
---

## Downloading Images and Preparing JSONLs

```python
from datasets import load_dataset
from pathlib import Path
from tqdm import tqdm
import json
import os

# Folder to save images and JSONLs
d_path = "./Kvasir-VQA-x1/"
img_dir = Path(os.path.abspath(os.path.join(d_path, "images")))
img_dir.mkdir(exist_ok=True, parents=True)

# Save images once from SimulaMet-HOST/Kvasir-VQA
# (each image appears in several rows, so skip ids already written)
ds_host = load_dataset("SimulaMet-HOST/Kvasir-VQA", split="raw")
seen = set()
for row in tqdm(ds_host, desc="Saving images"):
    if row["img_id"] not in seen:
        row["image"].save(img_dir / f"{row['img_id']}.jpg")
        seen.add(row["img_id"])

# Save VLM-ready JSONLs for SimulaMet/Kvasir-VQA-x1
for split in ["train", "test"]:
    out_path = os.path.join(d_path, f"Kvasir-VQA-x1-{split}.jsonl")
    with open(out_path, "w", encoding="utf-8") as f:
        for r in load_dataset("SimulaMet/Kvasir-VQA-x1", split=split):
            # One chat-style record per line: question with an image
            # placeholder, the answer, and the local image path
            f.write(json.dumps({
                "messages": [
                    {"role": "user", "content": f"<image>{r['question']}"},
                    {"role": "assistant", "content": r["answer"]}
                ],
                "images": [str(img_dir / f"{r['img_id']}.jpg")]
            }, ensure_ascii=False) + "\n")
```

🔗 Instructions and code for creating VLM-ready JSONLs that incorporate the augmented images can be found [here](https://github.com/simula/Kvasir-VQA-x1/blob/main/README.md#2%EF%B8%8F%E2%83%A3-generate-weakly-augmented-images).

## Use Cases

- Multimodal clinical reasoning
- Robustness evaluation under visual perturbations
- Fine-tuning and benchmarking of VLMs (Vision-Language Models)

## License

Released under **CC BY-NC 4.0**, for academic and non-commercial use. Please cite appropriately.

## Citation & Paper

Please cite the associated dataset paper if you use Kvasir-VQA-x1 in your work:

```bibtex
@article{Gautam2025Jun,
  author  = {Gautam, Sushant and Riegler, Michael A. and Halvorsen, P{\aa}l},
  title   = {{Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy}},
  journal = {arXiv},
  year    = {2025},
  month   = jun,
  eprint  = {2506.09958},
  doi     = {10.48550/arXiv.2506.09958}
}
```

## Related Repositories

You can find full training scripts, augmentation tools, and baseline models at:

👉 [GitHub: simula/Kvasir-VQA-x1](https://github.com/simula/Kvasir-VQA-x1)
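The JSONL files written by the preparation script in "Downloading Images and Preparing JSONLs" hold one chat-style record per line. A minimal sketch of reading such a file back and checking its shape might look like the following. The helper name `read_vlm_jsonl`, the validation rules, and the toy record are assumptions made for illustration; they are not taken from the official repository.

```python
import json
import tempfile
from pathlib import Path


def read_vlm_jsonl(path):
    """Parse one JSON object per line and check the expected record shape."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            rec = json.loads(line)
            roles = [m["role"] for m in rec["messages"]]
            if roles != ["user", "assistant"]:
                raise ValueError(f"line {line_no}: unexpected roles {roles}")
            if not rec["messages"][0]["content"].startswith("<image>"):
                raise ValueError(f"line {line_no}: missing <image> placeholder")
            if len(rec["images"]) != 1:
                raise ValueError(f"line {line_no}: expected exactly one image path")
            records.append(rec)
    return records


# Toy record, invented for illustration (not a real dataset entry).
toy_record = {
    "messages": [
        {"role": "user", "content": "<image>toy question?"},
        {"role": "assistant", "content": "toy answer"},
    ],
    "images": ["images/toy_id.jpg"],
}

with tempfile.TemporaryDirectory() as tmp:
    toy_path = Path(tmp) / "toy.jsonl"
    toy_path.write_text(json.dumps(toy_record, ensure_ascii=False) + "\n", encoding="utf-8")
    records = read_vlm_jsonl(toy_path)

print(len(records), records[0]["messages"][1]["content"])
```

Failing fast on malformed lines keeps problems visible before a fine-tuning run starts; a more lenient loader could log and skip bad lines instead.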

Provider:

okib
