
HPAI-BSC/CareQA-Vision

Source: Hugging Face. Updated 2025-10-14; indexed 2026-01-03.
Download link:
https://hf-mirror.com/datasets/HPAI-BSC/CareQA-Vision
---
configs:
- config_name: CareQA_close
  data_files: CareQA-Vis_close.json
  description: "Closed-ended version of the CareQA-Vision dataset with multiple-choice questions."
- config_name: CareQA_open
  data_files: CareQA-Vis_open.json
  description: "Open-ended version of the CareQA-Vision dataset with free-response questions."
license: apache-2.0
task_categories:
- question-answering
language:
- en
tags:
- nursing
- medicine
pretty_name: CareQA-Vision
size_categories:
- n<1K
---

<div align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/65d71603ca16ef9ba7fb2efb/pbR94FDHWM7gf7BcrWxcV.png" width="20%" alt="HPAI"/>
</div>

<hr style="margin: 15px">

<div align="center" style="line-height: 1;">
  <a href="https://hpai.bsc.es/" target="_blank" style="margin: 1px;">
    <img alt="Web" src="https://img.shields.io/badge/Website-HPAI-8A2BE2" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://huggingface.co/HPAI-BSC" target="_blank" style="margin: 1px;">
    <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-HPAI-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://github.com/HPAI-BSC" target="_blank" style="margin: 1px;">
    <img alt="GitHub" src="https://img.shields.io/badge/GitHub-HPAI-%23121011.svg?logo=github&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>

<div align="center" style="line-height: 1;">
  <a href="https://www.linkedin.com/company/hpai" target="_blank" style="margin: 1px;">
    <img alt="Linkedin" src="https://img.shields.io/badge/Linkedin-HPAI-blue" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://bsky.app/profile/hpai.bsky.social" target="_blank" style="margin: 1px;">
    <img alt="BlueSky" src="https://img.shields.io/badge/Bluesky-HPAI-0285FF?logo=bluesky&logoColor=fff" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://linktr.ee/hpai_bsc" target="_blank" style="margin: 1px;">
    <img alt="LinkTree" src="https://img.shields.io/badge/Linktree-HPAI-43E55E?style=flat&logo=linktree&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>

<div align="center" style="line-height: 1;">
  <!-- <a href="https://arxiv.org/abs/2502.06666" target="_blank" style="margin: 1px;">
    <img alt="Arxiv" src="https://img.shields.io/badge/arXiv-2502.06666-b31b1b.svg" style="display: inline-block; vertical-align: middle;"/>
  </a> -->
  <a href="LICENSE" style="margin: 1px;">
    <img alt="License" src="https://img.shields.io/badge/license-Apache%202.0-green" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>

# CareQA-Vision Dataset

## Table of Contents

- [Dataset Summary](#dataset-summary)
- [Versions](#versions)
- [Dataset Statistics](#dataset-statistics)
- [Data Fields](#data-fields)
- [Dataset Instances](#dataset-instances)
  - [Closed-Ended Example](#closed-ended-example)
  - [Open-Ended Example](#open-ended-example)
- [Dataset Creation](#dataset-creation)
- [Intended Use](#intended-use)
- [Limitations & Considerations](#limitations--considerations)
- [Additional Information](#additional-information)

---

## Dataset Summary

CareQA-Vision is a vision-based healthcare QA dataset derived from the Spanish Specialized Healthcare Training (FSE) exams. All questions are curated by medical experts and cover the specialties of nursing and medicine. This dataset extends the original [CareQA](https://huggingface.co/datasets/HPAI-BSC/CareQA) dataset with image-based questions from exams held between 2020 and 2024.

---

## Versions

- **Closed-ended:** Multiple-choice questions (MCQA) with four answer options. Originally in Spanish, later translated to English.
- **Open-ended:** Free-response questions generated from the closed-ended version using Qwen2.5-72B-Instruct, followed by manual verification. Some questions could not be reliably converted to open-ended format, so the open-ended set is smaller.

---

## Dataset Statistics

| Type   | Nursing | Medicine | Total |
|--------|---------|----------|-------|
| Closed | 42      | 123      | 165   |
| Open   | 28      | 108      | 136   |

---

## Data Fields

- `unique_id`: Unique identifier for each question
- `category`: Nursing or medicine (stored as `ENFERMERIA` or `MEDICINA` in the examples below)
- `year`: Exam year (2020–2024)
- `question_number`: Number of the question in the original exam
- `question`: Text of the question (Spanish or English)
- `options`: Multiple-choice options (closed-ended only)
- `answer`: Letter of the correct option (closed-ended) or free-text correct answer (open-ended)
- `image`: Associated image

---

## Dataset Instances

Below are example questions from the CareQA-Vision dataset.

### Closed-Ended Example

```json
{
  "unique_id": "7088913624bdacbe9b8a3545a87c3210",
  "category": "MEDICINA",
  "year": 2020,
  "question_number": 20,
  "question": "What is the treatment of choice for a 96-year-old patient who presents with the fracture shown in the image?",
  "options": {
    "A": "Trochanteric nail on an orthopedic table.",
    "B": "Partial bipolar hip prosthesis.",
    "C": "Osteosynthesis with cannulated screws.",
    "D": "No surgery, early bed-chair life."
  },
  "answer": "A",
  "image": "images/Cuaderno_2020_MEDICINA_I_IMAGEN_20.png"
}
```

### Open-Ended Example

```json
{
  "unique_id": "716a90d27a4331b564fa56aaa68e0a43",
  "category": "ENFERMERIA",
  "year": 2020,
  "question_number": 5,
  "question": "What type of tissue is shown in the image?",
  "answer": "Adipose tissue.",
  "image": "images/Cuaderno_2020_ENFERMERIA_I_IMAGEN_5.png"
}
```

## Intended Use

CareQA-Vision is designed primarily for evaluating AI models on vision-based healthcare question-answering tasks.

## Limitations & Considerations

The dataset is relatively small (165 closed-ended and 136 open-ended questions) and was designed primarily for evaluation. Minor translation or rephrasing errors may remain. Some images may include Spanish text; this text is not essential for answering the questions.

## Additional Information

### Dataset Curator

Anna Arias-Duart

### Licensing Information

The dataset is licensed under the Apache License 2.0.

### Citation Information
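
### Example: Reading a Closed-Ended Record

The data fields described in this card can be exercised with a short Python sketch. It parses the closed-ended example record shown in the card, builds a text prompt from `question` and `options`, and checks a predicted letter against `answer`. The function names `format_closed_prompt` and `is_correct`, the prompt layout, and the case-insensitive comparison are illustrative assumptions, not the curators' evaluation protocol. The image referenced by `image` is not loaded here; a vision model would need to receive it separately.

```python
import json

# Record copied from the closed-ended example in this card.
RECORD_JSON = """
{
  "unique_id": "7088913624bdacbe9b8a3545a87c3210",
  "category": "MEDICINA",
  "year": 2020,
  "question_number": 20,
  "question": "What is the treatment of choice for a 96-year-old patient who presents with the fracture shown in the image?",
  "options": {
    "A": "Trochanteric nail on an orthopedic table.",
    "B": "Partial bipolar hip prosthesis.",
    "C": "Osteosynthesis with cannulated screws.",
    "D": "No surgery, early bed-chair life."
  },
  "answer": "A",
  "image": "images/Cuaderno_2020_MEDICINA_I_IMAGEN_20.png"
}
"""


def format_closed_prompt(record: dict) -> str:
    """Build a text prompt from a closed-ended record (hypothetical layout)."""
    lines = [record["question"]]
    # Sort by option letter so the order is deterministic.
    for letter, text in sorted(record["options"].items()):
        lines.append(f"{letter}. {text}")
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


def is_correct(record: dict, predicted: str) -> bool:
    """Compare a predicted letter with the gold letter, ignoring case and whitespace."""
    return predicted.strip().upper() == record["answer"].strip().upper()


record = json.loads(RECORD_JSON)
prompt = format_closed_prompt(record)
print(prompt)
print(is_correct(record, " a "))
```

Open-ended records have no `options` field and a free-text `answer`, so exact letter matching does not apply to them; how to score free-text answers is left open here.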

Provider: HPAI-BSC