Voxel51/MashUpVQA

Source: Hugging Face. Updated 2024-05-10; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/Voxel51/MashUpVQA
Resource description:
---
annotations_creators: []
language: en
size_categories:
- 10K<n<100K
task_categories: []
task_ids: []
pretty_name: MashUpVQA
tags:
- fiftyone
- image
- vqa
description: A mashup and remix of several visual question answering datasets, perfect for vibe checking your VLM.
name: MashUpVQA
format: FiftyOneDataset
dataset_summary: |
  This is a [FiftyOne](https://github.com/voxel51/fiftyone) dataset with 12780 samples.

  ## Installation

  If you haven't already, install FiftyOne:

  ```bash
  pip install -U fiftyone
  ```

  ## Usage

  ```python
  import fiftyone as fo
  import fiftyone.utils.huggingface as fouh

  # Load the dataset
  # Note: other available arguments include 'max_samples', etc
  dataset = fouh.load_from_hub("Voxel51/MashUpVQA")

  # Launch the App
  session = fo.launch_app(dataset)
  ```
---

# Dataset Card for MashUpVQA

![image/png](dataset_preview.gif)

This is a [FiftyOne](https://github.com/voxel51/fiftyone) dataset with 12780 samples.

MashUpVQA is a remix of several visual question answering datasets. Our hope is that a dataset with a consistent format and lots of variety will make it easier to assess the performance of a VQA system.

## Installation

If you haven't already, install FiftyOne:

```bash
pip install -U fiftyone
```

## Usage

```python
import fiftyone as fo
import fiftyone.utils.huggingface as fouh

# Load the dataset
# Note: other available arguments include 'max_samples', etc
dataset = fouh.load_from_hub("Voxel51/MashUpVQA")

# Launch the App
session = fo.launch_app(dataset)
```

## Dataset Details

- **Curated by:** [Harpreet Sahota, Hacker-in-Residence](https://huggingface.co/harpreetsahota) at [Voxel51](https://huggingface.co/Voxel51)
- **Language(s) (NLP):** en
- **License:** MashUpVQA is a composite dataset created by combining multiple individual datasets. Each of these datasets may be subject to its own terms of use and licensing.
The licensing terms of MashUpVQA depend on the licensing terms of each individual dataset included in this compilation. As we have integrated data from various sources, we do not hold copyright over the data and acknowledge that each source retains rights over their respective data. Users of MashUpVQA are responsible for ensuring that their use of the data complies with the legal and licensing requirements of each individual dataset included. **Please ensure that you review and adhere to the licensing requirements of each individual dataset prior to using this data.**

## Dataset Structure

Each sample in the dataset comprises:

- An image
- A question to be asked of the image
- An answer

### Dataset Sources

#### Code for creating the dataset can be found in this [notebook](https://colab.research.google.com/drive/1jexIg5-o4fPJsseuYQoPLpWaeWWnItpy?usp=sharing).

The MashUpVQA dataset is a composite dataset designed for vibe-checking and evaluating Visual Question Answering (VQA) systems, where models attempt to answer questions based on visual input. It integrates multiple diverse datasets to cover a wide range of VQA challenges, with the aim of promoting robustness and versatility in the models evaluated on it. Here's a summary of the constituent datasets:

1. **TextVQA**: Focuses on answering questions that require reading text within images, sourced from Open Images. Models must not only detect and read the text but also reason about its relevance to the query. [TextVQA on LMMs Lab](https://huggingface.co/datasets/lmms-lab/textvqa).
2. **WildVision**: Contains a collection of public benchmarks for evaluating multimodal large language models, useful for general multimodal understanding tasks. [WildVision Dataset](https://huggingface.co/datasets/WildVision/PublicBenchHub/tree/main).
3. **RealWorldQA**: Tests models on real-world visuals like vehicle camera images, focusing on practical, verifiable question-answer pairs.
[RealWorldQA Dataset](https://huggingface.co/datasets/xai-org/RealworldQA).
4. **AI2 Diagrams (AI2D)**: Offers a challenge in understanding scientific diagrams, with over 5,000 annotated diagrams from grade school textbooks. [AI2D on LMMs Lab](https://huggingface.co/datasets/lmms-lab/ai2d).
5. **DocVQA**: Focuses on document images spanning a century, with questions about their content, challenging models to handle various types of printed and handwritten text. [DocVQA on LMMs Lab](https://huggingface.co/datasets/lmms-lab/DocVQA).
6. **InfographicVQA**: Involves answering questions from infographic images, requiring reasoning over text, layout, and graphical elements. [InfographicVQA on LMMs Lab](https://huggingface.co/datasets/lmms-lab/DocVQA).
7. **MME**: A benchmark for evaluating multimodal large language models across diverse tasks like OCR, commonsense reasoning, and numerical calculations. [MME on LMMs Lab](https://huggingface.co/datasets/lmms-lab/MME).
8. **VisualWebBench**: Tests understanding of web page content across multiple levels, from whole page comprehension to specific element interactions. [VisualWebBench Repo](https://github.com/VisualWebBench/VisualWebBench).
9. **OCR-VQA**: Dedicated to answering questions based on text identified in images, specifically book covers. [OCR-VQA on Hugging Face](https://huggingface.co/datasets/howard-hou/OCR-VQA).
10. **Localized Narratives**: Provides rich annotations linking spoken descriptions to visual content through mouse traces, enhancing models' ability to connect visual and textual information. [Localized Narratives on Hugging Face](https://huggingface.co/datasets/vikhyatk/lnqa).
11. **VQA-RAD**: Specializes in medical VQA with radiology images, where questions and answers are generated by clinicians, focusing on medically relevant visual content. [VQA-RAD on Hugging Face](https://huggingface.co/datasets/flaviagiammarino/vqa-rad).
#### Data Collection and Processing

This [notebook](https://colab.research.google.com/drive/1jexIg5-o4fPJsseuYQoPLpWaeWWnItpy?usp=sharing) demonstrates the process of creating a mashup dataset called "MashUpVQA" by combining and preprocessing three datasets: TextVQA, WildVision, and VQAv2. The goal is to create a consistent and consolidated dataset for multimodal question-answering tasks.

### Dataset Loading and Preprocessing

1. Each dataset is loaded from the Hugging Face hub using the `load_from_hub` function of `fiftyone`.
2. Smaller subsets of the datasets are created using the `take` and `clone` methods to reduce the dataset size for easier processing.
3. The datasets undergo a common preprocessing pipeline:
   - A "source_dataset" field is added to indicate the source Hugging Face repo.
   - Unused fields are deleted based on the dataset configuration.
   - Fields are renamed for consistency across datasets (if needed).

### Answer Consolidation

1. A new "answer" field is added to each dataset using the `add_sample_field` method of the `fo.Dataset` object.
2. The `parse_answer` function is applied to each sample's "question" and "answers" fields to consolidate the answers into a single, most plausible answer.
3. The parsed answers are set as the values of the "answer" field using `set_values`.
4. The original "answers" field is deleted from each dataset.

The preprocessed datasets are concatenated into a single dataset and exported to the Hub in the FiftyOne dataset format.

## Dataset Card Authors

[Harpreet Sahota](https://huggingface.co/harpreetsahota)
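Provided by:
Voxel51
Summary of original information

Dataset overview

Name: MashUpVQA

Description: MashUpVQA is a visual question answering (VQA) dataset built by mixing and remixing several existing datasets. It is intended for evaluating and sanity-checking VQA systems.

Format: FiftyOneDataset

Number of samples: 12780

Dataset contents

Each sample contains:

  • An image
  • A question about the image
  • An answer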
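
As a rough illustration of this record layout, the sketch below models one sample as a small Python dataclass and counts samples per source. `VQASample` and `samples_per_source` are hypothetical names used only for illustration. The `question`, `answer`, and `source_dataset` field names follow the dataset card's description, and `filepath` is assumed to hold the image location.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class VQASample:
    """Illustrative stand-in for one MashUpVQA record (not a FiftyOne class)."""

    filepath: str             # assumed: path to the image on disk
    question: str             # question to be asked of the image
    answer: str               # single consolidated answer
    source_dataset: str = ""  # Hugging Face repo the sample came from


def samples_per_source(samples):
    """Count how many samples each constituent dataset contributes."""
    return Counter(sample.source_dataset for sample in samples)


# Made-up records, for demonstration only
demo = [
    VQASample("img/0001.jpg", "What does the sign say?", "stop", "lmms-lab/textvqa"),
    VQASample("img/0002.jpg", "Is there a fracture?", "no", "flaviagiammarino/vqa-rad"),
    VQASample("img/0003.jpg", "What brand is shown?", "acme", "lmms-lab/textvqa"),
]
counts = samples_per_source(demo)
```

A per-source count like this can help when checking whether a model's results are dominated by one of the constituent datasets.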

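Dataset sources

The constituent datasets are:

  1. TextVQA: reading and understanding text within images.
  2. WildVision: public benchmarks for evaluating multimodal large language models.
  3. RealWorldQA: question answering over real-world visual scenes.
  4. AI2 Diagrams (AI2D): understanding scientific diagrams.
  5. DocVQA: question answering over document images spanning a century.
  6. InfographicVQA: question answering over infographics.
  7. MME: an evaluation benchmark for multimodal large language models.
  8. VisualWebBench: multi-level understanding of web page content.
  9. OCR-VQA: question answering based on text recognized in images.
  10. Localized Narratives: spoken descriptions linked to visual content through mouse traces.
  11. VQA-RAD: question answering over medical radiology images.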

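Dataset creation and processing

Creation method: several datasets are merged and run through a shared preprocessing pipeline so that every sample ends up with a consistent set of fields.

Processing steps:

  • Load each dataset
  • Create smaller subsets
  • Apply the common preprocessing pipeline
  • Consolidate and standardize the answers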
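
The steps above can be sketched on plain Python dictionaries, without FiftyOne. This is a minimal sketch under stated assumptions, not the notebook's code: `take_subset`, `preprocess`, and this version of `parse_answer` are illustrative stand-ins, the `query` and `ocr_tokens` field names in the sample data are made up, and majority voting over normalized answers is only one plausible way to pick a single answer.

```python
import random
from collections import Counter


def take_subset(samples, n, seed=51):
    """Randomly keep at most n samples (stand-in for a take-and-clone step)."""
    rng = random.Random(seed)
    return rng.sample(samples, min(n, len(samples)))


def parse_answer(question, answers):
    """Assumed rule: return the most frequent answer after normalization.

    `question` mirrors the card's description of the function's inputs
    but is not used by this simple stand-in.
    """
    if isinstance(answers, str):
        return answers.strip()
    cleaned = [a.strip().lower() for a in answers if a and a.strip()]
    if not cleaned:
        return ""
    return Counter(cleaned).most_common(1)[0][0]


def preprocess(samples, source_repo, keep_fields, rename=None):
    """Drop unused fields, rename fields, tag the source, consolidate answers."""
    rename = rename or {}
    rows = []
    for sample in samples:
        # Keep only the configured fields, applying any renames
        row = {rename.get(k, k): v for k, v in sample.items() if k in keep_fields}
        row["source_dataset"] = source_repo
        # Replace the list-valued "answers" field with a single "answer"
        row["answer"] = parse_answer(row.get("question", ""), row.pop("answers", []))
        rows.append(row)
    return rows


# Made-up raw record, for demonstration only
raw = [
    {"image": "a.jpg", "query": "What color is the sign?",
     "answers": ["Red", "red", "blue"], "ocr_tokens": ["STOP"]},
]
rows = preprocess(
    take_subset(raw, 1),
    source_repo="lmms-lab/textvqa",
    keep_fields={"image", "query", "answers"},
    rename={"query": "question"},
)
# Several preprocessed lists would then be concatenated with `+`
mashup = rows + []
```

In this sketch each source keeps its own `keep_fields` and `rename` configuration, so differently shaped inputs converge on the same `question` / `answer` / `source_dataset` layout before concatenation.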

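Dataset license

License note: because the dataset is assembled from multiple sources, each source may have its own licensing terms. Users must make sure their use of the dataset complies with the licensing requirements of every source.

Dataset author

Author: Harpreet Sahota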
