xandery/SuperGlasses

Name: xandery/SuperGlasses
Creator: xandery
Published: 2026-04-11 07:34:45
License: not described in this listing (the dataset card metadata declares `mit`)

Source: Hugging Face. Updated 2026-04-11. Indexed 2026-04-12.

Download link:

https://hf-mirror.com/datasets/xandery/SuperGlasses

Resource overview:

---
license: mit
task_categories:
- image-text-to-text
configs:
- config_name: images
  data_files:
  - split: test
    path: images/test-*
- config_name: queries
  data_files:
  - split: test
    path: queries/test-*
dataset_info:
- config_name: images
  features:
  - name: image_id
    dtype: int64
  - name: image_name
    dtype: string
  - name: image
    dtype: image
  splits:
  - name: test
    num_bytes: 8773194740
    num_examples: 2394
  download_size: 8767634230
  dataset_size: 8773194740
- config_name: queries
  features:
  - name: id
    dtype: int64
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: image
    dtype: string
  - name: image_id
    dtype: int64
  - name: glasses
    dtype: string
  - name: domain
    dtype: string
  - name: image_quality
    dtype: string
  - name: category
    list: string
  - name: language
    list: string
  - name: dynamism
    dtype: string
  - name: difficulty
    dtype: string
  - name: location
    dtype: string
  - name: use_image
    dtype: string
  - name: hops
    dtype: int64
  - name: sub_questions
    list:
    - name: image-recognition
      dtype: string
    - name: retrieval-tools
      dtype: string
    - name: sub-question
      dtype: string
  - name: souce_id
    dtype: int64
  splits:
  - name: test
    num_bytes: 1997607
    num_examples: 2422
  download_size: 898071
  dataset_size: 1997607
---

# SUPERGLASSES

This repository contains the dataset for the paper [SUPERGLASSES: Benchmarking Vision Language Models as Intelligent Agents for AI Smart Glasses](https://huggingface.co/papers/2602.22683).

SUPERGLASSES is the first comprehensive Visual Question Answering (VQA) benchmark built on real-world data entirely collected by smart glasses devices. It comprises 2,422 egocentric image-question pairs spanning 14 image domains and 8 query categories, enriched with full search trajectories and reasoning annotations. The benchmark is specifically designed to evaluate Vision Language Models (VLMs) in realistic smart glasses usage scenarios, where identifying an object of interest is a critical prerequisite for external knowledge retrieval.

### Dataset Structure

The dataset is provided in two main configurations:

- `images`: Contains the egocentric images captured by smart glasses.
- `queries`: Contains questions, answers, and detailed annotations including difficulty, location, and reasoning sub-questions.

Provider:

xandery

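Summary

Dataset Introduction

Construction

Driven by smart-glasses application scenarios, the SUPERGLASSES dataset was built from real-world, first-person captures. The research team used smart glasses devices to capture 2,394 egocentric images directly and designed 2,422 visual question-answer pairs around them. Each pair comes with a complete search trajectory and multi-level reasoning annotations. Together the pairs span 14 image domains and 8 query categories, which keeps the data representative of the complexity of real environments.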

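Usage

Researchers use the dataset by loading its two configurations, `images` and `queries`. The `images` configuration holds the raw images captured by smart glasses; the `queries` configuration provides the questions, answers, and full annotations. The dataset is suited to evaluating vision language models on visual understanding, multi-step reasoning, and knowledge retrieval in egocentric settings, and in particular to developing or evaluating interactive AI systems for smart glasses and other wearables.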
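The following is a minimal sketch of loading both configurations with the Hugging Face `datasets` library and pairing a query with its image. The repository ID, config names, split name, and field names come from the dataset card metadata. The helper names (`load_superglasses`, `build_image_index`, `image_row_for`) are illustrative, and the assumption that `image_id` in `queries` refers to `image_id` in `images` is inferred from the schema; the card does not state it.

```python
from typing import Any, Dict, Iterable, Sequence


def load_superglasses(repo_id: str = "xandery/SuperGlasses"):
    """Load the `images` and `queries` configs (test split only)."""
    # Imported lazily so the helpers below work without the library installed.
    from datasets import load_dataset

    images = load_dataset(repo_id, "images", split="test")    # about 8.8 GB download
    queries = load_dataset(repo_id, "queries", split="test")  # about 0.9 MB download
    return images, queries


def build_image_index(image_ids: Iterable[int]) -> Dict[int, int]:
    """Map each image_id to its row position, without decoding any image."""
    return {image_id: pos for pos, image_id in enumerate(image_ids)}


def image_row_for(query: Dict[str, Any],
                  images: Sequence[Dict[str, Any]],
                  index: Dict[int, int]) -> Dict[str, Any]:
    """Fetch the image row a query points to (assumed join key: image_id)."""
    return images[index[query["image_id"]]]
```

With network access, `images, queries = load_superglasses()` followed by `index = build_image_index(images["image_id"])` prepares the lookup, and `image_row_for(queries[0], images, index)` returns the matching image row. Indexing by row position reads only the `image_id` column up front, so each image is decoded only when its row is requested.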

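Background and Challenges

Background Overview

As wearable smart devices advance rapidly, smart glasses have become a key carrier for augmented reality and instant information interaction, which raises the bar for how well AI models must understand a scene. The SuperGlasses dataset was created in response. It is built on the paper "SUPERGLASSES: Benchmarking Vision Language Models as Intelligent Agents for AI Smart Glasses" and is the first visual question answering benchmark composed of real-world data collected entirely by smart glasses devices. The research team created the dataset in 2024. Its core aim is to evaluate vision language models in realistic smart-glasses usage scenarios, especially the compound ability to identify an object of interest and then retrieve external knowledge about it. It covers 14 image domains and 8 query categories and contains 2,422 egocentric image-question pairs with complete search trajectories and reasoning annotations, providing an important data foundation for research on embodied intelligence and context-aware computing.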

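Current Challenges

The SuperGlasses dataset addresses the complex problem of evaluating vision language models as intelligent agents in smart-glasses scenarios. Its core challenge is to simulate real-world tasks that require multi-step reasoning and knowledge retrieval from a first-person view. On the construction side, the dataset must overcome the difficulty of capturing high-quality egocentric images in dynamic, unstructured everyday environments, while making sure the images cover varied lighting, occlusion, and motion blur. A second challenge is designing natural-language queries that combine cognitive depth with domain breadth, and precisely annotating the sub-questions and tool-use trajectories of multi-hop reasoning. Together these challenges point to an open frontier problem: how to make a vision language model in an open world actively issue effective information queries from visual perception, as a human wearer would.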
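To make the multi-hop annotation concrete, here is a small sketch that renders one query and its `sub_questions` steps as text. The field names (`hops`, `sub_questions`, `sub-question`, `image-recognition`, `retrieval-tools`) come from the dataset card schema. The function name `format_trajectory` is illustrative, the example row is invented for demonstration, and the sketch assumes each row exposes `sub_questions` as a list of dictionaries.

```python
from typing import Any, Dict, List


def format_trajectory(query: Dict[str, Any]) -> List[str]:
    """Render one query and its annotated sub-question steps as text lines."""
    lines = [f"[{query['id']}] {query['question']} (hops={query['hops']})"]
    for step, sub in enumerate(query["sub_questions"], start=1):
        lines.append(
            f"  {step}. {sub['sub-question']}"
            f" | image-recognition: {sub['image-recognition']}"
            f" | retrieval-tools: {sub['retrieval-tools']}"
        )
    return lines


# Hypothetical row, invented for illustration; not taken from the dataset.
example = {
    "id": 0,
    "question": "Who designed the building in front of me?",
    "hops": 2,
    "sub_questions": [
        {"sub-question": "Which building is shown?",
         "image-recognition": "yes", "retrieval-tools": "image search"},
        {"sub-question": "Who designed that building?",
         "image-recognition": "no", "retrieval-tools": "text search"},
    ],
}

for line in format_trajectory(example):
    print(line)
```

The values that the real `image-recognition` and `retrieval-tools` fields take are not documented on this page, so the strings in the example row should be read as placeholders.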

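Common Scenarios

Typical Use Cases

In the smart-glasses and augmented-reality field, the SUPERGLASSES dataset provides the first benchmark based on real-world data for evaluating vision language models. Its typical use case is simulating the everyday interactions of a smart-glasses user: egocentric images are paired with natural-language questions, and the model must identify visual objects in complex environments and complete multi-step reasoning. This scenario directly reflects the core function of smart glasses as a wearable device, namely understanding what is in the wearer's field of view and responding to their queries. It therefore sets a strict test standard for a model's perception and cognition in dynamic, unstructured environments.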

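Academic Problems Addressed

The SUPERGLASSES dataset mainly addresses the lack of evaluation for vision language models in real-world applications. Traditional visual question answering benchmarks mostly rely on static web images and lack egocentric data captured by smart glasses, so they struggle to measure how models perform on dynamic, continuous visual streams. By providing rich annotations across 14 image domains and 8 query categories, including search trajectories and reasoning sub-questions, the dataset lets researchers systematically evaluate models on key tasks such as object recognition, external knowledge retrieval, and multi-hop reasoning, advancing algorithms at the intersection of embodied intelligence and wearable computing.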
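One way to carry out such a systematic evaluation is to break accuracy down by an annotation field such as `hops`, `domain`, or `category`. The sketch below uses normalized exact match as the scoring rule. That rule is an assumption chosen for simplicity, not the metric defined by the paper, and the names `normalize` and `accuracy_by` are illustrative. Field names (`id`, `answer`, `hops`, `category`) follow the dataset card schema.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparing answers."""
    return " ".join(text.lower().split())


def accuracy_by(rows: Iterable[Dict[str, Any]],
                predictions: Dict[int, str],
                field: str) -> Dict[Any, float]:
    """Exact-match accuracy per value of `field`.

    List-valued fields such as `category` credit every listed value.
    Queries without a prediction count as wrong.
    """
    total: Dict[Any, int] = defaultdict(int)
    correct: Dict[Any, int] = defaultdict(int)
    for row in rows:
        hit = (row["id"] in predictions
               and normalize(predictions[row["id"]]) == normalize(row["answer"]))
        value = row[field]
        for key in (value if isinstance(value, list) else [value]):
            total[key] += 1
            correct[key] += int(hit)
    return {key: correct[key] / total[key] for key in total}
```

Given model outputs keyed by query `id`, `accuracy_by(queries, predictions, "hops")` returns one accuracy per hop count, which shows whether performance drops as more reasoning steps are required. Free-form answers often need a softer comparison than exact match, so the scoring line is the part most likely to need replacing.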

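Practical Applications

The dataset's practical applications are closely tied to developing and optimizing smart-glasses product features. In scenarios such as assisted navigation, object finding, or real-time information augmentation, smart glasses must accurately understand the user's field of view and answer questions such as "What style is the building in front of me?" or "Where can I buy this model of part?". The real-world data in SUPERGLASSES helps engineers train and test systems so that they keep working reliably under changing lighting, occlusion, and motion blur, which improves the user experience and speeds up the rollout of consumer and industrial smart glasses.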

Recent Research on the Dataset