sigmacollab

Name: sigmacollab
Creator: maas
Published: 2025-12-05 12:12:29
License: no description provided

ModelScope community; updated 2025-12-05; indexed 2025-12-06

Download link:

https://modelscope.cn/datasets/microsoft/sigmacollab


Resource description:

# Dataset Card for SigmaCollab

__`SigmaCollab`__ is a dataset that enables research on physically situated human-AI collaboration. The dataset consists of 85 sessions in which untrained participants were guided by a mixed-reality assistive AI agent in performing procedural tasks in the physical world. In addition, 8 sessions are available in which an expert performed the same tasks.

## Dataset Details

### Dataset Description

__`SigmaCollab`__, described in detail in [this arXiv paper](https://arxiv.org/abs/2511.02560), was collected with an open-source mixed-reality AI application called [Sigma](https://github.com/microsoft/psi/blob/master/Applications/Sigma/Readme.md) (itself described in [this arXiv paper](https://arxiv.org/abs/2405.13035) and in an [IEEE VR extended abstract](https://ieeexplore.ieee.org/abstract/document/10536320)). The _application-driven_ and _interactive_ nature of the __`SigmaCollab`__ dataset brings to the fore novel research challenges for human-AI collaboration, and provides more realistic testing grounds for various AI models operating in this space. Additionally, the [Sigma](https://github.com/microsoft/psi/blob/master/Applications/Sigma/Readme.md) system is open-source, and you can download and run it yourself to collect your own or additional data.

__`SigmaCollab`__ includes a set of rich, multimodal data streams: the participant and system audio, egocentric camera views from the head-mounted device, depth maps, head, hand and gaze tracking information, as well as additional annotations performed post-hoc. The raw set of data streams included is summarized in the table below:

| Stream | Representation | Avg. Frame Rate |
|----------------------------------|------------------------------------------------------------------------------------------------------------------|:---------------------:|
| RGB Camera View | 896 × 504 pixels @ 24bpp, with camera pose and intrinsics | 14.91 Hz |
| Depth Camera View | 320 × 288 pixels @ 16bpp, with camera pose and intrinsics | 4.98 Hz |
| Left Front Grayscale Camera View | 640 × 480 pixels @ 8bpp, with camera pose and intrinsics | 13.64 Hz |
| Right Front Grayscale Camera View | 640 × 480 pixels @ 8bpp, with camera pose and intrinsics | 13.64 Hz |
| Head Pose + Eye Gaze | tuple of head pose matrix (4 × 4) and eye gaze ray (3 × 1 origin position vector and 3 × 1 direction vector) | 28.37 Hz |
| Hands Pose | pose matrices (4 × 4) for each of the 26 joints in the left and right hand | 20.01 Hz |
| Audio | 1-channel, 32-bit floating-point PCM | 16.00 kHz |

Additionally, __`SigmaCollab`__ includes manual segmentation and transcripts for the user utterances, word-level timings for both user and system utterances, task success annotations, as well as post-processed gaze information (e.g., projection of gaze point in the various camera images). For more details regarding the dataset contents, structure, and data formats, please see the [Dataset Structure](https://github.com/microsoft/SigmaCollab/blob/master/DatasetStructure.md) documentation page.
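
As a rough sizing aid, the Python sketch below computes the uncompressed data rates implied by the stream table (resolution × bit depth × average frame rate). It is only an estimate under stated assumptions: the names `STREAMS`, `bytes_per_frame`, and `raw_rate_mb_per_s` are illustrative and not part of the dataset tooling, and the actual on-disk size may differ considerably if the streams are stored in an encoded or compressed form (see the Dataset Structure documentation for the real formats).

```python
# Raw (uncompressed) throughput per image stream, derived from the stream table.
# Tuple layout: (width_px, height_px, bits_per_pixel, avg_frame_rate_hz)
STREAMS = {
    "RGB Camera View": (896, 504, 24, 14.91),
    "Depth Camera View": (320, 288, 16, 4.98),
    "Left Front Grayscale Camera View": (640, 480, 8, 13.64),
    "Right Front Grayscale Camera View": (640, 480, 8, 13.64),
}

# Audio: 1 channel, 32-bit float samples at 16 kHz -> 4 bytes per sample.
AUDIO_BYTES_PER_SECOND = 1 * 4 * 16_000


def bytes_per_frame(width, height, bits_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return width * height * bits_per_pixel // 8


def raw_rate_mb_per_s(width, height, bits_per_pixel, hz):
    """Uncompressed data rate in megabytes (10^6 bytes) per second."""
    return bytes_per_frame(width, height, bits_per_pixel) * hz / 1e6


if __name__ == "__main__":
    for name, spec in STREAMS.items():
        print(f"{name}: {raw_rate_mb_per_s(*spec):.2f} MB/s")
    print(f"Audio: {AUDIO_BYTES_PER_SECOND / 1e6:.3f} MB/s")
```

Under these assumptions the RGB stream dominates the raw data volume, which is worth keeping in mind when deciding which streams to load for a given experiment.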

- **Curated by:** Microsoft Research
- **Language(s) (NLP):** English
- **License:** [CDLA-Permissive-2.0](https://github.com/microsoft/SigmaCollab/blob/master/CDLA.md)

### Dataset Sources

- **Repository:** [https://github.com/microsoft/SigmaCollab](https://github.com/microsoft/SigmaCollab)
- **Paper:** [SigmaCollab: An Application-Driven Dataset for Physically Situated Collaboration](https://arxiv.org/abs/2511.02560)

## Uses

__`SigmaCollab`__ aims to catalyze rigorous, application-grounded study of fluid human–AI collaboration and to close the gap between lab benchmarks and real-world performance. The dataset can serve as a testbed for evaluating the generalizability and effectiveness of various models in the context of an interactive application, which is potentially out-of-distribution from what the models have been trained on.

## Dataset Structure

Details regarding the dataset contents, structure, and data formats are available in this [documentation page](https://github.com/microsoft/SigmaCollab/blob/master/DatasetStructure.md).

## Dataset Creation

### Curation Rationale

While significant progress has been made over the past decade towards building computing systems that interact with people in the physical world, a lot of the existing datasets focus on computer vision or language processing challenges. However, in addition to understanding the environment, objects, and actions, creating seamless situated collaborations raises additional _interaction_- and _collaboration_-related challenges. In these areas, progress has been slower. __`SigmaCollab`__ was developed to foster and enable more research on such challenges. In future work, we plan to construct and publish a set of benchmarks in this space based on this dataset.

### Source Data

__`SigmaCollab`__ was constructed via a data collection experiment in which participants interacted with the open-source [Sigma](https://github.com/microsoft/psi/blob/master/Applications/Sigma/Readme.md) system to perform a set of procedural tasks. The data collection experiment was conducted in a lab at Microsoft Research and was approved by the Microsoft Research Institutional Review Board.

#### Data Collection and Processing

The dataset was collected by having untrained participants interact with a mixed-reality assistive AI application ([Sigma](https://github.com/microsoft/psi/blob/master/Applications/Sigma/Readme.md)) which guided them in real time in performing certain tasks in the physical world, such as binding a notebook or installing the wheels on a skateboard.

#### Who are the source data producers?

We recruited participants for the study from among the co-workers at our organization, via broadly reaching emails and word-of-mouth. In total, 21 participants (12 male and 9 female) engaged in the data collection study and provided permission for public data release. Most participants were in the 46-55 age bracket. While their level of familiarity with AR/VR technologies varied, most participants had already encountered these technologies, even if they did not use them often.

### Annotations

__`SigmaCollab`__ includes manual segmentation and transcripts for the user utterances, word-level timings for both user and system utterances, task success annotations, as well as post-processed gaze information (e.g., projection of gaze point in the various camera images). For more details regarding the dataset contents, structure, and data formats, please see the [Dataset Structure](https://github.com/microsoft/SigmaCollab/blob/master/DatasetStructure.md) documentation page. For additional details regarding the annotations, see the [dataset paper](https://arxiv.org/abs/2511.02560).
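
To illustrate what "projection of gaze point in the various camera images" can involve, here is a minimal Python sketch of a standard pinhole projection. It is not the dataset's own post-processing code, and every convention in it is an assumption: the 4 × 4 camera pose is taken to be a row-major camera-to-world rigid transform, the camera is taken to look along +Z with x to the right and y down, the gaze ray is taken to be expressed in world coordinates, and the intrinsics (`fx`, `fy`, `cx`, `cy`) and the fixed gaze distance are hypothetical values. The actual coordinate conventions and intrinsics formats should be taken from the Dataset Structure documentation.

```python
import math


def invert_rigid(pose):
    """Invert a 4x4 rigid transform [R | t] as [R^T | -R^T t]."""
    r_t = [[pose[j][i] for j in range(3)] for i in range(3)]  # R transposed
    t = [pose[i][3] for i in range(3)]
    inv_t = [-sum(r_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [inv_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def transform_point(pose, p):
    """Apply a 4x4 rigid transform to a 3D point."""
    return [sum(pose[i][k] * p[k] for k in range(3)) + pose[i][3] for i in range(3)]


def point_on_ray(origin, direction, distance):
    """Point at `distance` along the ray; `direction` must be nonzero."""
    norm = math.sqrt(sum(d * d for d in direction))
    return [o + distance * d / norm for o, d in zip(origin, direction)]


def project_to_image(point_world, camera_pose, fx, fy, cx, cy):
    """Pinhole projection of a world point; None if behind the camera."""
    x, y, z = transform_point(invert_rigid(camera_pose), point_world)
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


if __name__ == "__main__":
    # Hypothetical camera located at x = 1 in the world, with no rotation.
    camera_pose = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    # Hypothetical gaze ray and an assumed gaze distance of 2 units.
    gaze_point = point_on_ray([1.5, 0.0, 0.0], [0.0, 0.0, 1.0], 2.0)
    print(project_to_image(gaze_point, camera_pose, 400, 400, 448, 252))
```

A gaze ray alone does not determine a single 3D point, so the sketch picks a point at an assumed distance along the ray; in practice that distance could instead come from intersecting the ray with scene geometry, for example using the depth stream.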

## Bias, Risks, and Limitations

__`SigmaCollab`__ has certain biases: the data was collected in a laboratory setting and may not reflect the full complexities arising in a real-world deployment. While there is a certain amount of variety in the procedural tasks involved, the tasks do not reflect the whole range of issues that may arise during procedural task guidance. The participants in the data collection experiment were selected from among employees at Microsoft Research. For more details about the data collection process, please see the [dataset paper](https://arxiv.org/abs/2511.02560).

## Citation

**BibTeX:**

```
@misc{bohus2025sigmacollabapplicationdrivendatasetphysically,
  title={SigmaCollab: An Application-Driven Dataset for Physically Situated Collaboration},
  author={Dan Bohus and Sean Andrist and Ann Paradiso and Nick Saw and Tim Schoonbeek and Maia Stiber},
  year={2025},
  eprint={2511.02560},
  archivePrefix={arXiv},
  primaryClass={cs.HC},
  url={https://arxiv.org/abs/2511.02560},
}
```

## Dataset Card Contact

- dbohus@microsoft.com
- sandrist@microsoft.com
- maiastiber@microsoft.com


Provider:

maas

Created:

2025-11-05
