A Generative Video Dataset to Represent Everyday Human Activity
Source: DataCite Commons | Updated: 2026-02-08 | Indexed: 2026-04-25
Download link:
https://figshare.com/articles/dataset/Generative_Human_Activity_Dataset/30229711
Description:
Human activity datasets underpin much of ubiquitous computing and HCI research, but traditional methods of capturing video or sensor data are costly, time-consuming, and raise privacy concerns. Advances in generative AI offer a new path: instead of recording people, we can synthesise convincing audiovisual depictions of everyday activities that are ethically shareable and scalable.

We introduce the Generative Human Activity Dataset, comprising:
- 1,000 AI-generated short videos of diverse human activities
- 4,000 paired audio clips of ambient sounds and effects
- Structured metadata and text prompts for each clip
- Pre-computed semantic embeddings for efficient search

Alongside the dataset, we provide a set of open-source tools:
- A semantic search engine supporting natural language queries
- A lightweight web API and embeddable widget for integration into surveys and applications
- Programmatic interfaces for research and prototyping

This dataset and toolkit make it possible to:
- Use synthetic clips as stimuli for experiments
- Support design prototyping and scenario exploration
- Provide training data for models without expensive annotation and with minimal privacy risk

By combining generative media with accessible tools, our contribution offers an ethically shareable, scalable alternative to traditional human activity datasets, positioning synthetic datasets as practical research resources in their own right.
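To illustrate how pre-computed embeddings can support search over clips, here is a minimal sketch of a cosine-similarity lookup. It is not the dataset's own search engine: the listing does not describe the embedding format or the toolkit's API, so the clip IDs, the three-dimensional toy vectors, and the function names `cosine_similarity` and `top_k` are all hypothetical. In practice the query vector would come from the same embedding model that produced the clip embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector is all zeros, to avoid dividing by zero.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    clip_embeddings: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k clip IDs most similar to the query, best match first."""
    scored = [
        (clip_id, cosine_similarity(query_vec, vec))
        for clip_id, vec in clip_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data: real embeddings would have hundreds of dimensions
# and would be loaded from the dataset's files rather than typed by hand.
toy_embeddings = {
    "clip_cooking": [0.9, 0.1, 0.0],
    "clip_walking": [0.0, 1.0, 0.0],
    "clip_typing": [0.1, 0.0, 0.9],
}
toy_query = [1.0, 0.2, 0.0]  # stand-in for an embedded text query

results = top_k(toy_query, toy_embeddings, k=2)
for clip_id, score in results:
    print(f"{clip_id}: {score:.3f}")
```

Because the clip embeddings are computed once in advance, each query costs only one embedding call plus a linear scan. For 1,000 clips that scan is cheap enough that an approximate nearest-neighbour index is probably unnecessary.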
Provider:
figshare
Created:
2025-09-29



