Synthetic Indoor Dataset with Randomly Varying Camera Height (室内拍摄高度随机变化合成数据)

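Name: Synthetic Indoor Dataset with Randomly Varying Camera Height (室内拍摄高度随机变化合成数据)
Creator: Hangzhou Qunhe Information Technology Co., Ltd. (杭州群核信息技术有限公司)
Published: 2026-01-12 17:48:25
License: No description provided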

Source: Zhejiang Data Intellectual Property Registration Platform (浙江省数据知识产权登记平台). Updated 2026-01-12; indexed 2026-01-13.

Download link:

https://www.zjip.org.cn/home/announce/trends/8423279

Resource description:

This dataset addresses how well computer vision models generalize across varying viewpoints. It simulates the view of an agent roaming freely through an indoor environment (such as a home service robot, a drone, or an AR/VR device) and provides data for viewpoint-agnostic visual recognition and 3D scene understanding.

The dataset is built on a highly detailed indoor home-furnishing scene. Its core feature is a randomized observation viewpoint: the camera height is random within 700 mm to 1500 mm (simulating observation heights from sitting to standing), and the camera pitch angle is random within -45° to 45° (simulating views from looking up at the ceiling to looking down at the floor). This design removes the constraint of a fixed trajectory or fixed viewpoint and produces highly varied image compositions. It captures regions that are not visible in fixed low-viewpoint data, such as tabletops, cabinet interiors, and the undersides of pendant lamps, so the dataset covers more of the visual variation found in the real world.

Rendering parameters: resolution 1920×1080, FOV 60°, camera height random within 700 mm to 1500 mm, pitch angle random within -45° to 45°.

The dataset contains the following types of content: camera poses (intrinsic and extrinsic parameters), depth maps, COCO-format 2D image annotations, normal maps in the camera coordinate system, rendered images, semantic maps, and albedo channel maps.

The algorithm described here processes 3D models through segmentation, instance recombination, and format conversion to generate new instance models for applications such as scene rendering and robot training.

1. Model segmentation: This step takes any initial 3D model as input. The model includes position, size, material, vertex, normal, and face fields. A topological connectivity clustering algorithm splits the combined model into multiple face groups, and the model type field is obtained. This step extracts the structural features of the model for the subsequent instance recombination.

2. Model instance recombination: The position, size, material, vertex, normal, and face fields of the 3D model are segmented. Qwen-VL-Max and Grounding DINO are then used to combine the segmented parts into independent model instances, and the label field is obtained. The label field allows each model instance to be identified and used based on the structure and information of the original model.

3. Model format conversion: The split instance models and their material information are converted to the OpenUSD format, and the collider setting and animation constraint fields are obtained so that the models can move within the scene.

Through these steps, models from the original database are recombined into new instance models, which are then assembled into a complete scene for scene rendering, robot training, and other applications.
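Step 1 names a topological connectivity clustering algorithm but gives no implementation details. The following is a minimal sketch of one common way to do this, assuming that two faces belong to the same face group when they share at least one vertex index. The function name `face_groups` and the input layout (a list of vertex-index tuples) are hypothetical and are not taken from the dataset's own tooling.

```python
from collections import defaultdict


def face_groups(faces):
    """Cluster faces into connected groups using union-find.

    `faces` is a list of tuples of vertex indices (assumed layout).
    Two faces are treated as connected if they share a vertex index.
    Returns a sorted list of groups, each a list of face indices.
    """
    parent = list(range(len(faces)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller index as root so results are deterministic.
            parent[max(ra, rb)] = min(ra, rb)

    # Remember the first face seen for each vertex; any later face that
    # uses the same vertex is merged into that face's group.
    first_face_with_vertex = {}
    for face_index, face in enumerate(faces):
        for vertex in face:
            if vertex in first_face_with_vertex:
                union(face_index, first_face_with_vertex[vertex])
            else:
                first_face_with_vertex[vertex] = face_index

    groups = defaultdict(list)
    for face_index in range(len(faces)):
        groups[find(face_index)].append(face_index)
    return sorted(groups.values())


if __name__ == "__main__":
    # Two pairs of connected triangles and one isolated triangle.
    triangles = [(0, 1, 2), (2, 3, 4), (5, 6, 7), (7, 8, 9), (10, 11, 12)]
    print(face_groups(triangles))
```

Meshes exported with duplicated vertices at the same position would need those vertices welded first (for example by rounding and hashing coordinates); otherwise this index-based test would split parts that are geometrically attached.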

Provider:

Hangzhou Qunhe Information Technology Co., Ltd. (杭州群核信息技术有限公司)

Created:

2025-11-16

Collected summary

Dataset introduction

Background and challenges

Background overview

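This is a synthetic visual dataset of an indoor scene. Its core feature is a randomly varying camera viewpoint, with height between 700 mm and 1500 mm and pitch angle between -45° and 45°, intended to improve how well computer vision models generalize across viewpoints. The dataset is listed with 465.23 data entries and covers camera poses, depth maps, semantic segmentation maps, rendered images, and other visual and annotation data. It is suited to visual recognition and 3D scene understanding tasks for home service robots, drones, and AR/VR devices.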
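To make the viewpoint ranges above concrete, here is a minimal sketch that samples a camera height and pitch and derives a pinhole intrinsic matrix from the rendering parameters given in the resource description (1920×1080, FOV 60°). Several points are assumptions rather than facts from the listing: that the 60° FOV is horizontal, that pixels are square with the principal point at the image center, and that height and pitch are drawn uniformly. The function names are hypothetical.

```python
import math
import random

WIDTH, HEIGHT = 1920, 1080  # resolution stated in the description
FOV_DEG = 60.0              # assumed to be the horizontal field of view


def intrinsics(width=WIDTH, height=HEIGHT, fov_deg=FOV_DEG):
    """Return a 3x3 pinhole intrinsic matrix as nested lists.

    Assumes a horizontal FOV, square pixels, and a centered principal point.
    """
    fx = (width / 2) / math.tan(math.radians(fov_deg) / 2)
    fy = fx  # square pixels (assumption)
    cx, cy = width / 2, height / 2
    return [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]


def sample_viewpoint(rng):
    """Draw one (height_mm, pitch_deg) pair from the stated ranges.

    Uniform sampling is an assumption; the listing only says "random".
    """
    height_mm = rng.uniform(700.0, 1500.0)
    pitch_deg = rng.uniform(-45.0, 45.0)
    return height_mm, pitch_deg


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    print(intrinsics())
    print(sample_viewpoint(rng))
```

When working with the actual data, the intrinsics shipped with the camera poses should take precedence over values derived this way, since the FOV convention is not stated in the listing.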

The above content was collected and summarized by 遇见数据集.