High-Precision Data of Small and Medium-Sized Objects for Spatial Intelligence Applications

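Name: High-Precision Data of Small and Medium-Sized Objects for Spatial Intelligence Applications (面向空间智能应用的中小尺寸物体高精度数据)
Creator: 先临三维科技股份有限公司 (Shining 3D Tech Co., Ltd.)
Published: 2026-03-13 11:29:15
License: not specified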

Source: Zhejiang Provincial Data Intellectual Property Registration Platform (浙江省数据知识产权登记平台). Updated 2026-03-13; indexed 2026-03-14.

Download link:

https://www.zjip.org.cn/home/announce/trends/8434394

Resource description:

This dataset targets 3D vision computing, digital twins, and intelligent manufacturing. Its content form is similar to Objaverse: it covers the geometry and texture of many types of high-precision 3D objects and can support a range of algorithm research and engineering work. It can be used to train and validate models for 3D reconstruction, object recognition, semantic and instance segmentation, pose estimation, and shape retrieval, and is intended to help improve newer methods such as 3D Gaussian Splatting, NeRF, and 3D generative models in complex scenes. It can also be applied to virtual reality/augmented reality content production, virtual scene construction, industrial equipment digitization, and robot manipulation simulation, supporting the adoption of high-precision 3D content understanding, generative 3D modeling, and digital asset production in research and industry.

Each sample is produced in four steps:

1. Representative image. The collector photographs the object with a mobile phone and selects the single view that best represents its appearance as the sample's "scene reference image/representative image". This image is used for quick browsing and for the subsequent description generation.

2. Text description. With the representative image as input, the pretrained multimodal large model GPT-4 is called to generate a structured text description of the object. The description focuses on category, shape and structure, color and material, texture details, and salient appearance features, and is explicitly constrained not to output any information related to personal identity. It serves as the sample's semantic description field.

3. Color images, depth data, and camera poses. During scanning, the acquisition device synchronously records RGB texture images and depth maps. Depth is obtained through active structured-light projection and multi-view geometric matching, and is saved in one-to-one correspondence with the RGB frames by frame index; the intrinsic parameters of the texture camera are read and saved at the same time. Camera poses are computed by SLAM/visual odometry over the sequence, taking time-aligned color images and the camera intrinsics as input. After feature extraction and matching, inter-frame tracking, keyframe optimization, and loop-closure constraints, the pipeline outputs for every frame the extrinsic matrix from the camera to the world coordinate system, stored in `out/pose/`. Abnormal frames are removed or smoothed to keep the poses continuous and stable.

4. Model data (real meshes). Multi-frame color images, depth data, and the corresponding camera poses are the main inputs. Multi-view registration and fusion come first (for example pose-based point cloud alignment, pose graph optimization, and depth/voxel fusion), followed by geometric filtering (outlier removal, noise smoothing), hole filling, and topology optimization (non-manifold repair, mesh connectivity correction). The mesh is then simplified to balance accuracy against size. Finally, the multi-view color textures are projected back onto the mesh for color sampling and blending, UV coordinates and materials are generated, textures are compressed, and the high-precision mesh and material files are written to `out/obj/`, corresponding to result files such as `obj/mesh` and `mtl/texture`.

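
The pose-based point cloud alignment mentioned in step 4 can be illustrated with a small sketch. It lifts each valid depth pixel into camera space with the pinhole model and then moves it into world space with the frame's camera-to-world matrix. The listing does not specify file formats, units, or a camera model, so everything here is an assumption: a pinhole camera, depth in metres with 0 marking a missing pixel, a 4x4 pose matrix, and the hypothetical function name `backproject_depth` operating on in-memory lists rather than on the files under `out/pose/`.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Point = Tuple[float, float, float]


def backproject_depth(
    depth: Matrix,         # depth[v][u] in metres; 0 = missing pixel (assumed convention)
    fx: float, fy: float,  # focal lengths in pixels (from the saved intrinsics)
    cx: float, cy: float,  # principal point in pixels
    cam_to_world: Matrix,  # 4x4 camera-to-world matrix for this frame
) -> List[Point]:
    """Lift every valid depth pixel to a 3D point in world coordinates."""
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip holes in the depth map
            # Pinhole model: pixel (u, v) at depth z -> camera-space point.
            xc = (u - cx) * z / fx
            yc = (v - cy) * z / fy
            # Apply the rigid transform [R | t] (top three rows of the pose).
            xw, yw, zw = (
                r[0] * xc + r[1] * yc + r[2] * z + r[3]
                for r in cam_to_world[:3]
            )
            points.append((xw, yw, zw))
    return points


if __name__ == "__main__":
    # Toy 2x2 depth map and a pose that shifts the camera 10 m along world x.
    toy_depth = [[1.0, 0.0], [2.0, 1.0]]
    toy_pose = [
        [1.0, 0.0, 0.0, 10.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
    for p in backproject_depth(toy_depth, 1.0, 1.0, 0.0, 0.0, toy_pose):
        print(p)
```

Running this per frame and concatenating the results gives a single world-space point cloud. A real pipeline would follow it with the pose graph optimization, fusion, and filtering stages the description lists, which this sketch does not attempt.
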
This generation process covers only the reconstruction of object geometry and texture and does not involve recognizing or inferring personal identity or individual characteristics. If sensitive elements appear in the background during collection, they are occluded, de-identified, or removed during the collection and preprocessing stages before the data is stored.

Provider:

先临三维科技股份有限公司 (Shining 3D Tech Co., Ltd.)

Created:

2025-12-04

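Collected summary

Dataset introduction

Background and challenges

Background overview

This dataset focuses on high-precision 3D data of small and medium-sized objects. It provides geometry and texture information similar to Objaverse and is suited to 3D vision computing, digital twins, and intelligent manufacturing. It supports algorithm development for 3D reconstruction, object recognition, pose estimation, and related tasks. Its data is generated through a structured pipeline that yields representative images, GPT-4-generated text descriptions, and high-precision mesh models built by multi-view fusion. The dataset emphasizes high precision and privacy protection, contains no personal identity information, and can help bring virtual reality, industrial digitization, and similar applications into practice.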

The content above was collected and summarized by 遇见数据集.