sayakpaul/nyu_depth_v2

Name: sayakpaul/nyu_depth_v2
Creator: sayakpaul
Published: 2022-12-12 13:35:31
License: No description available

Hugging Face | updated 2022-12-12 | indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/sayakpaul/nyu_depth_v2


Resource description:

---
license: apache-2.0
language:
- en
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
task_categories:
- depth-estimation
task_ids: []
pretty_name: NYU Depth V2
tags:
- depth-estimation
paperswithcode_id: nyuv2
dataset_info:
  features:
  - name: image
    dtype: image
  - name: depth_map
    dtype: image
  splits:
  - name: train
    num_bytes: 20212097551
    num_examples: 47584
  - name: validation
    num_bytes: 240785762
    num_examples: 654
  download_size: 35151124480
  dataset_size: 20452883313
---

# Dataset Card for NYU Depth V2

## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks](#supported-tasks)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Visualization](#visualization)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** [NYU Depth Dataset V2 homepage](https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html)
- **Repository:** The FastDepth [repository](https://github.com/dwofk/fast-depth), from which the data in this repository was sourced. It is a preprocessed version of the original NYU Depth V2 dataset linked above.
  It is also used in [TensorFlow Datasets](https://www.tensorflow.org/datasets/catalog/nyu_depth_v2).
- **Papers:** [Indoor Segmentation and Support Inference from RGBD Images](http://cs.nyu.edu/~silberman/papers/indoor_seg_support.pdf) and [FastDepth: Fast Monocular Depth Estimation on Embedded Systems](https://arxiv.org/abs/1903.03273)
- **Point of Contact:** [Nathan Silberman](mailto:silberman@cs.nyu.edu) and [Diana Wofk](mailto:dwofk@alum.mit.edu)

### Dataset Summary

As per the [dataset homepage](https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html):

The NYU-Depth V2 data set is comprised of video sequences from a variety of indoor scenes as recorded by both the RGB and Depth cameras from the Microsoft [Kinect](http://www.xbox.com/kinect). It features:

* 1449 densely labeled pairs of aligned RGB and depth images
* 464 new scenes taken from 3 cities
* 407,024 new unlabeled frames
* Each object is labeled with a class and an instance number (cup1, cup2, cup3, etc.)

The dataset has several components:

* Labeled: A subset of the video data accompanied by dense multi-class labels. This data has also been preprocessed to fill in missing depth labels.
* Raw: The raw RGB, depth and accelerometer data as provided by the Kinect.
* Toolbox: Useful functions for manipulating the data and labels.

### Supported Tasks

- `depth-estimation`: Depth estimation is the task of estimating, for every pixel of an image, the distance from the camera to the scene point that the pixel depicts.
- `semantic-segmentation`: Semantic segmentation is the task of assigning every pixel of an image to a class label. Note that this repository exposes only the `image` and `depth_map` fields; the dense class labels belong to the labeled subset of the original dataset.

The dataset supports other tasks as well. You can find more about them in [this resource](https://paperswithcode.com/dataset/nyuv2).

### Languages

English.

## Dataset Structure

### Data Instances

Each data point, in both the `train` and `validation` splits, consists of an RGB image and its corresponding depth map.
```
{
  'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB at 0x1FF32A3EDA0>,
  'depth_map': <PIL.PngImagePlugin.PngImageFile image mode=L at 0x1FF32E5B978>,
}
```

### Data Fields

- `image`: A `PIL.Image.Image` object containing the RGB image. When you access the image column, e.g. `dataset[0]["image"]`, the image file is decoded automatically. Decoding a large number of image files can take a significant amount of time, so query the sample index before the `"image"` column: `dataset[0]["image"]` should **always** be preferred over `dataset["image"][0]`.
- `depth_map`: A `PIL.Image.Image` object containing the corresponding depth map.

### Data Splits

The data is split into training and validation splits. The training split contains 47,584 images and the validation split contains 654 images.

## Visualization

You can use the following code snippet to visualize samples from the dataset:

```py
from datasets import load_dataset
import numpy as np
import matplotlib.pyplot as plt

cmap = plt.cm.viridis

ds = load_dataset("sayakpaul/nyu_depth_v2")


def colored_depthmap(depth, d_min=None, d_max=None):
    """Map a 2D depth array to an (H, W, 3) RGB array in [0, 255] using `cmap`."""
    if d_min is None:
        d_min = np.min(depth)
    if d_max is None:
        d_max = np.max(depth)
    # Guard against a constant depth map, which would otherwise divide by zero.
    depth_relative = (depth - d_min) / max(d_max - d_min, 1e-8)
    return 255 * cmap(depth_relative)[:, :, :3]  # H, W, C (alpha channel dropped)


def merge_into_row(rgb, depth_target):
    """Place the RGB image and its colorized depth map side by side."""
    rgb = np.array(rgb)
    depth_target = np.squeeze(np.array(depth_target, dtype=np.float64))
    d_min = np.min(depth_target)
    d_max = np.max(depth_target)
    depth_target_col = colored_depthmap(depth_target, d_min, d_max)
    img_merge = np.hstack([rgb, depth_target_col])
    return img_merge


train_set = ds["train"]
# Draw 9 distinct indices from the training split.
random_indices = np.random.choice(len(train_set), 9, replace=False).tolist()

plt.figure(figsize=(15, 6))
for i, idx in enumerate(random_indices):
    plt.subplot(3, 3, i + 1)
    # Index the row first, then the columns, so only this sample is decoded.
    sample = train_set[idx]
    image_viz = merge_into_row(sample["image"], sample["depth_map"])
    plt.imshow(image_viz.astype("uint8"))
    plt.axis("off")
plt.tight_layout()
plt.show()
```

## Dataset Creation

### Curation Rationale
The rationale from [the paper](http://cs.nyu.edu/~silberman/papers/indoor_seg_support.pdf) that introduced the NYU Depth V2 dataset:

> We present an approach to interpret the major surfaces, objects, and support relations of an indoor scene from an RGBD image. Most existing work ignores physical interactions or is applied only to tidy rooms and hallways. Our goal is to parse typical, often messy, indoor scenes into floor, walls, supporting surfaces, and object regions, and to recover support relationships. One of our main interests is to better understand how 3D cues can best inform a structured 3D interpretation.

### Source Data

#### Initial Data Collection

> The dataset consists of 1449 RGBD images, gathered from a wide range of commercial and residential buildings in three different US cities, comprising 464 different indoor scenes across 26 scene classes. A dense per-pixel labeling was obtained for each image using Amazon Mechanical Turk.

### Annotations

#### Annotation process

This is an involved process. Interested readers are referred to Sections 2, 3, and 4 of the [original paper](http://cs.nyu.edu/~silberman/papers/indoor_seg_support.pdf).

#### Who are the annotators?

Amazon Mechanical Turk (AMT) annotators.

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

* Original NYU Depth V2 dataset: Nathan Silberman, Derek Hoiem, Pushmeet Kohli, Rob Fergus
* Preprocessed version: Diana Wofk, Fangchang Ma, Tien-Ju Yang, Sertac Karaman, Vivienne Sze

### Licensing Information

The preprocessed NYU Depth V2 dataset is licensed under the [MIT License](https://github.com/dwofk/fast-depth/blob/master/LICENSE).
### Citation Information

```bibtex
@inproceedings{Silberman:ECCV12,
  author    = {Nathan Silberman and Derek Hoiem and Pushmeet Kohli and Rob Fergus},
  title     = {Indoor Segmentation and Support Inference from RGBD Images},
  booktitle = {ECCV},
  year      = {2012}
}

@inproceedings{icra_2019_fastdepth,
  author    = {Wofk, Diana and Ma, Fangchang and Yang, Tien-Ju and Karaman, Sertac and Sze, Vivienne},
  title     = {{FastDepth: Fast Monocular Depth Estimation on Embedded Systems}},
  booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
  year      = {2019}
}
```

### Contributions

Thanks to [@sayakpaul](https://huggingface.co/sayakpaul) for adding this dataset.


Provided by:

sayakpaul

Summary of original information

Dataset overview

Dataset name: NYU Depth V2

License: Apache-2.0

Language: English

Multilinguality: monolingual

Size category: 10K<n<100K

Task category: depth estimation

Tags: depth estimation

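Dataset info:

Features:
- image: the image, with data type image.
- depth_map: the depth map, with data type image.
Splits:
- train: training set, 47584 examples, 20212097551 bytes in total.
- validation: validation set, 654 examples, 240785762 bytes in total.
Download size: 35151124480 bytes
Dataset size: 20452883313 bytes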
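The split sizes above can be cross-checked with a little arithmetic. The sketch below is plain Python. The dictionary layout and the `summarize` helper are our own; the numbers are copied from the card's metadata. It confirms that the dataset size is exactly the sum of the two splits and computes the average stored size per example.

```python
# Split metadata copied from the dataset card's YAML header (dataset_info).
SPLITS = {
    "train": {"num_bytes": 20_212_097_551, "num_examples": 47_584},
    "validation": {"num_bytes": 240_785_762, "num_examples": 654},
}
DATASET_SIZE = 20_452_883_313   # dataset_size
DOWNLOAD_SIZE = 35_151_124_480  # download_size


def summarize(splits):
    """Return the total bytes and the average stored bytes per example for each split."""
    total = sum(s["num_bytes"] for s in splits.values())
    avg = {name: s["num_bytes"] / s["num_examples"] for name, s in splits.items()}
    return total, avg


total, avg = summarize(SPLITS)
assert total == DATASET_SIZE  # dataset_size is exactly train + validation
for name, value in avg.items():
    print(f"{name}: ~{value / 1e3:.0f} kB per example")
print(f"download: {DOWNLOAD_SIZE / 1e9:.1f} GB, prepared: {DATASET_SIZE / 1e9:.1f} GB")
```

Run as-is, it reports about 425 kB per training example and 368 kB per validation example. It also shows that the download (about 35.2 GB) is larger than the prepared dataset (about 20.5 GB), which is worth keeping in mind when budgeting disk space.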

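Dataset structure

Data instances: each data point contains an image and its corresponding depth map.
Data fields:
- image: image data, of type PIL.Image.Image.
- depth_map: depth map data, of type PIL.Image.Image.
Data splits: the dataset is divided into a training set and a validation set.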
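The helper below is a minimal sketch of the structure described above. It turns one record into NumPy arrays and checks that the RGB image and the depth map are pixel-aligned. The function name `sample_to_arrays` is hypothetical. It assumes the layout shown in the card's example instance, which is an RGB image plus a single-channel depth map. It accepts either PIL images, as returned by indexing a split, or NumPy arrays.

```python
import numpy as np


def sample_to_arrays(sample):
    """Convert one {'image', 'depth_map'} record into NumPy arrays (hypothetical helper).

    Assumes an RGB image of shape (H, W, 3) and a single-channel depth map of
    shape (H, W) that is pixel-aligned with it, as in the card's example.
    """
    rgb = np.asarray(sample["image"])
    depth = np.asarray(sample["depth_map"])
    if depth.ndim == 3 and depth.shape[-1] == 1:
        depth = depth[..., 0]  # drop a trailing singleton channel if present
    if rgb.ndim != 3 or rgb.shape[-1] != 3:
        raise ValueError(f"expected an RGB image, got shape {rgb.shape}")
    if rgb.shape[:2] != depth.shape:
        raise ValueError(f"image {rgb.shape[:2]} and depth {depth.shape} are not aligned")
    return rgb, depth


# Synthetic placeholder record; with the real data you would pass ds["train"][0].
demo = {"image": np.zeros((4, 5, 3), dtype=np.uint8), "depth_map": np.zeros((4, 5), dtype=np.uint8)}
rgb, depth = sample_to_arrays(demo)
print(rgb.shape, depth.shape)
```

Raising an error on a shape mismatch, rather than silently resizing, keeps misaligned image/depth pairs from slipping into a training pipeline.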

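Supported tasks

depth-estimation: depth estimation, i.e. estimating the perceived depth of every pixel in an image.
semantic-segmentation: semantic segmentation, i.e. assigning every pixel in an image to a class label.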
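In the depth-estimation task, predictions are usually scored pixel by pixel against the ground-truth depth map. The sketch below computes three metrics that are common in the monocular depth literature: RMSE, mean absolute relative error (AbsRel), and δ<1.25 accuracy. Several choices here are assumptions made for illustration: the selection of metrics, the function name `depth_metrics`, and the rule that ground-truth pixels at or below `min_valid` count as missing. The card itself does not define an evaluation protocol. Check the units of `depth_map` before interpreting the numbers.

```python
import numpy as np


def depth_metrics(pred, target, min_valid=1e-3):
    """Score a predicted depth map against ground truth on valid pixels only.

    Returns RMSE, mean absolute relative error (AbsRel) and delta<1.25 accuracy.
    Assumption: ground-truth pixels with depth <= min_valid are missing and skipped.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {target.shape}")

    valid = target > min_valid  # mask out missing ground truth
    if not valid.any():
        raise ValueError("no valid ground-truth pixels")
    p = np.maximum(pred[valid], min_valid)  # keep predictions positive for the ratios
    t = target[valid]

    rmse = float(np.sqrt(np.mean((p - t) ** 2)))
    abs_rel = float(np.mean(np.abs(p - t) / t))
    ratio = np.maximum(p / t, t / p)  # symmetric ratio, always >= 1
    delta1 = float(np.mean(ratio < 1.25))  # fraction within a factor of 1.25
    return {"rmse": rmse, "abs_rel": abs_rel, "delta1": delta1}


gt = np.array([[1.0, 2.0], [0.0, 4.0]])  # 0.0 marks a missing pixel
pred = np.array([[1.0, 2.0], [5.0, 2.0]])
print(depth_metrics(pred, gt))
```

In the toy example, the missing pixel is ignored. Of the three remaining pixels, one is off by a factor of two. The result is RMSE ≈ 1.155, AbsRel ≈ 0.167 and δ1 ≈ 0.667.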

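Dataset creation

Source data: 1449 densely labeled pairs of aligned RGB and depth images, covering 464 new scenes from three different cities.
Annotation process: dense per-pixel labeling through Amazon Mechanical Turk.
Annotators: AMT annotators.

License information

Preprocessed version: MIT License.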


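Collected summary

Dataset introduction

How it was built

NYU Depth V2 was built from RGB and depth image sequences captured with a Microsoft Kinect across a variety of indoor scenes. It contains 1449 densely labeled pairs of RGB and depth images, 464 new scenes, and 407,024 unlabeled frames. The images were labeled per pixel on Amazon Mechanical Turk, and every object received a class and an instance number. The dataset also provides the raw RGB, depth, and accelerometer data together with a toolbox for manipulating the data. Together, these make it a rich resource for depth estimation and semantic segmentation.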

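Key characteristics

The main characteristics of NYU Depth V2 are its dense annotation and its coverage of diverse indoor scenes. Alongside the RGB images and depth maps, per-pixel labeling provides a class and an instance number for every object. In the labeled subset, missing depth values have been filled in during preprocessing, which makes the depth data more consistent and easier to use. These properties make the dataset a common choice for depth estimation and semantic segmentation.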

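Usage

NYU Depth V2 can be loaded with the Hugging Face datasets library. It is divided into a training split of 47,584 examples and a validation split of 654 examples. Each example contains an RGB image and its corresponding depth map, available through the 'image' and 'depth_map' fields. Images are decoded on access, so index the sample first and then the field, e.g. dataset[0]["image"] rather than dataset["image"][0]. The dataset card also includes a visualization snippet that displays images next to their colorized depth maps.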
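The toy class below shows why the index-first order matters. It imitates a table whose cells are decoded lazily and counts how many decode calls each access pattern triggers. `LazyImageTable` is hypothetical and only mimics the behaviour the card describes. It is not how the datasets library is implemented, and its decoding step stands in for PNG decoding.

```python
class LazyImageTable:
    """Hypothetical toy table whose cells are 'decoded' only when accessed.

    Mimics the behaviour the card describes: indexing a row decodes that row,
    selecting a column decodes every row in it. Not the datasets internals.
    """

    def __init__(self, encoded_rows):
        self._rows = encoded_rows  # list of {"image": bytes, "depth_map": bytes}
        self.decode_calls = 0

    def _decode(self, blob):
        self.decode_calls += 1
        return blob.decode("utf-8")  # stand-in for decoding a PNG file

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, key):
        if isinstance(key, int):  # row access, e.g. table[0]
            return {name: self._decode(blob) for name, blob in self._rows[key].items()}
        if isinstance(key, str):  # column access, e.g. table["image"]
            return [self._decode(row[key]) for row in self._rows]
        raise TypeError(f"unsupported key: {key!r}")


rows = [{"image": f"img{i}".encode(), "depth_map": f"depth{i}".encode()} for i in range(1000)]

row_first = LazyImageTable(rows)
row_first[0]["image"]  # decodes only row 0 (image and depth_map)
print("row first:", row_first.decode_calls)

column_first = LazyImageTable(rows)
column_first["image"][0]  # decodes the whole image column before indexing
print("column first:", column_first.decode_calls)
```

Row-first access decodes only the two fields of one row. Column-first access decodes all 1,000 images before it returns the first one. On the real 47,584-example training split, the gap grows accordingly.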

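Background and challenges

Background

NYU Depth V2 was introduced by Silberman et al. at New York University and published at ECCV 2012. It targets indoor scene understanding, in particular depth estimation and semantic segmentation. The dataset contains 1449 densely labeled pairs of RGB and depth images covering 464 different indoor scenes and is widely used in computer vision. The original paper's core research question is how to parse the major surfaces, objects, and support relations of an indoor scene from RGBD images, to improve 3D scene understanding. Since its release, the dataset has become an important benchmark for depth estimation and semantic segmentation research.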

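Current challenges

Working with NYU Depth V2 involves several challenges:

* Indoor scenes are complex and diverse, which makes both depth estimation and semantic segmentation hard.
* The labels were produced on Amazon Mechanical Turk, which can lead to inconsistent annotation quality.
* Although the data is varied, it covers a limited range of scenes and lighting conditions, which limits how well models generalize.
* The raw depth contains missing values and noise, so effective preprocessing and post-processing are needed to keep the data quality high.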

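Common use cases

Typical use cases

In computer vision, NYU Depth V2 is mainly used for depth estimation and semantic segmentation. Its densely labeled RGB images and corresponding depth maps give researchers a rich resource for training and evaluating depth estimation models. Such models infer the 3D structure of a scene from a single RGB image. This capability has potential applications in robot navigation, augmented reality, and autonomous driving.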

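Research problems addressed

NYU Depth V2 addresses several research problems in indoor depth estimation and semantic segmentation. Its annotated data helps researchers tackle accurate depth estimation in cluttered indoor environments. Its multi-class labels and instance numbers provide abundant training examples for semantic segmentation, which has helped advance segmentation algorithms and improve their performance on complex scenes.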

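Derived work

Many well-known works build on NYU Depth V2. For example, the FastDepth model was trained on this dataset to achieve fast monocular depth estimation on embedded systems. This showed that deep depth estimation models can run in resource-constrained settings. The dataset has also supported research on indoor scene segmentation and support-relation inference, advancing how computer vision systems understand complex indoor environments.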

The content above was collected and summarized by 遇见数据集.
