---
license: apache-2.0
language:
- en
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
task_categories:
- depth-estimation
task_ids: []
pretty_name: NYU Depth V2
tags:
- depth-estimation
paperswithcode_id: nyuv2
dataset_info:
  features:
  - name: image
    dtype: image
  - name: depth_map
    dtype: image
  splits:
  - name: train
    num_bytes: 20212097551
    num_examples: 47584
  - name: validation
    num_bytes: 240785762
    num_examples: 654
  download_size: 35151124480
  dataset_size: 20452883313
---
# Dataset Card for NYU Depth V2
## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks](#supported-tasks)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Visualization](#visualization)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** [NYU Depth Dataset V2 homepage](https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html)
- **Repository:** The [FastDepth repository](https://github.com/dwofk/fast-depth), from which the data in this repository was sourced. The data is a preprocessed version of the original NYU Depth V2 dataset linked above; this preprocessed version is also used in [TensorFlow Datasets](https://www.tensorflow.org/datasets/catalog/nyu_depth_v2).
- **Papers:** [Indoor Segmentation and Support Inference from RGBD Images](http://cs.nyu.edu/~silberman/papers/indoor_seg_support.pdf) and [FastDepth: Fast Monocular Depth Estimation on Embedded Systems](https://arxiv.org/abs/1903.03273)
- **Point of Contact:** [Nathan Silberman](mailto:silberman@cs.nyu.edu) and [Diana Wofk](mailto:dwofk@alum.mit.edu)
### Dataset Summary
As per the [dataset homepage](https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html):
The NYU-Depth V2 data set is comprised of video sequences from a variety of indoor scenes as recorded by both the RGB and Depth cameras from the Microsoft [Kinect](http://www.xbox.com/kinect). It features:
* 1449 densely labeled pairs of aligned RGB and depth images
* 464 new scenes taken from 3 cities
* 407,024 new unlabeled frames
* Each object is labeled with a class and an instance number (cup1, cup2, cup3, etc)
The dataset has several components:
* Labeled: A subset of the video data accompanied by dense multi-class labels. This data has also been preprocessed to fill in missing depth labels.
* Raw: The raw rgb, depth and accelerometer data as provided by the Kinect.
* Toolbox: Useful functions for manipulating the data and labels.
### Supported Tasks
- `depth-estimation`: Depth estimation is the task of predicting the depth of the scene shown in an image, i.e., the distance from the camera to the surface seen at each pixel.
- `semantic-segmentation`: Semantic segmentation is the task of assigning a class label to every pixel of an image. Note that this version of the dataset only provides the `image` and `depth_map` fields; the segmentation labels are part of the original labeled NYU Depth V2 data.
The dataset supports other tasks as well; see its [Papers with Code page](https://paperswithcode.com/dataset/nyuv2) for more.
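
Because depth estimation produces one distance value per pixel, a predicted depth map can be compared with the target `depth_map` pixel by pixel. The sketch below computes root-mean-square error (RMSE), one commonly reported metric for this task. The helper name `depth_rmse` and the use of plain nested lists instead of the dataset's `PIL` images are illustrative assumptions, not part of this dataset's tooling; the error is expressed in whatever units the depth values are stored in.

```python
import math
from typing import Sequence


def depth_rmse(pred: Sequence[Sequence[float]], target: Sequence[Sequence[float]]) -> float:
    """Hypothetical helper: RMSE between two equally sized depth maps (H x W)."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("depth maps must have the same height and width")
    # One squared error per pixel, flattened over rows and columns.
    squared_errors = [
        (p - t) ** 2
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    ]
    if not squared_errors:
        raise ValueError("depth maps must not be empty")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


predicted = [[1.0, 2.0], [3.0, 4.0]]
ground_truth = [[1.0, 2.0], [3.0, 6.0]]
print(depth_rmse(predicted, ground_truth))  # one of four pixels is off by 2 -> 1.0
```
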
### Languages
English.
## Dataset Structure
### Data Instances
In both the `train` and `validation` splits, each data point consists of an RGB image and its corresponding depth map:
```
{
'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB at 0x1FF32A3EDA0>,
'depth_map': <PIL.PngImagePlugin.PngImageFile image mode=L at 0x1FF32E5B978>,
}
```
### Data Fields
- `image`: A `PIL.Image.Image` object containing the RGB image. Note that accessing the image column, e.g. `dataset[0]["image"]`, automatically decodes the image file. Decoding a large number of image files can take a significant amount of time, so query the sample index before the `"image"` column: `dataset[0]["image"]` should **always** be preferred over `dataset["image"][0]`, since the latter may decode every image in the column before returning the first one.
- `depth_map`: A `PIL.Image.Image` object containing the depth map aligned with `image` (a single-channel image, mode `L` in the example above).
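
The advice to index the row before the column can be illustrated with a toy model. The class below is a hypothetical stand-in, not how the Hugging Face `datasets` library is implemented internally: it simply counts how many (simulated) image decodes happen under each access pattern, assuming that selecting a whole column decodes every cell in it.

```python
from typing import Any, List


class ToyLazyImageTable:
    """Hypothetical table whose "image" cells are decoded only when read."""

    def __init__(self, encoded_images: List[bytes]) -> None:
        self._encoded = encoded_images
        self.decode_calls = 0  # number of simulated decodes performed so far

    def _decode(self, blob: bytes) -> str:
        # Stand-in for an expensive image decode (e.g. PNG bytes -> PIL image).
        self.decode_calls += 1
        return f"decoded<{blob.decode()}>"

    def __getitem__(self, key: Any) -> Any:
        if isinstance(key, int):
            # Row access: decode only the requested sample.
            return {"image": self._decode(self._encoded[key])}
        if key == "image":
            # Column access: decode every sample in the column.
            return [self._decode(blob) for blob in self._encoded]
        raise KeyError(key)


table = ToyLazyImageTable([b"img0", b"img1", b"img2", b"img3"])
_ = table[0]["image"]       # row first: decodes one image
print(table.decode_calls)   # 1
_ = table["image"][0]       # column first: decodes all four images
print(table.decode_calls)   # 5
```

Both access patterns return the same first image; only the amount of decoding work differs, which is why the row-first form scales better as the split grows.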
### Data Splits
The data is split into training and validation sets. The training split contains 47,584 images and the validation split contains 654 images.
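
The split statistics in the metadata at the top of this card (`num_examples` and `num_bytes`) make it easy to check the relative split sizes and the average stored size per example. The snippet below just copies those numbers into a dictionary; the variable names are illustrative.

```python
# Split statistics copied from this card's `dataset_info` metadata.
SPLITS = {
    "train": {"num_examples": 47584, "num_bytes": 20212097551},
    "validation": {"num_examples": 654, "num_bytes": 240785762},
}

total_examples = sum(stats["num_examples"] for stats in SPLITS.values())

for name, stats in SPLITS.items():
    share = stats["num_examples"] / total_examples
    avg_kb = stats["num_bytes"] / stats["num_examples"] / 1000  # decimal kilobytes
    print(f"{name}: {share:.2%} of examples, ~{avg_kb:.0f} kB per example")
```

The validation split is therefore only about 1.4% of all examples, so validation metrics are computed over a comparatively small set of images.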
## Visualization
You can use the following code snippet to visualize samples from the dataset:
```py
from datasets import load_dataset
import numpy as np
import matplotlib.pyplot as plt

cmap = plt.cm.viridis

ds = load_dataset("sayakpaul/nyu_depth_v2")


def colored_depthmap(depth, d_min=None, d_max=None):
    """Map a 2D depth array to RGB colors in [0, 255] using the colormap."""
    if d_min is None:
        d_min = np.min(depth)
    if d_max is None:
        d_max = np.max(depth)
    # Min-max normalize to [0, 1]; guard against constant depth maps.
    depth_relative = (depth - d_min) / max(d_max - d_min, 1e-8)
    return 255 * cmap(depth_relative)[:, :, :3]  # H, W, C (alpha dropped)


def merge_into_row(rgb, depth_target):
    """Place the RGB image and its colorized depth map side by side."""
    rgb = np.array(rgb)
    depth_target = np.squeeze(np.array(depth_target))
    d_min = np.min(depth_target)
    d_max = np.max(depth_target)
    depth_target_col = colored_depthmap(depth_target, d_min, d_max)
    return np.hstack([rgb, depth_target_col])


train_set = ds["train"]
# Sample 9 distinct indices from the training split.
random_indices = np.random.choice(len(train_set), 9, replace=False).tolist()

plt.figure(figsize=(15, 6))
for i, idx in enumerate(random_indices):
    plt.subplot(3, 3, i + 1)
    # Index the row first so only this sample's images are decoded.
    sample = train_set[idx]
    image_viz = merge_into_row(sample["image"], sample["depth_map"])
    plt.imshow(image_viz.astype("uint8"))
    plt.axis("off")
plt.tight_layout()
plt.show()
```
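
For reference, `colored_depthmap` applies min-max normalization before the colormap lookup. Writing $d(p)$ for the depth value at pixel $p$, and $d_{\min}$, $d_{\max}$ for the smallest and largest values in the same depth map, the mapping is:

```latex
\tilde{d}(p) = \frac{d(p) - d_{\min}}{d_{\max} - d_{\min}} \in [0, 1],
\qquad
\mathrm{rgb}(p) = 255 \cdot \mathrm{cmap}\bigl(\tilde{d}(p)\bigr)_{1:3}
```

Because `merge_into_row` computes $d_{\min}$ and $d_{\max}$ per sample, colors are comparable within one depth map but not across samples. The formula is undefined for a constant depth map ($d_{\max} = d_{\min}$), so an implementation needs to guard that case.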
## Dataset Creation
### Curation Rationale
The rationale from [the paper](http://cs.nyu.edu/~silberman/papers/indoor_seg_support.pdf) that introduced the NYU Depth V2 dataset:
> We present an approach to interpret the major surfaces, objects, and support relations of an indoor scene from an RGBD image. Most existing work ignores physical interactions or is applied only to tidy rooms and hallways. Our goal is to parse typical, often messy, indoor scenes into floor, walls, supporting surfaces, and object regions, and to recover support relationships. One of our main interests is to better understand how 3D cues can best inform a structured 3D interpretation.
### Source Data
#### Initial Data Collection
> The dataset consists of 1449 RGBD images, gathered from a wide range of commercial and residential buildings in three different US cities, comprising 464 different indoor scenes across 26 scene classes. A dense per-pixel labeling was obtained for each image using Amazon Mechanical Turk.
### Annotations
#### Annotation process
This is an involved process. Interested readers are referred to Sections 2, 3, and 4 of the [original paper](http://cs.nyu.edu/~silberman/papers/indoor_seg_support.pdf).
#### Who are the annotators?
Crowd workers on Amazon Mechanical Turk (AMT).
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
* Original NYU Depth V2 dataset: Nathan Silberman, Derek Hoiem, Pushmeet Kohli, Rob Fergus
* Preprocessed version: Diana Wofk, Fangchang Ma, Tien-Ju Yang, Sertac Karaman, Vivienne Sze
### Licensing Information
The preprocessed NYU Depth V2 dataset is licensed under the [MIT License](https://github.com/dwofk/fast-depth/blob/master/LICENSE).
### Citation Information
```bibtex
@inproceedings{Silberman:ECCV12,
  author    = {Nathan Silberman and Derek Hoiem and Pushmeet Kohli and Rob Fergus},
  title     = {Indoor Segmentation and Support Inference from RGBD Images},
  booktitle = {ECCV},
  year      = {2012}
}
@inproceedings{icra_2019_fastdepth,
  author    = {Wofk, Diana and Ma, Fangchang and Yang, Tien-Ju and Karaman, Sertac and Sze, Vivienne},
  title     = {{FastDepth: Fast Monocular Depth Estimation on Embedded Systems}},
  booktitle = {{IEEE International Conference on Robotics and Automation (ICRA)}},
  year      = {2019}
}
```
### Contributions
Thanks to [@sayakpaul](https://huggingface.co/sayakpaul) for adding this dataset.