EPFL-VILAB/TST-ProcTHOR
Source: Hugging Face. Updated 2026-04-17; indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/EPFL-VILAB/TST-ProcTHOR
---
language:
- en
tags:
- multimodal-learning
---
# Dataset Card for TST-ProcTHOR
## Dataset Description
- **Homepage:** [https://tst-vision.epfl.ch](https://tst-vision.epfl.ch)
- **Repository:** [TST official repository](https://github.com)
- **Paper:** [Arxiv](https://arxiv.org)
### Dataset Summary
This custom TST-ProcTHOR dataset is used in the research work "Multimodality as Supervision: Self-Supervised Specialization to the Test Environment via Multimodality".
- `pretrain/` is a multimodal pretraining dataset collected in the ProcTHOR environment. It contains RGB images and 9 additional tokenized modalities.
- `segmentation/train` is the associated downstream dataset used to finetune TST-pretrained models on the semantic segmentation task.
- `segmentation/test` is the test dataset used to evaluate models on the semantic segmentation task. Its samples are drawn from the test space itself.
- `captioning/train` is the associated downstream dataset used to finetune TST-pretrained models on the captioning task.
- `captioning/test` is the test dataset used to evaluate models on the captioning task. Its samples are drawn from the test space itself.
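Because the subsets listed above live in separate top-level directories, it may be convenient to fetch only the ones you need. The following is a minimal sketch, not an official loader: `build_allow_patterns` and `download_subsets` are hypothetical helper names, and the download step assumes the optional `huggingface_hub` package is installed and that its `snapshot_download` function is given glob-style `allow_patterns`.

```python
from typing import Iterable, List

REPO_ID = "EPFL-VILAB/TST-ProcTHOR"  # repository name as shown on the dataset page

# Subset directories named in the dataset summary.
KNOWN_SUBSETS = (
    "pretrain",
    "segmentation/train",
    "segmentation/test",
    "captioning/train",
    "captioning/test",
)


def build_allow_patterns(subsets: Iterable[str]) -> List[str]:
    """Turn subset names into glob patterns that match the files below them."""
    patterns = []
    for subset in subsets:
        name = subset.strip("/")
        if name not in KNOWN_SUBSETS:
            raise ValueError(f"unknown subset: {subset!r}")
        patterns.append(f"{name}/*")
    return patterns


def download_subsets(subsets: Iterable[str], local_dir: str) -> str:
    """Download only the requested subsets and return the local path."""
    # Imported lazily so the pattern helper works without the optional dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        allow_patterns=build_allow_patterns(subsets),
        local_dir=local_dir,
    )
```

For example, `download_subsets(["segmentation/train", "segmentation/test"], "./tst_procthor")` would skip the larger `pretrain/` directory.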
## Dataset Structure
```text
TST-ProcTHOR/
├── pretrain/
│   ├── test_spaces/
│   │   ├── crop_settings/          # Contains .tar shards
│   │   ├── det/                    # Contains .tar shards
│   │   ├── rgb/                    # Contains .tar shards
│   │   ├── tok_canny_edge@224/     # Contains .tar shards
│   │   ├── ...                     # More tokenized feature directories
│   │   └── tok_semseg@224/         # Contains .tar shards
│   └── transfer/
│       ├── crop_settings/          # Contains .tar shards
│       ├── det/                    # Contains .tar shards
│       ├── rgb/                    # Contains .tar shards
│       ├── tok_canny_edge@224/     # Contains .tar shards
│       ├── ...                     # More tokenized feature directories
│       └── tok_semseg@224/         # Contains .tar shards
├── segmentation/
│   ├── train/                      # Training data for segmentation
│   └── test/                       # Test data for segmentation
├── captioning/
│   ├── train/                      # Training data for captioning
│   └── test/                       # Test data for captioning
└── README.md
```
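Since each modality directory under `pretrain/` holds `.tar` shards, one way to inspect the data is to read the same shard from every modality folder and group the files by sample. The sketch below uses only the Python standard library. It rests on two assumptions that this card does not confirm: that every modality folder uses the same shard file names, and that a file's name without its extension identifies the sample. The helper names `iter_shard_members` and `load_modalities` are hypothetical.

```python
import tarfile
from pathlib import Path
from typing import Dict, Iterator, Tuple


def iter_shard_members(shard_path: Path) -> Iterator[Tuple[str, bytes]]:
    """Yield (sample_key, raw_bytes) for every regular file in one .tar shard."""
    with tarfile.open(shard_path, "r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            extracted = tar.extractfile(member)
            if extracted is None:
                continue
            # Assumption: the file name without its extension identifies the sample.
            key = Path(member.name).stem
            yield key, extracted.read()


def load_modalities(split_dir: Path, shard_name: str) -> Dict[str, Dict[str, bytes]]:
    """Map sample_key -> {modality_name: raw_bytes} for one shard name."""
    samples: Dict[str, Dict[str, bytes]] = {}
    modality_dirs = sorted(p for p in split_dir.iterdir() if p.is_dir())
    for modality_dir in modality_dirs:
        shard_path = modality_dir / shard_name
        if not shard_path.exists():
            continue  # this modality has no shard with that name
        for key, payload in iter_shard_members(shard_path):
            samples.setdefault(key, {})[modality_dir.name] = payload
    return samples
```

Calling `load_modalities(Path("TST-ProcTHOR/pretrain/test_spaces"), shard_name)` with a shard file name taken from the downloaded data returns raw bytes per modality; decoding them (images, token arrays) depends on the file types actually stored in the shards.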
## Dataset Creation
The dataset consists of procedurally generated house-like environments. We use 5 procedurally generated houses as our test space. The data is collected by randomly sampling agent positions (x, y, z) and orientations along its axis in the test space and capturing RGB-D images at these points.
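The sampling step described above can be illustrated with a short, simulator-free sketch. This is not the authors' collection code and it does not call ProcTHOR: the uniform distribution, the single yaw angle used for orientation, the axis-aligned bounds, and the names `Pose` and `sample_poses` are all assumptions made for illustration. A real pipeline would also need to check that each pose is reachable and then render the RGB-D frame there.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

Range = Tuple[float, float]


@dataclass(frozen=True)
class Pose:
    x: float
    y: float
    z: float
    yaw_degrees: float  # assumption: orientation is one rotation angle


def sample_poses(
    bounds: Tuple[Range, Range, Range], num_poses: int, seed: int = 0
) -> List[Pose]:
    """Draw poses uniformly at random inside an axis-aligned box (illustrative)."""
    rng = random.Random(seed)  # seeded so the sampled set is reproducible
    (x_min, x_max), (y_min, y_max), (z_min, z_max) = bounds
    poses = []
    for _ in range(num_poses):
        poses.append(
            Pose(
                x=rng.uniform(x_min, x_max),
                y=rng.uniform(y_min, y_max),
                z=rng.uniform(z_min, z_max),
                yaw_degrees=rng.uniform(0.0, 360.0),
            )
        )
    return poses
```

For example, `sample_poses(((0.0, 10.0), (0.9, 0.9), (0.0, 8.0)), num_poses=100)` would give 100 poses at a fixed height inside a hypothetical 10 by 8 floor area.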
### Source Data
The dataset is collected from the ProcTHOR simulator.
### Citation Information
```
@inproceedings{singh2026tst,
title={Multimodality as Supervision: Self-Supervised Specialization to the Test Environment via Multimodality},
author={Kunal Pratap Singh and Ali Garjani and Rishubh Singh and Muhammad Uzair Khattak and Efe Tarhan and Jason Toskov and Andrei Atanov and O{\u{g}}uzhan Fatih Kar and Amir Zamir},
booktitle={International Conference on Learning Representations (ICLR)},
year={2026}
}
```
Provided by:
EPFL-VILAB



