Name: EPFL-VILAB/TST-ProcTHOR
Creator: EPFL-VILAB
Published: 2026-04-17 15:05:24
License: No description available

Download link:

https://hf-mirror.com/datasets/EPFL-VILAB/TST-ProcTHOR

Resource description:

---
language:
- en
tags:
- multimodal-learning
---

# Dataset Card for TST-ProcTHOR

## Dataset Description

- **Homepage:** [https://tst-vision.epfl.ch](https://tst-vision.epfl.ch)
- **Repository:** [TST official repository](https://github.com)
- **Paper:** [arXiv](https://arxiv.org)

### Dataset Summary

This custom TST-ProcTHOR dataset is used in the research work "Multimodality as Supervision: Self-Supervised Specialization to the Test Environment via Multimodality".

- `pretrain/` is a multimodal pretraining dataset collected using the ProcTHOR environment. It contains RGB images and 9 additional tokenized modalities.
- `segmentation/train` is the associated downstream dataset used to finetune TST pretrained models on the semantic segmentation task.
- `segmentation/test` contains the test dataset used for evaluation on the semantic segmentation task. This data corresponds to samples obtained from the test space itself.
- `captioning/train` is the associated downstream dataset used to finetune TST pretrained models on the captioning task.
- `captioning/test` contains the test dataset used for evaluation on the captioning task. This data corresponds to samples obtained from the test space itself.

## Dataset Structure

```text
TST-ProcTHOR/
├── pretrain/
│   ├── test_spaces/
│   │   ├── crop_settings/        # Contains .tar shards
│   │   ├── det/                  # Contains .tar shards
│   │   ├── rgb/                  # Contains .tar shards
│   │   ├── tok_canny_edge@224/   # Contains .tar shards
│   │   ├── ...                   # More tokenized feature directories
│   │   └── tok_semseg@224/       # Contains .tar shards
│   └── transfer/
│       ├── crop_settings/        # Contains .tar shards
│       ├── det/                  # Contains .tar shards
│       ├── rgb/                  # Contains .tar shards
│       ├── tok_canny_edge@224/   # Contains .tar shards
│       ├── ...                   # More tokenized feature directories
│       └── tok_semseg@224/       # Contains .tar shards
├── segmentation/
│   ├── train/                    # Training data for segmentation
│   └── test/                     # Test data for segmentation
├── captioning/
│   ├── train/                    # Training data for captioning
│   └── test/                     # Test data for captioning
└── README.md
```

## Dataset Creation

The dataset consists of procedurally generated house-like environments. We use 5 procedurally generated houses as our test space. The data is collected by randomly sampling agent positions (x, y, z) and orientations along its axis in the test space, and capturing RGB-D images at these points.

### Source Data

The dataset is collected from the ProcTHOR simulator.

### Citation Information

```
@inproceedings{singh2026tst,
  title={Multimodality as Supervision: Self-Supervised Specialization to the Test Environment via Multimodality},
  author={Kunal Pratap Singh and Ali Garjani and Rishubh Singh and Muhammad Uzair Khattak and Efe Tarhan and Jason Toskov and Andrei Atanov and O{\u{g}}uzhan Fatih Kar and Amir Zamir},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}
```
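
The `.tar` shards listed under Dataset Structure can be inspected with Python's standard `tarfile` module once the dataset has been downloaded locally. The following is a minimal sketch, not the authors' loading code. The helper names `list_shards` and `group_shard_members` are hypothetical, and two things are assumed rather than documented here: that shards may sit at any depth below each modality directory, and that files inside a shard follow a WebDataset-like naming scheme in which the text before the first dot identifies the sample.

```python
import tarfile
from collections import defaultdict
from pathlib import Path


def list_shards(root, split, modality):
    """Return sorted .tar shard paths below <root>/pretrain/<split>/<modality>/.

    Assumption: shards may be nested at any depth below the modality
    directory, so the search is recursive.
    """
    return sorted(Path(root, "pretrain", split, modality).rglob("*.tar"))


def group_shard_members(shard_path):
    """Group the regular files inside one shard by sample key.

    Assumption: member names look like "<key>.<extension>" (WebDataset-like),
    so the text before the first dot of the file name is used as the key.
    """
    samples = defaultdict(list)
    with tarfile.open(shard_path, "r") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories and links
            file_name = Path(member.name).name
            key, _, extension = file_name.partition(".")
            samples[key].append(extension)
    # Sort extensions so the result does not depend on archive order.
    return {key: sorted(extensions) for key, extensions in samples.items()}
```

For example, `list_shards("TST-ProcTHOR", "test_spaces", "rgb")` would collect the RGB shards of the test spaces, and passing one of the returned paths to `group_shard_members` shows which file extensions each sample carries. If the actual shard layout differs from the assumed naming scheme, only the key extraction in `group_shard_members` needs to change.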

Application scenarios: