
EPFL-VILAB/TST-ProcTHOR

Source: Hugging Face. Updated: 2026-04-17. Indexed: 2026-04-05.
Download link:
https://hf-mirror.com/datasets/EPFL-VILAB/TST-ProcTHOR
Resource description:
---
language:
- en
tags:
- multimodal-learning
---

# Dataset Card for TST-ProcTHOR

## Dataset Description

- **Homepage:** [https://tst-vision.epfl.ch](https://tst-vision.epfl.ch)
- **Repository:** [TST official repository](https://github.com)
- **Paper:** [Arxiv](https://arxiv.org)

### Dataset Summary

This custom TST-ProcTHOR dataset is used in the research work "Multimodality as Supervision: Self-Supervised Specialization to the Test Environment via Multimodality".

- `pretrain/` is a multimodal pretraining dataset collected using the ProcTHOR environment. It contains RGB images and 9 additional tokenized modalities.
- `segmentation/train` is the associated downstream dataset used to finetune TST pretrained models on the semantic segmentation task.
- `segmentation/test` contains the test dataset used for evaluation on the semantic segmentation task. This data corresponds to samples obtained from the test space itself.
- `captioning/train` is the associated downstream dataset used to finetune TST pretrained models on the captioning task.
- `captioning/test` contains the test dataset used for evaluation on the captioning task. This data corresponds to samples obtained from the test space itself.

## Dataset Structure

```python
TST-ProcTHOR/
├── pretrain/
│   ├── test_spaces/
│   │   ├── crop_settings/          # Contains .tar shards
│   │   ├── det/                    # Contains .tar shards
│   │   ├── rgb/                    # Contains .tar shards
│   │   ├── tok_canny_edge@224/     # Contains .tar shards
│   │   ├── ...                     # More tokenized feature directories
│   │   └── tok_semseg@224/         # Contains .tar shards
│   └── transfer/
│       ├── crop_settings/          # Contains .tar shards
│       ├── det/                    # Contains .tar shards
│       ├── rgb/                    # Contains .tar shards
│       ├── tok_canny_edge@224/     # Contains .tar shards
│       ├── ...                     # More tokenized feature directories
│       └── tok_semseg@224/         # Contains .tar shards
├── segmentation/
│   ├── train/                      # Training data for segmentation
│   └── test/                       # Test data for segmentation
├── captioning/
│   ├── train/                      # Training data for captioning
│   └── test/                       # Test data for captioning
└── README.md
```

## Dataset Creation

The dataset consists of procedurally generated house-like environments. We use 5 procedurally generated houses as our test space. The dataset is collected by randomly sampling various agent x, y, z positions and orientations along its axis in the test space, and collecting RGB-D images at these points.

### Source Data

The dataset is collected from the ProcTHOR simulator.

### Citation Information

```
@inproceedings{singh2026tst,
  title={Multimodality as Supervision: Self-Supervised Specialization to the Test Environment via Multimodality},
  author={Kunal Pratap Singh and Ali Garjani and Rishubh Singh and Muhammad Uzair Khattak and Efe Tarhan and Jason Toskov and Andrei Atanov and O{\u{g}}uzhan Fatih Kar and Amir Zamir},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}
```
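For readers working with a local copy, the directory layout under "Dataset Structure" can be inspected with a few lines of standard-library Python. The following is a minimal sketch, not part of the official TST tooling. It assumes the dataset has already been downloaded to a local folder and that shard file names line up across modality directories, which the card does not state. The function names `list_shards` and `common_shards` are hypothetical.

```python
from pathlib import Path


def list_shards(root, split="pretrain/test_spaces"):
    """Map each modality directory under root/split to its sorted .tar shard names.

    `root` is the local path of the downloaded dataset (an assumption of this
    sketch); `split` is one of the sub-trees shown in the card, for example
    "pretrain/test_spaces" or "pretrain/transfer".
    """
    split_dir = Path(root) / split
    shards = {}
    for modality_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        # Each modality directory (rgb, det, tok_semseg@224, ...) holds .tar shards.
        shards[modality_dir.name] = sorted(f.name for f in modality_dir.glob("*.tar"))
    return shards


def common_shards(shards):
    """Return the shard names that appear in every modality directory.

    Assumption (not confirmed by the card): a shard with the same file name in
    two modality directories covers the same samples.
    """
    if not shards:
        return []
    name_sets = [set(names) for names in shards.values()]
    return sorted(set.intersection(*name_sets))
```

Calling `list_shards("TST-ProcTHOR", split="pretrain/transfer")` would then return a dictionary keyed by modality name, and `common_shards` on that dictionary gives the shards available for all modalities at once, which is a cheap sanity check after a partial download.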
Provided by:
EPFL-VILAB