Multimodal Object Dataset
Source: arXiv. Indexed 2025-09-30.
Download link:
http://hp.naka-lab.org/subpages/mod165.html
Description:
This open dataset contains multisensory data covering the visual, tactile, and auditory modalities, along with co-occurring multimodal cues. The speech portion consists of 75 Japanese sentences recorded by a single speaker in an anechoic chamber, with the audio processed to remove silent intervals. The dataset covers 24 objects grouped into seven categories. Its intended task is speech unit discovery, carried out through phoneme and word discovery.



