SPEECH-COCO
Source: DataCite Commons. Updated 2020-11-10; indexed 2024-07-13.
Download link:
https://perscido.univ-grenoble-alpes.fr/datasets/DS80
Resource description:
SPEECH-COCO is an augmentation of the MS-COCO dataset in which speech is added to the images and text. Speech captions were generated using text-to-speech (TTS) synthesis, resulting in 616,767 spoken captions (>600 h) paired with images. Disfluencies and speed perturbation were added to the signal to make it sound more natural. Each speech signal (WAV) is paired with a JSON file containing the exact timecode of each word, syllable, and phoneme in the spoken caption. Such a corpus could be used for Language and Vision (LaVi) tasks that take speech, instead of text, as input or output.
Provider:
PerSciDo
Created:
2017-07-11
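
The resource description says each speech signal (WAV) is paired with a JSON file. A minimal Python sketch of walking such pairs might look like the following. It assumes, hypothetically, that each WAV and its JSON share a file stem, sit in the same directory, and use lowercase `.wav`/`.json` extensions; the listing does not state the actual layout. The function names are illustrative, and the JSON is returned unparsed beyond `json.load`, because its schema is not given here.

```python
import json
import wave
from pathlib import Path


def iter_caption_pairs(root):
    """Yield (wav_path, json_path) for each WAV with a same-stem JSON beside it.

    Assumption: pairing by shared file stem; the real layout may differ.
    """
    for wav_path in sorted(Path(root).rglob("*.wav")):
        json_path = wav_path.with_suffix(".json")
        if json_path.exists():
            yield wav_path, json_path


def wav_duration_seconds(wav_path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(wav_path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def load_caption(wav_path, json_path):
    """Load one pair; the JSON content is kept as-is (schema not assumed)."""
    with open(json_path, encoding="utf-8") as handle:
        metadata = json.load(handle)
    return {
        "wav": wav_path,
        "duration_s": wav_duration_seconds(wav_path),
        "metadata": metadata,
    }
```

Once the archive is extracted to a local directory, `[load_caption(w, j) for w, j in iter_caption_pairs(local_dir)]` would collect every pair found under it; inspecting one `metadata` entry is a reasonable first step before relying on any particular timecode field.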



