SPEECH-COCO

Name: SPEECH-COCO
Creator: PerSciDo
Published: 2020-11-10 16:20:05
License: No description available

Source: DataCite Commons | Updated: 2020-11-10 | Indexed: 2024-07-13

Download link:

https://perscido.univ-grenoble-alpes.fr/datasets/DS80

Description:

SPEECH-COCO is an augmentation of the MS-COCO dataset in which speech is added to the images and text. The speech captions were generated with text-to-speech (TTS) synthesis, yielding 616,767 spoken captions (more than 600 hours) paired with images. Disfluencies and speed perturbation were added to the signal to make it sound more natural. Each speech signal (WAV) is paired with a JSON file containing the exact timecode of each word, syllable, and phoneme in the spoken caption. Such a corpus could be used for Language and Vision (LaVi) tasks that take speech, rather than text, as input or output.
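
Since each WAV file is described as having a companion JSON timecode file, a first step after downloading might look like the sketch below. It is only an illustration under stated assumptions: the idea that a WAV and its JSON share the same base name and directory, the lowercase `.wav`/`.json` extensions, and the helper names `pair_wav_with_json`, `wav_duration_seconds`, and `load_caption` are all hypothetical and not taken from the dataset documentation. The JSON is loaded as-is, because its key names are not given here.

```python
import json
import wave
from pathlib import Path


def pair_wav_with_json(root):
    """Yield (wav_path, json_path) for every WAV that has a same-named JSON file.

    Assumption (not confirmed by the dataset card): the two files share a
    base name and sit in the same directory.
    """
    for wav_path in sorted(Path(root).rglob("*.wav")):
        json_path = wav_path.with_suffix(".json")
        if json_path.exists():
            yield wav_path, json_path


def wav_duration_seconds(wav_path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(wav_path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def load_caption(wav_path, json_path):
    """Load the timecode JSON unchanged, without assuming any key names."""
    with open(json_path, encoding="utf-8") as handle:
        timecodes = json.load(handle)
    return {
        "wav": str(wav_path),
        "duration_s": wav_duration_seconds(wav_path),
        "timecodes": timecodes,
    }
```

Calling `load_caption(*pair)` for each pair yielded by `pair_wav_with_json("path/to/download")` gives one record per spoken caption; inspecting a single `timecodes` value is a reasonable way to learn the actual word, syllable, and phoneme layout before writing anything that depends on it.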

Provider:

PerSciDo

Created:

2017-07-11
