SPIRE-SIES
Source: arXiv | Updated: 2023-12-02 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2312.00698v1
Description:
SPIRE-SIES is a dataset of 170.83 hours of spontaneous Indian English speech, created by the Indian Institute of Technology Karnataka and other institutions. It was collected through a web application, covers 12 major Indian dialects, and contains 37,505 samples. Images were used as stimuli to elicit spontaneous speech, so that the recordings are varied while remaining semantically related to the prompts. The dataset is intended mainly for improving automatic speech recognition (ASR) systems for Indian English, particularly on spontaneous speech.
Provided by:
Indian Institute of Technology Karnataka
Created:
2023-12-02



