
RM Isolated and Spelled Word Data

Source: DataCite Commons. Updated: 2021-07-01. Indexed: 2024-07-13.
Download link:
https://catalog.ldc.upenn.edu/LDC96S39
Description:
Introduction

This release contains previously unreleased isolated-word and spell-mode (spelled-out words) speech data from the (D)ARPA Resource Management (RM1) Corpus. The data is based on a 600-word subset of the 991-word RM1 vocabulary and contains spoken and spelled words pertaining to the RM1 naval resource management task. It was collected at the same time as, and as part of, the RM1 Continuous Speech Corpus (NIST Speech Discs 2-1 through 2-4) and contains speech from the same sets of subjects used in RM1.

Data

The speech data has been segmented into separate spelled-word and spoken-word waveform files for each subject-word utterance. Time-aligned word and phonetic transcriptions have been generated automatically using forced recognition and are included. The time-aligned transcriptions employ the same format and phone set as the TIMIT Acoustic-Phonetic Continuous Speech Corpus (NIST Speech Disc 1-1). See the TIMIT CD-ROM companion booklet, NISTIR 4930, pp. 29-31, for a description of the phone set.

As with the continuous speech portion of RM1, this data is divided into speaker-independent and speaker-dependent partitions. Each partition is further divided into training, development-test and evaluation-test subsets. See the "readme.doc" file in the top-level directory for more information about the data.

Texas Instruments recruited the subjects and collected the speech. The National Institute of Standards and Technology (NIST) segmented the waveforms, generated the time-aligned transcriptions and produced this release.

Updates

RM Isolated and Spelled Word Data is no longer available as catalog number LDC96S39. It has been incorporated into Resource Management RM1 2.0 and is currently available in both Resource Management RM1 2.0 (LDC93S3B, http://catalog.ldc.upenn.edu/LDC93S3B) and Resource Management Complete Set 2.0 (LDC93S3A, http://catalog.ldc.upenn.edu/LDC93S3A).
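The description states that the time-aligned transcriptions use the same format as TIMIT. As a rough illustration, the sketch below reads alignment text in the TIMIT style, where each line holds a begin sample, an end sample and a label. The function name `parse_timit_alignment`, the `Segment` type, the 16 kHz default sample rate and the sample lines are all assumptions made for this example; they are not taken from the release, so check "readme.doc" for the actual file layout and sampling rate.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """One time-aligned unit (word or phone), with times in seconds."""
    start_s: float
    end_s: float
    label: str


def parse_timit_alignment(text: str, sample_rate: int = 16000) -> List[Segment]:
    """Parse TIMIT-style alignment lines: '<begin_sample> <end_sample> <label>'.

    sample_rate is an assumption (16 kHz); pass the real rate of the audio.
    Lines that do not have three fields are skipped.
    """
    segments: List[Segment] = []
    for line in text.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) != 3:
            continue
        begin, end, label = parts
        segments.append(
            Segment(int(begin) / sample_rate, int(end) / sample_rate, label.strip())
        )
    return segments


# Hypothetical alignment text, invented for illustration only.
example = "0 16000 h#\n16000 24000 s\n"
for seg in parse_timit_alignment(example):
    print(f"{seg.label}: {seg.start_s:.3f}s to {seg.end_s:.3f}s")
```

Converting sample offsets to seconds keeps the segments independent of the sampling rate once parsed, which is convenient when comparing word-level and phone-level alignments for the same utterance.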
Provider:
Linguistic Data Consortium
Created:
2020-11-30