
TI 46-Word

Source: DataCite Commons | Updated: 2025-02-10 | Indexed: 2024-07-13
Download link:
https://catalog.ldc.upenn.edu/LDC93S9
Resource description:
<h3>Introduction</h3><br> <p>This release contains a speech corpus that was originally designed and collected at Texas Instruments, Inc. (TI) in 1980 and used initially in performance assessment tests of isolated-word, speaker-dependent recognition technology. (See "Speech Recognition: Turning Theory to Practice" by G. R. Doddington and T. B. Schalk, IEEE Spectrum, Vol. 18, No. 9, September 1981.)</p><br> <p>The 46-word vocabulary consists of two sub-vocabularies: (1) the TI 20-word vocabulary, consisting of the digits "zero" through "nine" plus the words "enter," "erase," "go," "help," "no," "rubout," "repeat," "stop," "start," and "yes"; and (2) the TI 26-word "alphabet set," consisting of the letters "a" through "z".</p><br> <h3>Data</h3><br> <p>The corpus contains read utterances from 16 speakers (eight male and eight female), each speaking 26 utterances of every word in the 46-word vocabulary: 16 tokens designated as training and ten as test. These numbers reflect the goal of the collection; for various reasons, the full number of utterances was not reached for some speakers. See the included documentation for details.</p><br> <p>The corpus was collected at Texas Instruments in a quiet acoustic enclosure using an Electro-Voice RE-16 dynamic cardioid microphone, at a 12.5 kHz sample rate with 12-bit quantization. The files are in NIST SPHERE format and have a ".wav" filename extension.</p><br> <h3>Updates</h3><br> <p>As of October 5, 2016, the documentation was updated to more closely reflect the file inventory.</p>
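Because the files carry a ".wav" extension but are NIST SPHERE files rather than RIFF WAVE files, RIFF-based readers such as Python's standard `wave` module will reject them. The following is a minimal sketch of reading the SPHERE header with only the standard library, assuming the common SPHERE layout: a `NIST_1A` magic line, a line giving the header size in bytes, `name -type value` field lines, and an `end_head` terminator. The field names used (`sample_rate`, `sample_count`, `sample_coding`) are standard SPHERE fields, but which fields each TI 46-Word file actually contains is an assumption; the helper names are hypothetical.

```python
from typing import Dict, Union

HeaderValue = Union[int, float, str]


def parse_sphere_header(data: bytes) -> Dict[str, HeaderValue]:
    """Parse the ASCII header at the start of a NIST SPHERE file.

    Assumes the usual layout: "NIST_1A", a header-size line, then
    "name -type value" lines up to "end_head".
    """
    lines = data.split(b"\n", 2)
    if len(lines) < 3 or lines[0].strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE header")
    header_size = int(lines[1].decode("ascii").strip())
    text = data[:header_size].decode("ascii", errors="replace")

    fields: Dict[str, HeaderValue] = {}
    for line in text.splitlines()[2:]:  # skip magic and size lines
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(maxsplit=2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, type_code, value = parts
        if type_code == "-i":
            fields[name] = int(value)
        elif type_code == "-r":
            fields[name] = float(value)
        elif type_code.startswith("-s"):
            fields[name] = value  # -sN: string value of N characters
        # any other type code is ignored in this sketch
    return fields


def read_sphere_header(path: str) -> Dict[str, HeaderValue]:
    """Read only the header block of a SPHERE file on disk."""
    with open(path, "rb") as f:
        f.readline()  # magic line
        header_size = int(f.readline().decode("ascii").strip())
        f.seek(0)
        return parse_sphere_header(f.read(header_size))


def duration_seconds(header: Dict[str, HeaderValue]) -> float:
    """Utterance length implied by the sample_count and sample_rate fields."""
    return int(header["sample_count"]) / int(header["sample_rate"])


if __name__ == "__main__":
    # Synthetic header for illustration only, not copied from a corpus file.
    demo = (
        b"NIST_1A\n   1024\n"
        b"sample_rate -i 12500\n"
        b"sample_count -i 6250\n"
        b"channel_count -i 1\n"
        b"sample_coding -s3 pcm\n"
        b"end_head\n"
    ).ljust(1024, b" ")
    hdr = parse_sphere_header(demo)
    print(hdr["sample_rate"], duration_seconds(hdr))  # 12500 0.5
```

Sample data begins at byte offset `header_size`; decoding it depends on fields such as `sample_coding`, `sample_n_bytes`, and `sample_byte_format`, which this sketch deliberately leaves out.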

Provider:
Linguistic Data Consortium
Created:
2020-11-30
Compiled summary
Dataset introduction
Background and challenges
Background overview
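TI 46-Word is an English speech recognition dataset released in 1993. It contains more than 5 hours of isolated-word speech and is intended for evaluating speaker-dependent speech recognition performance. The data were recorded by 16 speakers (8 male, 8 female), and the vocabulary comprises 20 digits and command words plus the 26 letters of the alphabet. Recordings were made in a quiet environment at a 12.5 kHz sample rate with 12-bit quantization, and the files are in NIST SPHERE format.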
The above content was collected and summarized by 遇见数据集.
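The vocabulary and the collection targets combine into simple arithmetic. The sketch below builds the 46-word list from the two sub-vocabularies and computes target utterance counts, assuming (as the description's 26 = 16 + 10 split suggests) that the training and test token counts are per speaker, per word. These are collection targets only; the description notes that some speakers fall short, so actual file counts will be lower. The constant and function names are illustrative.

```python
import string
from typing import List, Tuple

# Sub-vocabularies as listed in the LDC description.
DIGITS: List[str] = ["zero", "one", "two", "three", "four",
                     "five", "six", "seven", "eight", "nine"]
COMMANDS: List[str] = ["enter", "erase", "go", "help", "no",
                       "rubout", "repeat", "stop", "start", "yes"]
TI20: List[str] = DIGITS + COMMANDS
ALPHABET: List[str] = list(string.ascii_lowercase)  # "a" .. "z"
TI46: List[str] = TI20 + ALPHABET

N_SPEAKERS = 16    # 8 male + 8 female
TRAIN_TOKENS = 16  # assumed per speaker, per word (collection target)
TEST_TOKENS = 10   # assumed per speaker, per word (collection target)


def target_counts(vocab: List[str]) -> Tuple[int, int, int]:
    """Return (train, test, total) target utterance counts for a vocabulary."""
    train = len(vocab) * N_SPEAKERS * TRAIN_TOKENS
    test = len(vocab) * N_SPEAKERS * TEST_TOKENS
    return train, test, train + test


print(len(TI46), target_counts(TI46))  # 46 (11776, 7360, 19136)
print(target_counts(TI20))             # (5120, 3200, 8320)
```

At roughly 19,000 target utterances, "more than 5 hours" of audio works out to about one second per isolated word (5 × 3600 / 19,136 ≈ 0.94 s), which is consistent with single-word recordings.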