TI 46-Word

Name: TI 46-Word
Creator: Linguistic Data Consortium
Published: 2025-02-10 08:55:15
License: No description available

Source: DataCite Commons. Updated 2025-02-10; indexed 2024-07-13.

Download link:

https://catalog.ldc.upenn.edu/LDC93S9



Description:

<h3>Introduction</h3><br> <p>This release contains a speech corpus that was designed and collected at Texas Instruments, Inc. (TI) in 1980 and used initially in performance assessment tests of isolated-word, speaker-dependent technology. (See "Speech Recognition: Turning Theory to Practice" by G. R. Doddington and T. B. Schalk, in IEEE Spectrum, Vol. 18, No. 9, September 1981.)</p><br> <p>The 46-word vocabulary consists of two sub-vocabularies: (1) the TI 20-word vocabulary (consisting of the digits zero through nine plus the words "enter," "erase," "go," "help," "no," "rubout," "repeat," "stop," "start," and "yes") and (2) the TI 26-word "alphabet set" (consisting of the letters "a" through "z").</p><br> <h3>Data</h3><br> <p>The corpus contains read utterances from 16 speakers (eight males and eight females), each speaking 26 utterances of the 46-word vocabulary: 16 tokens designated as training and ten as test. Note that these numbers reflect the aim of the collection; for various reasons, the full number of utterances was not reached for some speakers. See the included documentation for more information.</p><br> <p>The corpus was collected at Texas Instruments in a quiet acoustic enclosure using an Electro-Voice RE-16 Dynamic Cardioid microphone at a 12.5 kHz sample rate with 12-bit quantization. The files are in NIST SPHERE format and have a ".wav" filename extension.</p><br> <h3>Updates</h3><br> <p>As of October 5, 2016, the documentation was updated to more closely reflect the file inventory.</p>
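Because the files are in NIST SPHERE format despite the ".wav" extension, generic RIFF/WAVE readers may reject them. The following is a minimal sketch, not part of the LDC release, of reading a SPHERE header in Python. It assumes the usual SPHERE layout: an ASCII header that begins with `NIST_1A`, states its own size in bytes on the second line, lists `name -type value` fields, and ends with `end_head`. The helper names `parse_sphere_header` and `make_demo_header` are hypothetical, and the demo header values are illustrative rather than taken from a corpus file (only the 12500 Hz rate comes from the description above).

```python
import io


def parse_sphere_header(stream):
    """Parse the ASCII header of a NIST SPHERE file.

    Returns (header_size_in_bytes, fields_dict). Assumes the usual layout:
    'NIST_1A', then the header size, then 'name -type value' lines,
    terminated by 'end_head'.
    """
    magic = stream.readline().strip()
    if magic != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    header_size = int(stream.readline().strip())

    # Re-read the whole header block now that its size is known.
    stream.seek(0)
    header = stream.read(header_size).decode("ascii", errors="replace")

    fields = {}
    for line in header.splitlines()[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, type_tag, value = parts
        if type_tag == "-i":      # integer field
            fields[name] = int(value)
        elif type_tag == "-r":    # real-valued field
            fields[name] = float(value)
        else:                     # string field, e.g. -s3
            fields[name] = value
    return header_size, fields


def make_demo_header():
    """Build a synthetic 1024-byte header (illustrative values only)."""
    text = (
        "NIST_1A\n"
        "   1024\n"
        "sample_rate -i 12500\n"    # 12.5 kHz, as stated in the description
        "channel_count -i 1\n"      # assumed
        "sample_n_bytes -i 2\n"     # assumed: 12-bit samples stored in 2 bytes
        "sample_count -i 6250\n"    # made-up length
        "end_head\n"
    )
    return text.encode("ascii").ljust(1024, b" ")


if __name__ == "__main__":
    size, fields = parse_sphere_header(io.BytesIO(make_demo_header()))
    duration = fields["sample_count"] / fields["sample_rate"]
    print(size, fields["sample_rate"], duration)
```

For a real corpus file, the same function can be called on a file opened in binary mode; the audio samples start at the byte offset given by the returned header size. Whether a particular file uses compression or a different byte order should be checked in its `sample_coding` and `sample_byte_format` fields, if present.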


Provider:

Linguistic Data Consortium

Created:

2020-11-30

Collection summary

Dataset introduction