TIDIGITS

Name: TIDIGITS
Creator: Linguistic Data Consortium
Published: 2025-03-06 08:55:09
License: No description available

Source: DataCite Commons. Updated 2025-03-06; indexed 2025-04-16.

Download link:

https://catalog.ldc.upenn.edu/LDC93S10

Resource description:

This corpus contains speech that was originally designed and collected at Texas Instruments, Inc. (TI) for the purpose of designing and evaluating algorithms for speaker-independent recognition of connected digit sequences. There are 326 speakers (111 men, 114 women, 50 boys, and 51 girls), each pronouncing 77 digit sequences. Each speaker group is partitioned into test and training subsets. The corpus was collected at TI in 1982 in a quiet acoustic enclosure using an Electro-Voice RE-16 Dynamic Cardioid microphone and digitized at 20 kHz. The waveform files are in the NIST SPHERE format.

Updates

As of April 2015, TIDIGITS is also available in flac-compressed wav. This package is available to licensees as an additional download. Not included in this version are the folders relating to handling the shortened sphere files of the original corpus.

Portions © 1993 Trustees of the University of Pennsylvania
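As a quick sanity check on the figures in the description, the following sketch adds up the four speaker groups and multiplies by the stated 77 sequences per speaker. The utterance total is derived, not quoted from the catalog, and assumes every speaker contributed exactly 77 sequences.

```python
# Speaker counts per group, as stated in the corpus description.
speakers = {"men": 111, "women": 114, "boys": 50, "girls": 51}

# Sequences per speaker, as stated in the corpus description.
SEQUENCES_PER_SPEAKER = 77

total_speakers = sum(speakers.values())

# Derived figure: assumes every speaker recorded all 77 sequences.
total_utterances = total_speakers * SEQUENCES_PER_SPEAKER

print(f"speakers:   {total_speakers}")
print(f"utterances: {total_utterances}")
```

The group counts sum to the 326 speakers given in the description, which confirms the breakdown is internally consistent.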

Provider:

Linguistic Data Consortium

Created:

2020-11-30

Collected summary

Dataset introduction

Background and challenges

Background overview

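TIDIGITS is a speech dataset for designing and evaluating algorithms for speaker-independent recognition of connected digit sequences. It contains recordings from 326 speakers (in four groups: men, women, boys, and girls), each pronouncing 77 digit sequences, and is partitioned into test and training subsets. The data was collected in 1982 in a quiet environment and digitized at 20 kHz. The original format is NIST SPHERE; since 2015 a flac-compressed wav version has also been available.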
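Because the original waveforms are distributed in NIST SPHERE format, it can help to inspect a file's plain-text header before decoding any audio. The sketch below parses a SPHERE header into a dictionary. The function name `parse_sphere_header` is hypothetical, and the example bytes are synthetic, built to mirror the 20 kHz rate in the description rather than read from an actual TIDIGITS file. It reads only the header and does not decode samples, so it makes no assumption about how the sample data is compressed.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse the ASCII header at the start of a NIST SPHERE file.

    The header begins with the line "NIST_1A", then a line giving the
    header size in bytes, then "name -type value" lines, and finally
    "end_head".
    """
    first_lines = data.split(b"\n", 2)
    if len(first_lines) < 3 or first_lines[0].strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE header")
    header_size = int(first_lines[1].strip())
    header_text = data[:header_size].decode("ascii", errors="replace")

    fields = {}
    for line in header_text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip malformed lines
        name, field_type, value = parts
        if field_type == "-i":
            fields[name] = int(value)
        elif field_type == "-r":
            fields[name] = float(value)
        else:  # "-sN": string of N bytes
            fields[name] = value
    return fields


# Synthetic header for illustration only (not taken from the corpus).
example = (
    b"NIST_1A\n   1024\n"
    b"sample_rate -i 20000\n"
    b"channel_count -i 1\n"
    b"sample_coding -s3 pcm\n"
    b"end_head\n"
).ljust(1024, b" ")

fields = parse_sphere_header(example)
print(fields)
```

For a real file, passing the first few kilobytes read in binary mode should be enough, since the header size is declared on the second line.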

The above content was collected and summarized by 遇见数据集.
