CSR-IV HUB3
Indexed from NIAID Data Ecosystem, 2026-03-09
Download link:
https://doi.org/10.7910/DVN/DACJZB
Description:
This set of CD-ROMs contains all of the speech data provided to sites participating in the DARPA CSR November 1995 HUB3 Multi-Microphone tests. The data consists of digitized waveforms recorded simultaneously with eight different microphones from 40 subjects reading 15 sentence articles drawn from various North American business news publications. The data is partitioned into development-test and evaluation-test sets, which were collected with different subjects, prompts, and microphones. No training data was collected for this corpus because a substantial amount of NAB acoustic training data was already available. Index files are included that specify the exact subset of the evaluation-test recordings used in the November 1995 tests. The software NIST used to process and score the output of the test systems is also included.
The data is organized as follows:
CD26-3: Development-Test Data, Location 1, Adaptation and NAB recordings. Subjects: 703-705, 707-70a, 70c, 70f, 70g
CD26-4: Development-Test Data, Location 2, NAB recordings. Subjects: 70k, 70m, 70o, 70q-70s, 70u-70w
CD26-5: Development-Test Data, Location 2, Adaptation recordings. Subjects: 70k, 70m-70o, 70q-70s, 70u-70w
CD26-3: Development-Test Data, NAB recordings. Subjects: 710-71j
As of September 2007, this publication has been condensed to fit on a single DVD. The data from each CD resides in its own directory, labeled with the NIST label listed above.
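The subject lists use a compact range notation. The sketch below expands them, assuming (this is not stated in the description) that each 3-character subject ID counts through the base-36 digits 0-9 and then a-z, so that 707-70a covers 707, 708, 709, and 70a. Under that assumption the four lists contain 40 distinct subjects, which agrees with the 40 subjects stated above. The helper names `to_id` and `expand` are hypothetical and are not part of the NIST software distributed with the corpus.

```python
import re

# Assumption: subject IDs are 3-character base-36 strings (digits 0-9, then a-z).
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_id(n: int, width: int = 3) -> str:
    """Convert an integer back to a zero-padded base-36 subject ID."""
    chars = []
    for _ in range(width):
        n, r = divmod(n, 36)
        chars.append(DIGITS[r])
    return "".join(reversed(chars))


def expand(spec: str) -> list[str]:
    """Expand IDs and ID ranges separated by commas and/or whitespace."""
    ids = []
    for token in re.split(r"[,\s]+", spec.strip()):
        if not token:
            continue
        if "-" in token:
            start, end = token.split("-")
            # int(..., 36) parses letters case-insensitively as base-36 digits.
            ids.extend(to_id(n) for n in range(int(start, 36), int(end, 36) + 1))
        else:
            ids.append(token.lower())
    return ids


# Subject lists copied from the description above.
SUBJECT_LISTS = {
    "Location 1, adaptation and NAB": "703-705, 707-70a, 70c, 70f, 70g",
    "Location 2, NAB": "70k, 70m, 70o, 70q-70s, 70u-70w",
    "Location 2, adaptation": "70k, 70m-70o, 70q-70s, 70u-70w",
    "NAB, 710-71j": "710-71j",
}

if __name__ == "__main__":
    all_ids = set()
    for label, spec in SUBJECT_LISTS.items():
        ids = expand(spec)
        all_ids.update(ids)
        print(f"{label}: {len(ids)} subjects")
    print(f"distinct subjects: {len(all_ids)}")
```

Note that the two Location 2 lists overlap (70n appears only in the adaptation list), so counting the union rather than summing the per-disc counts is what yields the 40-subject total.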
Created:
2016-08-02



