Resource Management RM1 2.0

Name: Resource Management RM1 2.0
Creator: Linguistic Data Consortium
Published: 2021-07-01 16:34:32
License: No description available

DataCite Commons | Updated 2021-07-01 | Indexed 2025-04-16

Download link:

https://catalog.ldc.upenn.edu/LDC93S3B

Description:

LDC93S3A - Resource Management Complete Set 2.0 (http://catalog.ldc.upenn.edu/LDC93S3A)
LDC93S3B - Resource Management (RM1) 2.0
LDC93S3C - Resource Management (RM2) 2.0 (http://catalog.ldc.upenn.edu/LDC93S3C)

The DARPA Resource Management corpora (RM) consist of digitized and transcribed speech for use in designing and evaluating continuous speech recognition systems. There are two main parts, often referred to as RM1 and RM2. RM1 contains three sections: Speaker-Dependent (SD) training data, Speaker-Independent (SI) training data, and test and evaluation data. RM2 has an additional and larger SD data set, including test material. Resource Management Complete Set 2.0 contains RM1 and RM2.

All RM material consists of read sentences modeled after a naval resource management task. The complete corpus contains over 25,000 utterances from more than 160 speakers representing a variety of American dialects. The material was recorded at 16 kHz, with 16-bit resolution, using a Sennheiser HMD-414 headset microphone.

Resource Management SD and SI Training and Test Data (RM1)

The Speaker-Dependent (SD) Training Data contains 12 subjects, each reading a set of 600 "training sentences," two "dialect" sentences and ten "rapid adaptation" sentences, for a total of 7,344 recorded sentence utterances. The 600 sentences designated as training cover 97 of the lexical items in the corpus.

The Speaker-Independent (SI) Training Data contains 80 speakers, each reading two "dialect" sentences plus 40 sentences from the Resource Management text corpus, for a total of 3,360 recorded sentence utterances. Any given sentence from a set of 1,600 Resource Management sentence texts was recorded by two subjects, while no sentence was read twice by the same subject.

RM1 contains all SD and SI system test material used in 5 DARPA benchmark tests conducted in March and October of 1987, June 1988, and February and October 1989, along with scoring and diagnostic software and documentation for those tests. Documentation is also provided outlining use of the Resource Management training and test material at CMU in development of the SPHINX system. Example output and scored results for state-of-the-art speaker-dependent and speaker-independent systems (i.e. the BBN BYBLOS and CMU SPHINX systems) for the October 1989 benchmark tests are included.

Extended Resource Management Speaker-Dependent Corpus (RM2)

This set forms a speaker-dependent extension to the Resource Management (RM1) corpus. The corpus consists of a total of 10,508 sentence utterances (two male and two female speakers each speaking 2,652 sentence texts). These include the 600 "standard" Resource Management speaker-dependent training sentences, two dialect calibration sentences, ten rapid adaptation sentences, 1,800 newly-generated extended training sentences, 120 newly-generated development-test sentences and 120 newly-generated evaluation-test sentences. The evaluation-test material was used as the test set for the June 1990 DARPA SLS Resource Management Benchmark Tests (see the Proceedings).

The RM2 corpus was recorded at Texas Instruments. The NIST speech recognition scoring software originally distributed on the RM1 "Test" Disc was adapted for RM2 sentences and is included in this publication.

Portions © 1993 Trustees of the University of Pennsylvania
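
The utterance counts quoted in the description can be cross-checked with simple arithmetic. The following is a minimal sketch for that purpose only; the variable names are hypothetical and nothing here is part of the LDC distribution. The SD and SI totals reproduce the stated figures. For RM2, four speakers times 2,652 sentence texts gives 10,608, whereas the description states 10,508; the description does not explain the difference.

```python
# Cross-check of the utterance counts stated in the corpus description.
# All names are illustrative; the numbers are copied from the description.

# Sentences read by each speaker in each subset.
sd_per_speaker = 600 + 2 + 10                       # training + dialect + rapid adaptation
si_per_speaker = 2 + 40                             # dialect + RM text corpus sentences
rm2_per_speaker = 600 + 2 + 10 + 1800 + 120 + 120   # all RM2 sentence categories

# Totals implied by speaker count times sentences per speaker.
sd_total = 12 * sd_per_speaker
si_total = 80 * si_per_speaker
rm2_implied_total = 4 * rm2_per_speaker

# SI design: 1,600 sentence texts, each recorded by two subjects,
# should equal 80 speakers times 40 corpus sentences.
si_corpus_readings = 1600 * 2

rm2_stated_total = 10508  # figure given in the description

print("SD total:", sd_total)
print("SI total:", si_total)
print("SI corpus readings match:", si_corpus_readings == 80 * 40)
print("RM2 texts per speaker:", rm2_per_speaker)
print("RM2 implied total:", rm2_implied_total)
print("RM2 implied minus stated:", rm2_implied_total - rm2_stated_total)
```
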

Provider:

Linguistic Data Consortium

Created:

2020-11-30
