Interspeech 2016 - Experiment results for Sheffield Wargame Corpora (SWC1, SWC2, SWC3)
orda.shef.ac.uk. Updated 2023-05-30; indexed 2025-03-24.
Download link:
https://orda.shef.ac.uk/articles/dataset/Interspeech_2016_-_Experiment_results_for_Sheffield_Wargame_Corpora_SWC1_SWC2_SWC3_/3119743/1
Resource description:
The files in this dataset are the results generated for the Interspeech 2016 paper "The Sheffield Wargames Corpus - Day Two and Day Three" (DOI: 10.21437/Interspeech.2016-98). The paper describes a corpus of natural English speech recorded in a natural environment with multiple media and multiple microphones, reports baseline speech recognition performance based on standalone training and adaptation, and releases a Kaldi recipe for standalone training.
The files in the zip file are of three types:
- .ctm, which contains the output of the automatic speech recognition system; its columns hold segment information as well as the recognized transcript.
- .ctm.filt.sys, which contains the scoring of the automatic speech recognition system, including the overall word error rate and the overall numbers of insertions, deletions and substitutions.
- .ctm.filt.lur, which provides a more detailed decomposition of the word error rate across multiple genres.
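As an illustration of how a .ctm file could be read programmatically, the following is a minimal sketch that assumes the conventional NIST CTM column layout (recording, channel, start time, duration, word, optional confidence) and `;;` comment lines. The dataset description does not spell out the exact columns, so the layout, the `CtmEntry` type, the `parse_ctm` helper, and the sample lines are all assumptions made for this example.

```python
from typing import List, NamedTuple, Optional


class CtmEntry(NamedTuple):
    """One recognized word, assuming the conventional NIST CTM columns."""
    recording: str
    channel: str
    start: float       # start time in seconds
    duration: float    # duration in seconds
    word: str
    confidence: Optional[float]  # not every CTM file carries this column


def parse_ctm(text: str) -> List[CtmEntry]:
    """Parse CTM-formatted text into a list of entries (hypothetical helper)."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and ';;' comment lines.
        if not line or line.startswith(";;"):
            continue
        fields = line.split()
        if len(fields) < 5:
            raise ValueError(f"unexpected CTM line: {line!r}")
        confidence = float(fields[5]) if len(fields) > 5 else None
        entries.append(
            CtmEntry(fields[0], fields[1], float(fields[2]),
                     float(fields[3]), fields[4], confidence)
        )
    return entries


# Hypothetical sample lines, not taken from the dataset.
sample = """\
;; recording channel start duration word [confidence]
rec1 A 0.50 0.30 hello 0.98
rec1 A 0.80 0.40 world
"""

entries = parse_ctm(sample)
for entry in entries:
    print(entry.recording, entry.start, entry.word)
```

If the files in the archive use a different column order, only the field indices inside `parse_ctm` would need to change.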
The three file types are repeated for all the results described in Table 4 and Table 5 of the paper.
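For readers unfamiliar with how the overall word error rate relates to the insertion, deletion and substitution counts in the scoring files, the sketch below uses the conventional definition: errors divided by the number of reference words, expressed as a percentage. The function name and the example counts are hypothetical and are not taken from the dataset or the paper.

```python
def word_error_rate(substitutions: int, deletions: int,
                    insertions: int, reference_words: int) -> float:
    """Return WER in percent: 100 * (S + D + I) / N."""
    if reference_words <= 0:
        raise ValueError("reference_words must be positive")
    return 100.0 * (substitutions + deletions + insertions) / reference_words


# Hypothetical counts for illustration only.
example_wer = word_error_rate(substitutions=10, deletions=5,
                              insertions=5, reference_words=100)
print(f"WER = {example_wer:.1f}%")
```

Because insertions count as errors but do not add reference words, a WER computed this way can exceed 100%.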
The file names use the following abbreviations (also explained in the paper):
"ihm" refers to "Individual Headset Microphone".
"sdm" refers to "Single Distant Microphone".
"mdm8" refers to "Multiple Distant Microphone - 8 channels".
"LDA" refers to "Linear Discriminant Analysis".
"MLLT" refers to "Maximum Likelihood Linear Transform".
"SAT" refers to "Speaker Adaptive Training".
"MMI" refers to "Maximum Mutual Information".
"DNN" refers to "Deep Neural Network".
"sMBR" refers to "state-level Minimum Bayes Risk".
"fMLLR" refers to "feature-level Maximum Likelihood Linear Regression".
"o4" refers to "maximally 4 overlapping speakers in scoring".
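The abbreviations above can be turned into a small lookup table for annotating result files. This is only a sketch: the exact file name layout in the archive is not described here, so the example assumes the tags are separated by non-alphanumeric characters such as `_` or `.`, and the file name used is hypothetical.

```python
import re
from typing import List, Tuple

# Expansions copied from the naming convention list; keys are lower-cased
# so that matching is case-insensitive.
ABBREVIATIONS = {
    "ihm": "Individual Headset Microphone",
    "sdm": "Single Distant Microphone",
    "mdm8": "Multiple Distant Microphone - 8 channels",
    "lda": "Linear Discriminant Analysis",
    "mllt": "Maximum Likelihood Linear Transform",
    "sat": "Speaker Adaptive Training",
    "mmi": "Maximum Mutual Information",
    "dnn": "Deep Neural Network",
    "smbr": "state-level Minimum Bayes Risk",
    "fmllr": "feature-level Maximum Likelihood Linear Regression",
    "o4": "maximally 4 overlapping speakers in scoring",
}


def describe_tags(filename: str) -> List[Tuple[str, str]]:
    """Return (tag, expansion) pairs for every known tag in a file name.

    Assumes tags are delimited by non-alphanumeric characters, which is a
    guess about the naming scheme rather than a documented rule.
    """
    tokens = re.split(r"[^A-Za-z0-9]+", filename)
    return [(tok, ABBREVIATIONS[tok.lower()])
            for tok in tokens if tok.lower() in ABBREVIATIONS]


# Hypothetical file name, not taken from the archive.
tags = describe_tags("ihm_LDA_MLLT_SAT_o4.ctm.filt.sys")
for tag, meaning in tags:
    print(f"{tag}: {meaning}")
```

Splitting into whole tokens rather than searching for substrings avoids false matches between similar tags.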
All three file types are standard outputs that are recognized by the automatic speech recognition community and can be opened using any text editor.
Provider:
orda.shef.ac.uk



