
2006 NIST Speaker Recognition Evaluation Training Set

Mendeley Data | Updated 2024-01-31 | Indexed 2024-06-28
Download link:
https://catalog.ldc.upenn.edu/LDC2011S09
Resource description:
Introduction

2006 NIST Speaker Recognition Evaluation Training Set was developed by LDC and NIST (National Institute of Standards and Technology). It contains 595 hours of conversational telephone speech in English, Arabic, Bengali, Chinese, Hindi, Korean, Russian, Thai and Urdu, together with associated English transcripts, used as training data in the NIST-sponsored 2006 Speaker Recognition Evaluation (SRE).

The ongoing series of yearly SRE evaluations conducted by NIST is intended to be of interest to researchers working on the general problem of text-independent speaker recognition. To this end, the evaluations are designed to be simple, to focus on core technology issues, to be fully supported and to be accessible to those wishing to participate.

The task of the 2006 SRE evaluation was speaker detection, that is, to determine whether a specified speaker is speaking during a given segment of conversational telephone speech. The task was divided into 15 distinct and separate tests, each involving one of five training conditions and one of four test conditions. Further information about the test conditions and additional documentation is available in the 2006 SRE Evaluation Plan.

Data

The speech data in this release was collected by LDC as part of the Mixer project, in particular Mixer Phases 1, 2 and 3. The Mixer project supports the development of robust speaker recognition technology by providing carefully collected and audited speech from a large pool of speakers recorded simultaneously across numerous microphones and in different communicative situations and/or in multiple languages. The data is mostly English speech, but includes some speech in Arabic, Bengali, Chinese, Hindi, Korean, Russian, Thai and Urdu. The telephone speech segments are multi-channel data collected simultaneously from a number of auxiliary microphones.

The files are organized into three types:

- two-channel excerpts of approximately 10 seconds
- two-channel conversations of approximately 5 minutes
- summed-channel conversations, also of approximately 5 minutes

The speech files are stored as 8-bit u-law speech signals in separate SPHERE files. In addition to the standard header fields, the SPHERE header for each file contains some auxiliary information, including the language of the conversation and whether the data was recorded over a telephone line. English language transcripts in .ctm format were produced using an automatic speech recognition (ASR) system.

Samples

For an example of the data contained in this corpus, review this audio sample.

Updates

None at this time.

Portions © 2004-2006, 2011 Trustees of the University of Pennsylvania
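As a rough illustration of the file format described under Data, the sketch below reads the text header of a SPHERE file and decodes 8-bit u-law samples into 16-bit linear values. It is not taken from the corpus documentation. The function names are hypothetical, the header layout follows the general NIST SPHERE convention (a "NIST_1A" line, a header-size line, then "name type value" fields ending with "end_head"), and it assumes the audio payload is plain, uncompressed u-law with channels interleaved sample by sample. The names of the auxiliary fields (language, telephone-line flag) are not given in the description above, so the parser simply returns every field it finds.

```python
def parse_sphere_header(stream):
    """Parse a SPHERE header from a seekable binary stream.

    Returns (fields, header_size). Assumes the conventional layout:
    line 1 is "NIST_1A", line 2 is the header size in bytes, followed
    by "name type value" lines and a closing "end_head".
    """
    magic = stream.readline().decode("ascii", errors="replace").strip()
    if magic != "NIST_1A":
        raise ValueError("not a SPHERE file (missing NIST_1A magic)")
    header_size = int(stream.readline().decode("ascii").strip())

    stream.seek(0)
    header = stream.read(header_size).decode("ascii", errors="replace")

    fields = {}
    for line in header.splitlines()[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, ftype, value = parts
        if ftype == "-i":        # integer field
            fields[name] = int(value)
        elif ftype == "-r":      # real-valued field
            fields[name] = float(value)
        else:                    # "-sN": string of N characters
            fields[name] = value
    return fields, header_size


def ulaw_to_linear(byte):
    """Decode one 8-bit u-law code (G.711) to a signed linear sample."""
    u = ~byte & 0xFF             # u-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_channels(payload, channel_count):
    """Decode a u-law payload and split it into per-channel sample lists.

    Assumption: channels are interleaved sample by sample.
    """
    samples = [ulaw_to_linear(b) for b in payload]
    return [samples[c::channel_count] for c in range(channel_count)]
```

For a real file, one would open it in binary mode, call parse_sphere_header, check that the sample_coding field reports plain u-law, and pass the bytes after header_size to decode_channels along with the channel_count field. If the sample_coding value indicates compression, the payload would need to be decompressed with a SPHERE-aware tool first.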

Created: 2024-01-31