2007 NIST Language Recognition Evaluation Supplemental Training Set

Name: 2007 NIST Language Recognition Evaluation Supplemental Training Set
Creator: Linguistic Data Consortium
Published: 2021-07-01 16:21:22
License: No description available

DataCite Commons. Updated: 2021-07-01. Indexed: 2025-04-16.

Download link:

https://catalog.ldc.upenn.edu/LDC2009S05

Description:

<h3>Introduction</h3>

2007 NIST Language Recognition Evaluation Supplemental Training Set consists of 118 hours of conversational telephone speech segments in the following languages and dialects: Arabic (Egyptian colloquial), Bengali, Min Nan Chinese, Wu Chinese, Taiwan Mandarin, Cantonese, Russian, Mexican Spanish, Thai, Urdu and Tamil.

The goal of the <a href="https://www.nist.gov/itl/iad/mig/language-recognition">NIST (National Institute of Standards and Technology) Language Recognition Evaluation (LRE)</a> is to establish the baseline of current performance capability for language recognition of conversational telephone speech and to lay the groundwork for further research efforts in the field. NIST conducted three previous language recognition evaluations, in 1996, 2003 and 2005. The most significant differences between those evaluations and the 2007 task were the increased number of languages and dialects, the greater emphasis on a basic detection task for evaluation and the variety of evaluation conditions. In the 2007 task, a system was given a segment of speech and a language of interest to be detected (i.e., a target language), and had to decide whether that target language was in fact spoken in the given telephone speech segment (yes or no), based on an automated analysis of the data contained in the segment.

The supplemental training material in this release consists of the following:

<ul>
<li>Approximately 53 hours of conversational telephone speech segments in Arabic (Egyptian colloquial), Bengali, Cantonese, Min Nan Chinese, Wu Chinese, Russian, Thai and Urdu. This material is taken from LDC's CALLHOME, CALLFRIEND and Mixer collections.</li>
<li>Approximately 65 hours of full telephone conversations in Mandarin Chinese (Taiwan), Spanish (Mexican) and Tamil. This material was collected by Oregon Health and Science University (OHSU), Beaverton, Oregon. The test segments used in the <a href="http://catalog.ldc.upenn.edu/LDC2008S05" rel="nofollow">2005 NIST Language Recognition Evaluation</a> were derived from these full conversations.</li>
</ul>

In addition to the supplemental material contained in this release, the training data for the <a href="http://catalog.ldc.upenn.edu/LDC2009S04" rel="nofollow">2007 NIST Language Recognition Evaluation</a> consisted of data from previous LRE evaluation test sets, namely, <a href="http://catalog.ldc.upenn.edu/LDC2006S31" rel="nofollow">2003 NIST Language Recognition Evaluation</a> and <a href="http://catalog.ldc.upenn.edu/LDC2008S05" rel="nofollow">2005 NIST Language Recognition Evaluation</a>.

LDC has released other LRE corpora as:

<ul>
<li>2003 NIST Language Recognition Evaluation (<a href="../../../LDC2006S31">LDC2006S31</a>)</li>
<li>2005 NIST Language Recognition Evaluation (<a href="../../../LDC2008S05">LDC2008S05</a>)</li>
<li>2009 NIST Language Recognition Evaluation Test Set (<a href="../../../LDC2014S06">LDC2014S06</a>)</li>
<li>2011 NIST Language Recognition Evaluation Test Set (<a href="../../../LDC2018S06">LDC2018S06</a>)</li>
</ul>

<h3>Samples</h3>

For an example of the data in this corpus, please listen to this <a href="desc/addenda/LDC2009S05_ea.wav" rel="nofollow">sample of the Egyptian Arabic</a> data from the data set.

Portions © 2005 Oregon Health and Science University, © 1996, 2006, 2009 Trustees of the University of Pennsylvania
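
The detection task described in the introduction pairs each speech segment with a target language and asks for a yes/no decision. The following is a minimal Python sketch of that trial structure, not the evaluation's actual tooling. The names `Trial`, `make_trials` and `decide`, the segment identifiers, and the score threshold are all hypothetical; the language list is simply the one given in this release's description and is not meant as the full 2007 target set.

```python
from dataclasses import dataclass

# Languages and dialects named in this release's description.
RELEASE_LANGUAGES = [
    "Arabic (Egyptian colloquial)", "Bengali", "Min Nan Chinese",
    "Wu Chinese", "Taiwan Mandarin", "Cantonese", "Russian",
    "Mexican Spanish", "Thai", "Urdu", "Tamil",
]


@dataclass(frozen=True)
class Trial:
    """One detection trial: is `target_language` spoken in `segment_id`?"""
    segment_id: str        # hypothetical identifier for a speech segment
    target_language: str


def make_trials(segment_ids, targets=RELEASE_LANGUAGES):
    """Pair every segment with every target language."""
    return [Trial(seg, lang) for seg in segment_ids for lang in targets]


def decide(score: float, threshold: float = 0.0) -> bool:
    """Turn a detector score into a yes/no decision.

    Assumption: higher scores mean stronger evidence for the target
    language, and the threshold value is arbitrary.
    """
    return score >= threshold


if __name__ == "__main__":
    trials = make_trials(["seg001", "seg002"])  # made-up segment IDs
    print(len(trials))   # 2 segments x 11 languages = 22 trials
    print(decide(1.2))   # True: answer "yes" for this trial
    print(decide(-0.5))  # False: answer "no" for this trial
```

A real system would replace the fixed threshold with scores from a trained language recognizer; the point here is only that each decision is made per (segment, target language) pair.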

Provider:

Linguistic Data Consortium

Created:

2020-11-30
