2022 NIST Language Recognition Evaluation Test and Development Sets
Source: DataCite Commons. Updated: 2026-04-15. Indexed: 2026-05-06.
Download link:
https://datasets.lib.berkeley.edu/citation?persistentId=doi:10.60503/D3/CQIMYN
Description:
2022 NIST Language Recognition Evaluation Test and Development Sets was developed by the Linguistic Data Consortium (LDC) and the National Institute of Standards and Technology (NIST). This release contains the test and development data, metadata, answer keys, and documentation for the 2022 NIST Language Recognition Evaluation (LRE22). The source speech data comprises approximately 222 hours of conversational telephone speech (CTS) and broadcast narrowband speech (BNBS) in 14 languages: Afrikaans, Tunisian Arabic, Algerian Arabic, Libyan Arabic, South African English, Indian-accented South African English, North African French, Ndebele, Oromo, Tigrinya, Tsonga, Venda, Xhosa, and Zulu.
The goals of NIST's Language Recognition Evaluation are to advance language recognition technologies, to facilitate technology development, and to measure the performance of current state-of-the-art technology. LRE22 emphasized language recognition for African languages, including low resource languages, and expanded the range of test segment durations. Further information about the 2022 evaluation can be found in the 2022 NIST Language Recognition Evaluation Plan.
Provider:
UC Berkeley Library Dataverse
Created:
2026-04-15