ML-SUPERB
arXiv · Updated 2023-08-12 · Indexed 2024-08-06
Download link:
http://arxiv.org/abs/2305.10615v2
Resource overview:
ML-SUPERB is a multilingual speech performance benchmark dataset created by Carnegie Mellon University and other institutions. It covers 143 languages, ranging from high-resource to endangered ones, and is used primarily to evaluate self-supervised learning models on automatic speech recognition (ASR) and language identification. The data was collected from multiple multilingual speech corpora, which gives the benchmark its diversity and breadth, and it supports research scenarios such as monolingual and multilingual speech recognition. ML-SUPERB aims to address performance evaluation and model optimization for speech processing technologies in multilingual environments.
Provider:
Carnegie Mellon University
Created:
2023-05-18
Collected Summary
Dataset Introduction

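Background and Challenges
Background Overview
ML-SUPERB is a multilingual speech performance benchmark dataset covering 143 languages, used to evaluate self-supervised learning models on speech recognition and language identification tasks. It supports multiple research scenarios and aims to address performance evaluation and model optimization for speech processing technologies in multilingual environments.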
The content above was collected and summarized by 遇见数据集.



