Human Acceptability Corpus for Code-switching (HAC)

Name: Human Acceptability Corpus for Code-switching (HAC)
Creator: New York University Abu Dhabi
Published: 2022-11-22 16:14:07
License: No description available

Source: arXiv; updated 2022-11-22; indexed 2024-06-21

Download link:

http://arzen.camel-lab.com/

Resource overview:

The Human Acceptability Corpus for Code-switching (HAC) is a dataset developed by New York University Abu Dhabi and other institutions for evaluating automatic speech recognition (ASR) on code-switched speech. It contains 1,301 speech segments extracted from ArzEn, an Egyptian Arabic-English code-switching conversational corpus, together with hypotheses generated for them by different ASR systems. HAC uses human judgment to quantify the minimum amount of editing that each ASR hypothesis needs. Its goals are to address the problem of choosing evaluation metrics for code-switched speech and to explore speech recognition in multilingual settings, particularly Arabic-English code-switching.
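To make the notion of "minimum edits" concrete, here is a minimal Python sketch of a word-level edit distance and the word error rate derived from it. This is only an illustration of the kind of automatic metric that human judgments can be compared against; it is not HAC's annotation procedure or the authors' code. The function names and the example strings are assumptions made for this sketch, and whitespace tokenization is a simplification.

```python
def word_edit_distance(reference: str, hypothesis: str) -> int:
    """Minimum number of word insertions, deletions, and substitutions
    needed to turn `hypothesis` into `reference` (Levenshtein distance
    over whitespace-separated tokens)."""
    ref = reference.split()
    hyp = hypothesis.split()

    # prev[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with zero hypothesis words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # reference word missing from hypothesis
                    curr[j - 1] + 1,                  # extra word in hypothesis
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the number of reference words."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(reference, hypothesis) / n_ref


if __name__ == "__main__":
    # Invented example strings, not taken from the corpus.
    ref = "we will meet at the station"
    hyp = "we will meat at station"
    print(word_edit_distance(ref, hyp))
    print(word_error_rate(ref, hyp))
```

A word-level metric like this treats every mismatch as equally costly, which is one reason code-switched transcripts, where the same word may be written in either script, are hard to score automatically.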

Provided by:

New York University Abu Dhabi

Created:

2022-11-22

Collected summary

Dataset introduction

Background and challenges

Background overview

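The Human Acceptability Corpus for Code-switching (HAC) is a dataset developed by New York University Abu Dhabi and other institutions specifically for evaluating code-switching automatic speech recognition. It contains 1,301 speech segments extracted from Egyptian Arabic-English code-switched conversations and uses human judgment to quantify the amount of editing that ASR hypotheses require. It aims to address the evaluation challenges posed by code-switching and to explore speech recognition in multilingual settings.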

The content above was collected and summarized by 遇见数据集.