Word Error Rate (WER) under noisy conditions.
Source: Figshare. Updated: 2026-01-12. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/_p_Word_Error_Rate_WER_under_noisy_conditions_p_/31051604
Description:
Spoken Question Answering (SQA) extends machine reading comprehension to spoken content and requires models to handle both automatic speech recognition (ASR) errors and downstream language understanding. Although large-scale SQA benchmarks exist for high-resource languages, Vietnamese remains underexplored due to the lack of standardized datasets. This paper introduces ViSQA, the first benchmark for Vietnamese Spoken Question Answering. ViSQA extends the UIT-ViQuAD corpus using a reproducible text-to-speech and ASR pipeline, resulting in over 13,000 question–answer pairs aligned with spoken inputs. The dataset includes clean and noise-degraded audio variants to enable systematic evaluation under varying transcription quality. Experiments with five transformer-based models show that ASR errors substantially degrade performance (e.g., ViT5 EM: 62.04% → 36.30%), while training on spoken transcriptions improves robustness (ViT5 EM: 36.30% → 50.70%). ViSQA provides a rigorous benchmark for evaluating Vietnamese SQA systems and enables systematic analysis of the impact of ASR errors on downstream reasoning.
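For readers unfamiliar with the metric in the title, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and an ASR hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the listing does not describe the scoring script or text normalization actually used for ViSQA.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Tokenization by whitespace is an illustrative assumption, not the
    dataset authors' documented procedure.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts, not taken from the dataset.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.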
Created:
2026-01-12



