ContextASR-Bench

Name: ContextASR-Bench
Creator: maas
Published: 2026-05-13 20:18:00
License: No description provided

ModelScope community. Updated: 2026-05-13. Indexed: 2025-07-12.

Download link:

https://modelscope.cn/datasets/AI-ModelScope/ContextASR-Bench

Resource description:

# ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark

<p align="center" dir="auto"> <a href="https://arxiv.org/abs/2507.05727" rel="nofollow"><img src="https://img.shields.io/badge/ArXiv-2507.05727-red" style="max-width: 100%;"></a> <a href="https://github.com/MrSupW/ContextASR-Bench" rel="nofollow"><img src="https://img.shields.io/badge/Github-MrSupW-black" style="max-width: 100%;"></a> </p>

Automatic Speech Recognition (ASR) has been extensively investigated, yet prior benchmarks have largely focused on assessing the acoustic robustness of ASR models, leaving their linguistic capabilities relatively underexplored. This largely stems from the limited parameter sizes and training corpora of conventional ASR models, which give them insufficient world knowledge. Such knowledge is crucial for accurately recognizing named entities across diverse domains, for instance drug and treatment names in medicine or specialized technical terms in engineering. Recent breakthroughs in Large Language Models (LLMs) and the corresponding Large Audio Language Models (LALMs) have markedly raised the visibility of advanced context modeling and general artificial intelligence capabilities. Leveraging LLMs, we envision a unified system capable of robust speech recognition across diverse real-world domains, yet existing benchmarks are inadequate for evaluating this objective.

To address this gap, we propose ContextASR-Bench: a comprehensive, large-scale benchmark designed to assess the linguistic competence of ASR systems using corpora that feature numerous named entities across multiple domains. It encompasses up to 40,000 data entries with more than 300,000 named entities spanning more than 10 domains. Beyond the audio and its transcription, each sample provides the domain it belongs to and a list of the named entities it contains, which together are referred to as the context.
Based on this, we introduce three evaluation modes to assess how effectively models can exploit such context to improve ASR accuracy. Extensive evaluation on ContextASR-Bench shows that LALMs outperform conventional ASR models by a large margin thanks to the strong world knowledge and context modeling of LLMs, yet there remains ample room for further improvement.

## 🖥️ Overview of ContextASR-Bench

The figure below gives an overview of ContextASR-Bench, which comprises two distinct test sets: ContextASR-Speech and ContextASR-Dialogue. The left part shows the data pipeline for these two test sets. Both use DeepSeek-R1 to generate entity-rich corpora, which are then synthesized into speech using Zero-Shot TTS. Each entry in both sets follows the same data structure: <**Audio**, **Text**, **Coarse-grained Context**, **Fine-grained Context**>. The middle part presents three contextual evaluation settings. The contextless setting can be used to evaluate any ASR system, while the other two assess LALMs' context comprehension capacity by supplying context of different granularity in the prompt. The right part introduces the three evaluation metrics used in ContextASR-Bench. Compared with **WER**, **NE-WER** and **NE-FNR** focus more on the accuracy of entity recognition.

<div style="text-align: center;"> <img src="./figure/ContextASR-Bench_MainFigure.png" style="width:95%; max-width:100%;"/><br/> </div>

## 🗂️ Download ContextASR-Bench Data

The ContextASR-Bench dataset is available for download from ModelScope at https://modelscope.cn/datasets/AI-ModelScope/ContextASR-Bench.

## 📑 Evaluation Code

Please refer to the [GitHub repository](https://github.com/MrSupW/ContextASR-Bench).

## 📚 Citation

```
@article{wang2025asrbench,
  title={ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark},
  author={He Wang and Linhan Ma and Dake Guo and Xiong Wang and Lei Xie and Jin Xu and Junyang Lin},
  year={2025},
  eprint={2507.05727},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  url={https://arxiv.org/abs/2507.05727},
}
```
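To make the entry structure, the three contextual settings, and the metrics described in the overview more concrete, here is a minimal Python sketch. It is not the official evaluation code. The field names (`audio_path`, `domain`, `entities`), the prompt wording, and the function names are all assumptions made for illustration. `wer` is the standard word-level edit-distance definition, while `entity_miss_rate` is only a simplified, exact-match stand-in for the idea behind NE-FNR. The actual NE-WER and NE-FNR definitions are in the GitHub repository.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    """One entry: <Audio, Text, Coarse-grained Context, Fine-grained Context>.

    Field names are hypothetical; the released files may use different keys.
    """
    audio_path: str                                     # Audio
    text: str                                           # Text (reference transcription)
    domain: str                                         # Coarse-grained context
    entities: List[str] = field(default_factory=list)   # Fine-grained context


def build_prompt(entry: Entry, setting: str) -> str:
    """Build an illustrative prompt for one of the three settings.

    The wording is an assumption, not the prompt used by the authors.
    """
    base = "Transcribe the audio."
    if setting == "contextless":
        return base
    if setting == "coarse":
        return f"{base} The speech belongs to the domain: {entry.domain}."
    if setting == "fine":
        return f"{base} It may mention these entities: {', '.join(entry.entities)}."
    raise ValueError(f"unknown setting: {setting}")


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def entity_miss_rate(entities: List[str], hypothesis: str) -> float:
    """Fraction of reference entities absent (exact substring) from the hypothesis.

    A simplified stand-in for NE-FNR, assumed here for illustration only.
    """
    if not entities:
        return 0.0
    hyp = hypothesis.lower()
    missed = sum(1 for e in entities if e.lower() not in hyp)
    return missed / len(entities)


if __name__ == "__main__":
    entry = Entry(
        audio_path="sample.wav",  # hypothetical path
        text="the patient was given metformin and atorvastatin",
        domain="medicine",
        entities=["metformin", "atorvastatin"],
    )
    hypothesis = "the patient was given met forming and atorvastatin"
    for setting in ("contextless", "coarse", "fine"):
        print(setting, "->", build_prompt(entry, setting))
    print("WER:", wer(entry.text, hypothesis))
    print("Entity miss rate:", entity_miss_rate(entry.entities, hypothesis))
```

In this toy example a single misrecognized drug name changes only two words out of seven, yet it loses half of the entities. That gap is the reason entity-focused metrics are reported alongside WER.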


Provider:

maas

Created:

2025-07-10
