sarvamai/contextual_asr_benchmark

Name: sarvamai/contextual_asr_benchmark
Creator: sarvamai
Published: 2026-02-02 20:29:49
License: No description available

Hugging Face | Updated 2026-02-02 | Indexed 2026-02-07

Download link:

https://hf-mirror.com/datasets/sarvamai/contextual_asr_benchmark

Description:

This dataset is a **synthetic contextual automatic speech recognition (ASR)** benchmark designed to evaluate and improve speech recognition systems in voice bot scenarios. It focuses on **context-aware transcription**, where the ASR model can use conversation history and agent prompts to better transcribe user responses. The dataset covers **10 major Indian languages**, providing a diverse linguistic landscape for testing voice AI capabilities in real-world conversational settings.

### Supported Languages

The dataset includes samples for the following languages:

1. **Hindi** (hi)
2. **Bengali** (bn)
3. **Marathi** (mr)
4. **Telugu** (te)
5. **Tamil** (ta)
6. **Gujarati** (gu)
7. **Kannada** (kn)
8. **Malayalam** (ml)
9. **Odia** (or)
10. **Punjabi** (pa)

### Dataset Structure

#### Data Instances

Each data instance represents a single turn in a voice bot interaction. The `context` field provides the necessary background (bot persona, history, and the immediate question) to help the model predict the `text` (transcription) from the `audio`.

#### Data Fields

* **`audio`**: The audio file or data of the user's spoken response.
* **`text`**: The ground-truth transcription of the user's spoken response.
* **`language`**: The language of the audio.
* **`context`**: A text string containing the input scenario information, comprising:
  * **Bot Description:** The persona of the bot (e.g., "Banking Assistant").
  * **Previous Conversation History:** Previous turns in the dialogue.
  * **Question asked by the bot:** The specific query prompting the user's response.

### Use Cases

This dataset is specifically designed for:

* **Contextual Biasing:** Training ASR models to boost probabilities for expected words (e.g., numbers, dates, entities) based on the `context`.
* **Intent Recognition:** Evaluating whether the transcription captures the user's intent correctly in noisy scenarios.
* **Dialog State Tracking:** Testing end-to-end spoken language understanding (SLU) systems.

### Dataset Creation

* **Source:** Synthetic generation.
* **Methodology:** Voice bot scenarios were simulated to cover various domains (Banking, E-commerce, Healthcare). User responses were synthesized or recorded to match the specific prompt found in the `context`.
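
### Example: Scoring Transcriptions per Language

The dataset card does not prescribe an evaluation metric. As a minimal sketch, assuming word error rate (WER) is the metric of interest, the snippet below scores model outputs against the `text` field and groups the result by the `language` field. The function names `edit_distance` and `wer_by_language`, the sample records, and the hypotheses are all hypothetical and invented for illustration; they are not taken from the dataset. Whitespace tokenization is also an assumption and may need language-specific normalization in practice.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_language(records, hypotheses):
    """Return {language: WER}, where WER = word errors / reference words."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for rec, hyp in zip(records, hypotheses):
        ref_tokens = rec["text"].split()
        errors[rec["language"]] += edit_distance(ref_tokens, hyp.split())
        words[rec["language"]] += len(ref_tokens)
    return {lang: errors[lang] / words[lang] for lang in words if words[lang] > 0}


# Invented records that mirror the documented fields (audio omitted here).
records = [
    {"language": "hi", "text": "mera khata number ek do teen",
     "context": "Banking Assistant ..."},
    {"language": "hi", "text": "haan", "context": "Banking Assistant ..."},
    {"language": "ta", "text": "enakku oru kelvi irukku",
     "context": "Healthcare Assistant ..."},
]

# Invented ASR outputs, one per record, in the same order.
hypotheses = [
    "mera khata number ek do",
    "haan",
    "enakku oru kelvi irukku",
]

scores = wer_by_language(records, hypotheses)
for lang, wer in sorted(scores.items()):
    print(f"{lang}: WER = {wer:.3f}")
```

Running the same scoring once with a model that ignores `context` and once with a model that uses it would give a per-language view of how much the context helps, which is the comparison the benchmark is built around.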
