sarvamai/contextual_asr_benchmark
Source: Hugging Face. Updated 2026-02-02; indexed 2026-02-07.
Download link:
https://hf-mirror.com/datasets/sarvamai/contextual_asr_benchmark
Description:
This dataset is a **Synthetic Contextual Automatic Speech Recognition (ASR)** benchmark designed to evaluate and improve speech recognition systems in voice bot scenarios. It focuses on **context-aware transcription**, where the ASR model can leverage conversation history and agent prompts to better transcribe user responses.
The dataset covers **10 major Indian languages**, providing a diverse linguistic landscape for testing voice AI capabilities in real-world conversational settings.
### Supported Languages
The dataset includes samples for the following 10 major Indian languages:
1. **Hindi** (hi)
2. **Bengali** (bn)
3. **Marathi** (mr)
4. **Telugu** (te)
5. **Tamil** (ta)
6. **Gujarati** (gu)
7. **Kannada** (kn)
8. **Malayalam** (ml)
9. **Odia** (or)
10. **Punjabi** (pa)
### Dataset Structure
#### Data Instances
Each data instance represents a single turn in a voice bot interaction. The `context` field provides the necessary background (bot persona, history, and the immediate question) to help the model predict the `text` (transcription) from the `audio`.
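Since each instance pairs a ground-truth `text` with audio in a known `language`, one natural way to score a model on this benchmark is word error rate (WER) per language. The sketch below is illustrative only: the `prediction` key is a hypothetical name for the model output attached to each row, the sample rows are made up, and whitespace tokenization is an assumption rather than an official scoring rule of the dataset.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_language(rows):
    """Aggregate WER per language.

    Each row needs `text` and `language` (dataset fields) plus
    `prediction`, a hypothetical key holding the model output.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for row in rows:
        ref = row["text"].split()
        hyp = row["prediction"].split()
        errors[row["language"]] += edit_distance(ref, hyp)
        words[row["language"]] += len(ref)
    # Skip languages whose references are all empty to avoid dividing by zero.
    return {lang: errors[lang] / words[lang] for lang in words if words[lang] > 0}


# Made-up rows for illustration; not taken from the dataset.
sample_rows = [
    {"language": "hi", "text": "a b c d", "prediction": "a x c d"},
    {"language": "ta", "text": "one two", "prediction": "one two"},
]
scores = wer_by_language(sample_rows)
```

Running the same function once with a context-free model and once with a context-aware model gives a simple per-language comparison of how much the `context` field helps.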
#### Data Fields
* **`audio`**: The audio file or data of the user's spoken response.
* **`text`**: The ground truth transcription of the user's spoken response.
* **`language`**: The language of the audio.
* **`context`**: A text string containing the input scenario information, comprising:
    * **Bot Description:** The persona of the bot (e.g., "Banking Assistant").
    * **Previous Conversation History:** Previous turns in the dialogue.
    * **Question asked by the bot:** The specific query prompting the user's response.
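The field list above can be turned into a small sanity check for rows before they are fed to a model. This is a minimal sketch under stated assumptions: it assumes `language` holds one of the two-letter codes from the Supported Languages list (the card does not say whether codes or full names are stored), it does not inspect the `audio` payload, and `check_record` and the sample row are hypothetical.

```python
REQUIRED_FIELDS = ("audio", "text", "language", "context")

# Codes copied from the Supported Languages list; assumed (not confirmed)
# to be the values stored in the `language` field.
LANGUAGE_CODES = {"hi", "bn", "mr", "te", "ta", "gu", "kn", "ml", "or", "pa"}


def check_record(record):
    """Return a list of problems found in one row (empty list means OK)."""
    problems = []
    for name in REQUIRED_FIELDS:
        if name not in record:
            problems.append(f"missing field: {name}")
    # `text` and `context` are described as text, so they should be
    # non-empty strings when present.
    for name in ("text", "context"):
        if name in record:
            value = record[name]
            if not isinstance(value, str) or not value.strip():
                problems.append(f"empty or non-string field: {name}")
    if "language" in record and record["language"] not in LANGUAGE_CODES:
        problems.append(f"unexpected language: {record['language']!r}")
    return problems


# Made-up row for illustration; `audio` is a placeholder value.
sample = {
    "audio": b"",
    "text": "haan",
    "language": "hi",
    "context": "Bot Description: Banking Assistant",
}
result = check_record(sample)
```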
### Use Cases
This dataset is specifically designed for:
* **Contextual Biasing:** Training ASR models to boost probabilities for expected words (e.g., numbers, dates, entities) based on the `context`.
* **Intent Recognition:** Evaluating whether the transcription captures the user's intent correctly in noisy scenarios.
* **Dialog State Tracking:** Testing end-to-end spoken language understanding (SLU) systems.
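To make the contextual biasing idea concrete, here is a toy sketch that applies the bias at inference time by rescoring an n-best list, rather than by training a model as the use case describes. It is a simplified stand-in, not the dataset authors' method: the function name, the `bonus` weight, the hypotheses, and their scores are all hypothetical, and tokens are matched by plain whitespace splitting with punctuation stripped.

```python
import string

# ASCII punctuation plus the danda used in several Indic scripts.
PUNCT = string.punctuation + "\u0964"


def tokens(text):
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    cleaned = (tok.strip(PUNCT) for tok in text.lower().split())
    return [tok for tok in cleaned if tok]


def rescore_with_context(nbest, context, bonus=0.5):
    """Pick the hypothesis with the best context-biased score.

    nbest: list of (hypothesis_text, score) pairs, higher score is better.
    context: the row's `context` string.
    bonus: hypothetical reward added per hypothesis word found in context.
    """
    context_words = set(tokens(context))
    best_text, best_score = None, float("-inf")
    for text, score in nbest:
        matches = sum(1 for tok in tokens(text) if tok in context_words)
        biased = score + bonus * matches
        if biased > best_score:
            best_text, best_score = text, biased
    return best_text


# Made-up hypotheses and scores for illustration.
nbest = [("pay the bell", -1.0), ("pay the bill", -1.2)]
context = "Banking Assistant. Which bill do you want to pay?"
chosen = rescore_with_context(nbest, context)
```

In this example the acoustically preferred hypothesis loses to the one that shares more words with the bot's question. A real system would more likely bias toward entity classes the question implies (numbers, dates, names) than toward literal word overlap.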
### Dataset Creation
* **Source:** Synthetic generation.
* **Methodology:** Voice bot scenarios were simulated to cover various domains (Banking, E-commerce, Healthcare). User responses were synthesized or recorded to match the specific prompt found in the `context`.
Provider:
sarvamai



