krutrim-ai-labs/VoiceAgentBench
Source: Hugging Face. Updated 2026-02-16. Indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/krutrim-ai-labs/VoiceAgentBench
Resource description:
---
license: other
license_name: krutrim-community-license-agreement-version-1.0
license_link: LICENSE.md
pretty_name: VoiceAgentBench
language:
- en
- hi
- bn
- ta
- te
- ml
- mr
tags:
- audio
- speech
- tool-calling
- function-calling
- benchmark
---
# VoiceAgentBench
This repository contains the dataset for **VoiceAgentBench**, a large-scale speech benchmark introduced in **“VoiceAgentBench: Are Voice Assistants Ready for Agentic Tasks?”** ([arXiv:2510.07978](https://arxiv.org/abs/2510.07978)).
VoiceAgentBench is designed to evaluate **end-to-end speech-based agents** in realistic, tool-driven settings. Unlike prior speech benchmarks that focus on transcription, intent detection, and speech question answering, this benchmark targets **agentic reasoning from speech input**, requiring models to select appropriate tools, generate structured arguments, orchestrate multi-step workflows, and handle safety-critical requests.
The dataset consists of **multilingual spoken queries** paired with explicit tool/function specifications and expected tool-call outputs, covering **single- and multi-tool usage**, **sequentially dependent and parallel tool orchestration**, **multi-turn spoken dialogues**, and **unsafe user requests requiring correct refusal behavior**. VoiceAgentBench enables systematic evaluation of both **ASR–LLM pipelines** and **end-to-end SpeechLMs**, highlighting the gap between text-based agents and their speech-based counterparts.
### Repository layout
All benchmark assets live under the top-level `VoiceAgentBench/` directory:
- `VoiceAgentBench/*_data/**.json`: queries / audio paths / instructions / expected tool calls
- `VoiceAgentBench/*_audios/**.wav`: corresponding audio files
In each JSON file, the `path` field is **repo-relative** (e.g. `VoiceAgentBench/single_tool_audios/english/1_audio.wav`).
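Because `path` is repo-relative, it has to be joined onto wherever the repository was downloaded before the audio can be opened. A minimal sketch, assuming a local copy of the repo at a hypothetical `repo_root`; the helper name `resolve_audio_path` is illustrative and not part of the dataset tooling:

```python
from pathlib import Path, PurePosixPath


def resolve_audio_path(repo_root: str, repo_relative: str) -> Path:
    """Join a repo-relative `path` value onto a local copy of the repository.

    `repo_root` is an assumption: any directory that mirrors the repo layout.
    """
    # The JSON stores POSIX-style paths; split them into parts so the join
    # also behaves on Windows.
    parts = PurePosixPath(repo_relative).parts
    return Path(repo_root).joinpath(*parts)


local = resolve_audio_path(
    "/tmp/VoiceAgentBench_repo",  # hypothetical download location
    "VoiceAgentBench/single_tool_audios/english/1_audio.wav",
)
print(local.name)
```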
### Subsets
- **single_tool:** Single tool-call tasks involving simple parameter filling from a spoken query, given a predefined tool.
- **single_tool_retrieval:** Tasks requiring selection of the relevant tool from a tool list, followed by parameter filling based on the spoken query.
- **parallel_tool:** Tasks that require selecting and invoking multiple independent tools in parallel from a provided tool list.
- **seqdep_tool:** Tasks involving chained, sequential tool invocations selected from a tool list.
- **multi_turn:** Dialog-based tool invocation tasks, where a single tool call must be produced based on information accumulated over multiple spoken interaction turns.
- **safety:** Safety evaluation tasks that involve rejecting adversarial or unsafe spoken queries and avoiding unsafe or hallucinated tool invocations.
### Data format (common patterns)
Depending on the subset, each item may include:
- `id`: example id
- `query` / `user_request`: the text query
- `functions`: tool/function specs (or list of tool names in safety)
- `expected_tool_call`: expected tool invocation(s) and arguments (when applicable)
- `path`: relative audio path (wav)
- `duration`: duration in seconds
- `instruction`: system prompt template
- `chat_history`: (multi_turn only) list of turns; user turns include `path` and `duration`
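Since the fields vary by subset, loading code usually needs to tolerate missing keys. The sketch below shows one way to read the text query and collect every audio path from an item. It assumes that each `chat_history` turn is a dict and that turns without audio simply lack `path`; the helper names and the sample items are made up for illustration and are not taken from the dataset.

```python
from __future__ import annotations

from typing import Any


def query_text(item: dict[str, Any]) -> str | None:
    """Return the text query, whichever of the two documented keys is present."""
    for key in ("query", "user_request"):
        if key in item:
            return item[key]
    return None


def audio_paths(item: dict[str, Any]) -> list[str]:
    """Collect every repo-relative audio path referenced by an item."""
    paths = []
    # Top-level `path`, as listed in the common fields.
    if "path" in item:
        paths.append(item["path"])
    # multi_turn items: user turns inside `chat_history` carry their own `path`.
    # Assumption: each turn is a dict; turns without audio lack the key.
    for turn in item.get("chat_history", []):
        if isinstance(turn, dict) and "path" in turn:
            paths.append(turn["path"])
    return paths


# Made-up items for illustration only; the values are not from the dataset.
single = {"id": 1, "query": "example query", "path": "a.wav", "duration": 2.5}
multi = {
    "id": 2,
    "chat_history": [
        {"path": "t1.wav", "duration": 1.0},
        {"content": "assistant reply"},  # hypothetical non-audio turn
        {"path": "t2.wav", "duration": 1.5},
    ],
}

print(query_text(single), audio_paths(single))
print(query_text(multi), audio_paths(multi))
```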
### Using the data
You can read JSON directly, or use `datasets`. Example:
```python
from datasets import load_dataset, Audio
from huggingface_hub import hf_hub_download

# 1) Download the JSON file for one subset
#    (single-tool calling in English, shown here as an example;
#    the filename follows the repository layout described above)
json_path = hf_hub_download(
    repo_id="krutrim-ai-labs/VoiceAgentBench",
    repo_type="dataset",
    filename="VoiceAgentBench/single_tool_data/english/single_tool_english.json",
)
ds = load_dataset("json", data_files=json_path, split="train")


# 2) Download each audio file and replace the repo-relative path with the local one
def download_and_replace(example):
    local_path = hf_hub_download(
        repo_id="krutrim-ai-labs/VoiceAgentBench",
        repo_type="dataset",
        filename=example["path"],
    )
    example["path"] = local_path
    return example


ds = ds.map(download_and_replace)

# 3) Cast the path column so that it is decoded as audio
ds = ds.cast_column("path", Audio())

# 4) Inspect the first example
print(ds[0]["path"])
```
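The example above fetches audio one file at a time. For a whole subset it may be quicker to download all matching files in a single call with `huggingface_hub.snapshot_download` and its `allow_patterns` argument. This is a sketch under one assumption: that every subset follows the `<subset>_data/<language>/` and `<subset>_audios/<language>/` naming seen in the `single_tool` / `english` example. The helper names `subset_patterns` and `download_subset` are hypothetical.

```python
from __future__ import annotations


def subset_patterns(subset: str, language: str) -> list[str]:
    """Build glob patterns for one subset/language pair.

    Assumption: folders are named `<subset>_data/<language>/` and
    `<subset>_audios/<language>/`, mirroring the single_tool example.
    """
    root = "VoiceAgentBench"
    return [
        f"{root}/{subset}_data/{language}/*",
        f"{root}/{subset}_audios/{language}/*",
    ]


def download_subset(subset: str, language: str) -> str:
    """Download only the matching files and return the local snapshot folder."""
    # Imported here so `subset_patterns` stays usable without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="krutrim-ai-labs/VoiceAgentBench",
        repo_type="dataset",
        allow_patterns=subset_patterns(subset, language),
    )


print(subset_patterns("single_tool", "english"))
```

Calling `download_subset("single_tool", "english")` returns the snapshot directory; joining that directory with an item's repo-relative `path` gives the local audio file.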
## Code & Evaluation
The official inference and evaluation codebase for **VoiceAgentBench** is available on GitHub:
**GitHub Repository:**
[https://github.com/ola-krutrim/VoiceAgentBench](https://github.com/ola-krutrim/VoiceAgentBench)
The repository includes:
* End-to-end inference pipelines for SpeechLMs and ASR–LLM systems
* Structured tool-call parsing and normalization
* LLM-as-a-judge evaluation for:
  * Parameter correctness
  * Multi-tool orchestration
  * Sequential dependencies
  * Multi-turn reasoning
  * Safety & refusal behavior
* Reproducible evaluation scripts across all benchmark subsets
* Modular interface enabling easy integration of new SpeechLMs
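To illustrate why tool-call normalization matters before any comparison, here is a hypothetical sketch. It is not the repository's implementation, and it assumes a tool call is a dict with `name` and `arguments` keys; the dataset's actual `expected_tool_call` schema may differ. The tool names and arguments are made up.

```python
import json
from typing import Any, Dict, List


def normalize_call(call: Dict[str, Any]) -> str:
    """Canonicalize a tool call so that equivalent calls compare equal.

    Assumed schema (hypothetical): {"name": str, "arguments": dict}.
    """
    arguments = call.get("arguments", {})
    # Some models emit arguments as a JSON string rather than an object.
    if isinstance(arguments, str):
        arguments = json.loads(arguments)
    canonical = {"name": call["name"].strip(), "arguments": arguments}
    # sort_keys removes differences caused by argument ordering.
    return json.dumps(canonical, sort_keys=True, ensure_ascii=False)


def same_calls(predicted: List[Dict[str, Any]], expected: List[Dict[str, Any]]) -> bool:
    """Order-insensitive exact match, suited to independent (parallel) calls."""
    return sorted(map(normalize_call, predicted)) == sorted(map(normalize_call, expected))


# Made-up calls for illustration; names and arguments are not from the dataset.
pred = [{"name": "get_weather ", "arguments": '{"unit": "c", "city": "Pune"}'}]
gold = [{"name": "get_weather", "arguments": {"city": "Pune", "unit": "c"}}]
print(same_calls(pred, gold))
```

An exact match like this is stricter than the LLM-as-a-judge evaluation listed above, and sequentially dependent calls would need an order-sensitive comparison instead of the sorted one used here.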
### License
This repository is licensed under the Krutrim Community License Agreement, Version 1.0 (see `LICENSE.md`).
### Citation
If you use this dataset, please cite:
```bibtex
@article{jain2025voiceagentbench,
  title={VoiceAgentBench: Are Voice Assistants ready for agentic tasks?},
  author={Dhruv Jain and Harshit Shukla and Gautam Rajeev and Ashish Kulkarni and Chandra Khatri and Shubham Agarwal},
  year={2025},
  eprint={2510.07978},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2510.07978},
}
```
Provider:
krutrim-ai-labs



