anyreach-ai/semantic-turn-taking-benchmark
Source: Hugging Face. Updated 2026-03-19; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/anyreach-ai/semantic-turn-taking-benchmark
Description:
---
language:
- en
license: apache-2.0
task_categories:
- text-classification
tags:
- turn-taking
- voice-ai
- dialog
- conversation
- end-of-utterance
- backchannel
- interruption
pretty_name: Semantic Turn-Taking Benchmark
size_categories:
- 1K<n<10K
dataset_info:
  features:
  - name: id
    dtype: string
  - name: source
    dtype: string
  - name: messages
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: conversation
    dtype: string
  - name: action
    dtype:
      class_label:
        names:
          '0': start_speaking
          '1': continue_listening
          '2': start_listening
          '3': continue_speaking
  - name: original_label
    dtype: string
  - name: num_turns
    dtype: int32
  splits:
  - name: ten
    num_examples: 428
  - name: swda
    num_examples: 3523
  - name: synthetic
    num_examples: 60
---
# Semantic Turn-Taking Benchmark
A curated evaluation benchmark for **semantic turn-taking models** in voice AI. Given a conversation context, predict what action the AI agent should take: speak, listen, continue speaking, or continue listening.
Unlike acoustic-based approaches (VAD, silence detection), this benchmark tests whether a model can make turn-taking decisions from **text/semantic content alone**.
## Action Classes
| Action | Description |
|--------|-------------|
| `start_speaking` | User finished their turn, agent should respond |
| `continue_listening` | User is mid-utterance, keep listening |
| `start_listening` | User interrupts the agent, agent should stop talking |
| `continue_speaking` | User gave a backchannel, agent keeps talking |
**`start_speaking` vs `continue_listening`** — Was the user done talking?
```
User: I need help with my bill
Agent: Sure I can help with that what seems to be the issue
User: I was charged twice for the same order
→ start_speaking (user done)
```
```
User: I need help with my bill
Agent: Sure I can help with that what seems to be the issue
User: I was charged twice for
→ continue_listening (user not done)
```
**`start_listening` vs `continue_speaking`** — User spoke while agent was talking. Interruption or backchannel?
```
User: What is your refund policy
Agent: Our refund policy states that all items purchased within the last thirty days are eligible for a full refund provided that the item is in its original packaging and
User: Okay I get
→ start_listening (real interruption, agent should stop)
```
```
User: What is your refund policy
Agent: Our refund policy states that all items purchased within the last thirty days are eligible for a full refund provided that the item is in its original packaging and
User: uh huh
→ continue_speaking (backchannel, agent keeps talking)
```
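The four classes fall into two decision situations: either the agent is silent and must decide whether the user has finished, or the agent is talking and must decide whether the user's speech is an interruption. A minimal sketch of that structure follows. The integer IDs match the `class_label` names in the card metadata; `Action`, `candidates`, `label_name`, and the `agent_is_speaking` flag are hypothetical names used only for illustration.

```python
from enum import IntEnum
from typing import Tuple


class Action(IntEnum):
    """Integer IDs as listed under `class_label` in the card metadata."""
    START_SPEAKING = 0
    CONTINUE_LISTENING = 1
    START_LISTENING = 2
    CONTINUE_SPEAKING = 3


def candidates(agent_is_speaking: bool) -> Tuple[Action, Action]:
    """Return the two actions that compete in a given situation.

    `agent_is_speaking` is an illustrative flag, not a dataset field.
    """
    if agent_is_speaking:
        # User spoke over the agent: interruption or backchannel?
        return (Action.START_LISTENING, Action.CONTINUE_SPEAKING)
    # Agent is silent: has the user finished the turn?
    return (Action.START_SPEAKING, Action.CONTINUE_LISTENING)


def label_name(action_id: int) -> str:
    """Convert an integer class ID to its lowercase label string."""
    return Action(action_id).name.lower()


print(label_name(2))
print([a.name.lower() for a in candidates(agent_is_speaking=False)])
```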
## Dataset Statistics
| Subset | start_speaking | continue_listening | start_listening | continue_speaking | Total |
|--------|---:|---:|---:|---:|---:|
| TEN | 236 | 192 | — | — | 428 |
| SwDA | 2,497 | 191 | — | 835 | 3,523 |
| Synthetic | 24 | 12 | 12 | 12 | 60 |
| **Total** | **2,757** | **395** | **12** | **847** | **4,011** |
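The class balance differs sharply between subsets, which matters when reading accuracy figures. The sketch below copies the counts from the table above and computes each subset's total and the share of its most frequent class; `COUNTS`, `subset_total`, and `majority_share` are illustrative names, not part of the dataset.

```python
# Per-class counts copied from the statistics table above.
COUNTS = {
    "ten": {"start_speaking": 236, "continue_listening": 192},
    "swda": {
        "start_speaking": 2497,
        "continue_listening": 191,
        "continue_speaking": 835,
    },
    "synthetic": {
        "start_speaking": 24,
        "continue_listening": 12,
        "start_listening": 12,
        "continue_speaking": 12,
    },
}


def subset_total(name: str) -> int:
    """Number of examples in a subset."""
    return sum(COUNTS[name].values())


def majority_share(name: str) -> float:
    """Fraction of the subset taken by its most frequent class.

    This equals the accuracy of a predictor that always outputs that class.
    """
    counts = COUNTS[name]
    return max(counts.values()) / sum(counts.values())


for name in COUNTS:
    print(f"{name}: total={subset_total(name)}, "
          f"majority share={majority_share(name):.2%}")
```

Under these counts, `start_speaking` makes up roughly 71% of SwDA (2,497 of 3,523), so a constant predictor would already reach about that accuracy there. Macro F1 is therefore the more informative metric for that subset.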
## Subsets
### `ten` — TEN Turn-End Detection (428 examples)
Binary subset (2 classes: `start_speaking`, `continue_listening`). Single user utterances without conversation context — tests pure end-of-utterance detection.
Source: [TEN-framework/ten-turn-detection](https://github.com/TEN-framework/ten-turn-detection)
**Processing**: Dropped the `wait` class (100 examples). It would map to `start_listening`, which requires conversation context that TEN does not provide.
### `swda` — Switchboard Dialog Act Corpus (3,523 examples)
3-class subset (`start_speaking`, `continue_listening`, `continue_speaking`). Real telephone conversations with up to 5 turns of context.
Source: [cgpotts/swda](https://huggingface.co/datasets/cgpotts/swda)
**Processing applied**:
- Cleaned Switchboard transcription markup (`{F uh}` → `uh`, speech repairs, etc.)
- Dropped ambiguous tags: `aa`/`ny`/`ba` (agreement vs backchannel), `x` (non-verbal), `+` (syntactic continuation, 56% actually complete), `^2` (collaborative completion)
- Kept `%` (abandoned) only when same speaker continues (genuinely incomplete)
- Mapped: backchannel tags (`b`, `bk`, `bh`, `b^m`) → `continue_speaking`; incomplete tags → `continue_listening`; complete tags → `start_speaking`
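The processing steps above can be sketched roughly as follows. This is not the authors' pipeline: the regex only handles the simple `{X text}` markup form shown in the example, tags are matched exactly (compound tags are not handled), and because the card does not list which tags count as "complete", the sketch assumes every remaining tag does. `clean_markup` and `map_tag` are hypothetical names.

```python
import re
from typing import Optional

# Tag groups taken from the processing notes above.
BACKCHANNEL_TAGS = {"b", "bk", "bh", "b^m"}
DROPPED_TAGS = {"aa", "ny", "ba", "x", "+", "^2"}

# Matches markup such as "{F uh}" and captures the inner text.
_MARKUP = re.compile(r"\{[A-Z] ([^{}]*)\}")


def clean_markup(text: str) -> str:
    """Strip `{X ...}` wrappers, keeping the words inside them."""
    previous = None
    while previous != text:  # repeat so nested wrappers are also unwrapped
        previous = text
        text = _MARKUP.sub(r"\1", text)
    return " ".join(text.split())  # collapse leftover whitespace


def map_tag(tag: str, same_speaker_continues: bool = False) -> Optional[str]:
    """Map a dialog-act tag to an action, or None if the example is dropped."""
    if tag in DROPPED_TAGS:
        return None
    if tag in BACKCHANNEL_TAGS:
        return "continue_speaking"
    if tag == "%":
        # Abandoned utterances are kept only when the same speaker continues.
        return "continue_listening" if same_speaker_continues else None
    # ASSUMPTION: every other tag is treated as a complete utterance.
    return "start_speaking"


print(clean_markup("{F uh} I think so"))
print(map_tag("bk"), map_tag("%", same_speaker_continues=True), map_tag("aa"))
```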
### `synthetic` — Curated Test Set (60 examples)
Full 4-class subset. Hand-crafted customer service conversations covering all 4 action classes at 4 difficulty levels (easy, medium, hard, ultra_hard).
Source: Created by authors
## Format
Each example contains:
| Field | Description |
|-------|-------------|
| `id` | Unique identifier (traceable to source dataset) |
| `source` | `ten`, `swda`, or `synthetic` |
| `messages` | Conversation as a list of turns, each with a `role` and a `content` string |
| `conversation` | Multi-line conversation in `User: .../Agent: ...` format |
| `action` | Ground truth: one of the 4 action classes |
| `original_label` | Original label from the source dataset |
| `num_turns` | Number of conversation turns |
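For models that expect structured turns, the `conversation` string can be split back into speaker and text pairs. The sketch below assumes one turn per line with a literal `User: ` or `Agent: ` prefix, as in the examples on this card; `parse_conversation` is a hypothetical helper, and the sample string is taken from the Action Classes section.

```python
from typing import List, Tuple

SPEAKERS = ("User", "Agent")


def parse_conversation(conversation: str) -> List[Tuple[str, str]]:
    """Split a `User: ... / Agent: ...` string into (speaker, text) turns."""
    turns: List[Tuple[str, str]] = []
    for line in conversation.splitlines():
        speaker, sep, text = line.partition(": ")
        if sep and speaker in SPEAKERS:
            turns.append((speaker, text.strip()))
        elif turns and line.strip():
            # ASSUMPTION: a line without a speaker prefix continues the
            # previous turn.
            prev_speaker, prev_text = turns[-1]
            turns[-1] = (prev_speaker, f"{prev_text} {line.strip()}")
    return turns


sample = (
    "User: I need help with my bill\n"
    "Agent: Sure I can help with that what seems to be the issue\n"
    "User: I was charged twice for"
)
for speaker, text in parse_conversation(sample):
    print(f"{speaker!r}: {text!r}")
```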
## Benchmark Results
Baseline results using [anyreach-ai/semantic-turn-taking](https://huggingface.co/anyreach-ai/semantic-turn-taking) (Qwen2.5-0.5B fine-tuned for 4-class turn-taking).
### Binary (EOU vs Not-EOU)
Only examples whose gold label is `start_speaking` or `continue_listening` are used. The model's 4-class predictions are collapsed to two: `start_speaking`/`continue_speaking` → EOU, `continue_listening`/`start_listening` → Not-EOU.
| Subset | N | Accuracy | F1 (macro) |
|--------|--:|--:|--:|
| TEN | 428 | 91.82% | 91.80% |
| SwDA | 2,688 | 65.96% | 51.46% |
| Synthetic | 36 | 86.11% | 85.57% |
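The binary protocol described above can be sketched as below. The mapping follows the card; the function names and the toy labels are illustrative only, and this is not the authors' evaluation script.

```python
from typing import Sequence, Tuple

# Classes collapsed to EOU, per the mapping stated above.
EOU_CLASSES = {"start_speaking", "continue_speaking"}
# Gold labels retained for the binary evaluation.
BINARY_GOLD = {"start_speaking", "continue_listening"}


def to_binary(action: str) -> str:
    """Collapse a 4-class action into EOU / Not-EOU."""
    return "EOU" if action in EOU_CLASSES else "Not-EOU"


def binary_accuracy(gold: Sequence[str], pred: Sequence[str]) -> Tuple[float, int]:
    """Return (accuracy, N) over examples whose gold label is binary-eligible."""
    kept = [(g, p) for g, p in zip(gold, pred) if g in BINARY_GOLD]
    if not kept:
        raise ValueError("no binary-eligible examples")
    correct = sum(to_binary(g) == to_binary(p) for g, p in kept)
    return correct / len(kept), len(kept)


# Toy labels for illustration only, not real model output.
gold = ["start_speaking", "continue_listening", "continue_speaking", "start_speaking"]
pred = ["continue_speaking", "start_listening", "continue_speaking", "continue_listening"]
print(binary_accuracy(gold, pred))
```

Note that a `continue_speaking` prediction on a `start_speaking` example counts as correct under this mapping, since both collapse to EOU.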
### Multi-class
| Subset | N | Classes | Accuracy | F1 (macro) |
|--------|--:|--------:|--:|--:|
| TEN | 428 | 2 | 91.82% | 91.80% |
| SwDA | 3,523 | 3 | 68.98% | 46.92% |
| Synthetic | 60 | 4 | 76.67% | 72.07% |
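Macro F1 averages per-class F1 scores with equal weight, so a rare class that is predicted poorly pulls the score well below accuracy. The sketch below is a plain-Python version for illustration. It averages over the classes present in the gold labels, which is an assumption: the card does not say how classes absent from a subset are treated.

```python
from typing import Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over classes present in `gold`."""
    labels = sorted(set(gold))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy imbalanced case: always predicting the majority class.
gold = ["start_speaking"] * 9 + ["continue_listening"]
pred = ["start_speaking"] * 10
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(f"accuracy={accuracy:.2f}, macro F1={macro_f1(gold, pred):.4f}")
```

In this toy case accuracy is 0.90 while macro F1 is below 0.5, the same pattern as the gap between the two columns on SwDA.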
## Usage
```python
from datasets import load_dataset

# Load all subsets (returns a DatasetDict keyed by split name)
ds = load_dataset("anyreach-ai/semantic-turn-taking-benchmark")

# Load a specific subset
ten = load_dataset("anyreach-ai/semantic-turn-taking-benchmark", split="ten")
swda = load_dataset("anyreach-ai/semantic-turn-taking-benchmark", split="swda")
synthetic = load_dataset("anyreach-ai/semantic-turn-taking-benchmark", split="synthetic")

# Iterate; `action` is an integer class label (see the metadata above)
for example in swda:
    print(example["conversation"])
    print(f"Action: {example['action']}")
```
## Citation
```bibtex
@misc{semantic-turn-taking-2026,
  title={Semantic Turn-Taking Model},
  author={Shangeth Rajaa},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/anyreach-ai/semantic-turn-taking}
}
```
## Authors
- [**Shangeth Rajaa**](https://github.com/shangeth)
## License
Apache 2.0
Provider:
anyreach-ai



