anyreach-ai/semantic-turn-taking-benchmark

Name: anyreach-ai/semantic-turn-taking-benchmark
Creator: anyreach-ai
Published: 2026-03-19 15:48:55
License: No description available

Hugging Face · Updated 2026-03-19 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/anyreach-ai/semantic-turn-taking-benchmark

Resource description:

---
language:
- en
license: apache-2.0
task_categories:
- text-classification
tags:
- turn-taking
- voice-ai
- dialog
- conversation
- end-of-utterance
- backchannel
- interruption
pretty_name: Semantic Turn-Taking Benchmark
size_categories:
- 1K<n<10K
dataset_info:
  features:
  - name: id
    dtype: string
  - name: source
    dtype: string
  - name: messages
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: conversation
    dtype: string
  - name: action
    dtype:
      class_label:
        names:
          '0': start_speaking
          '1': continue_listening
          '2': start_listening
          '3': continue_speaking
  - name: original_label
    dtype: string
  - name: num_turns
    dtype: int32
  splits:
  - name: ten
    num_examples: 428
  - name: swda
    num_examples: 3523
  - name: synthetic
    num_examples: 60
---

# Semantic Turn-Taking Benchmark

A curated evaluation benchmark for **semantic turn-taking models** in voice AI. Given a conversation context, predict which action the AI agent should take next: start speaking, continue listening, start listening, or continue speaking.

Unlike acoustic approaches (VAD, silence detection), this benchmark tests whether a model can make turn-taking decisions from **text/semantic content alone**.

## Action Classes

| Action | Description |
|--------|-------------|
| `start_speaking` | User finished their turn; agent should respond |
| `continue_listening` | User is mid-utterance; keep listening |
| `start_listening` | User interrupts the agent; agent should stop talking |
| `continue_speaking` | User gave a backchannel; agent keeps talking |

**`start_speaking` vs `continue_listening`**: was the user done talking?

```
User: I need help with my bill
Agent: Sure I can help with that what seems to be the issue
User: I was charged twice for the same order
→ start_speaking (user done)
```

```
User: I need help with my bill
Agent: Sure I can help with that what seems to be the issue
User: I was charged twice for
→ continue_listening (user not done)
```

**`start_listening` vs `continue_speaking`**: the user spoke while the agent was talking. Was it an interruption or a backchannel?

```
User: What is your refund policy
Agent: Our refund policy states that all items purchased within the last thirty days are eligible for a full refund provided that the item is in its original packaging and
User: Okay I get
→ start_listening (real interruption, agent should stop)
```

```
User: What is your refund policy
Agent: Our refund policy states that all items purchased within the last thirty days are eligible for a full refund provided that the item is in its original packaging and
User: uh huh
→ continue_speaking (backchannel, agent keeps talking)
```

## Dataset Statistics

| Subset | start_speaking | continue_listening | start_listening | continue_speaking | Total |
|--------|---:|---:|---:|---:|---:|
| TEN | 236 | 192 | 0 | 0 | 428 |
| SwDA | 2,497 | 191 | 0 | 835 | 3,523 |
| Synthetic | 24 | 12 | 12 | 12 | 60 |
| **Total** | **2,757** | **395** | **12** | **847** | **4,011** |

## Subsets

### `ten`: TEN Turn-End Detection (428 examples)

Binary subset (2 classes: `start_speaking`, `continue_listening`). Single user utterances without conversation context; tests pure end-of-utterance detection.

Source: [TEN-framework/ten-turn-detection](https://github.com/TEN-framework/ten-turn-detection)

**Processing**: Dropped the `wait` class (100 examples) because `start_listening` requires conversation context, which TEN does not provide.

### `swda`: Switchboard Dialog Act Corpus (3,523 examples)

3-class subset (`start_speaking`, `continue_listening`, `continue_speaking`).
Real telephone conversations with up to 5 turns of context.

Source: [cgpotts/swda](https://huggingface.co/datasets/cgpotts/swda)

**Processing applied**:

- Cleaned Switchboard transcription markup (`{F uh}` → `uh`, speech repairs, etc.)
- Dropped ambiguous tags: `aa`/`ny`/`ba` (agreement vs. backchannel), `x` (non-verbal), `+` (syntactic continuation; 56% were actually complete), `^2` (collaborative completion)
- Kept `%` (abandoned) only when the same speaker continues, i.e. the utterance is genuinely incomplete
- Mapped backchannel tags (`b`, `bk`, `bh`, `b^m`) → `continue_speaking`; incomplete tags → `continue_listening`; complete tags → `start_speaking`

### `synthetic`: Curated Test Set (60 examples)

Full 4-class subset. Hand-crafted customer-service conversations covering all 4 action classes at 4 difficulty levels (easy, medium, hard, ultra_hard).

Source: Created by the authors

## Format

Each example contains:

| Field | Description |
|-------|-------------|
| `id` | Unique identifier (traceable to the source dataset) |
| `source` | `ten`, `swda`, or `synthetic` |
| `messages` | The conversation as a list of `{role, content}` turns |
| `conversation` | Multi-line conversation in `User: ...`/`Agent: ...` format |
| `action` | Ground-truth action, stored as a class label (`0` = `start_speaking`, `1` = `continue_listening`, `2` = `start_listening`, `3` = `continue_speaking`) |
| `original_label` | Original label from the source dataset |
| `num_turns` | Number of conversation turns |

## Benchmark Results

Baseline results using [anyreach-ai/semantic-turn-taking](https://huggingface.co/anyreach-ai/semantic-turn-taking) (Qwen2.5-0.5B fine-tuned for 4-class turn-taking).

### Binary (EOU vs Not-EOU)

Only examples whose ground truth is `start_speaking` or `continue_listening` are used. Predictions are collapsed to end-of-utterance (EOU) vs. Not-EOU: `start_speaking`/`continue_speaking` → EOU, `continue_listening`/`start_listening` → Not-EOU.
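
The binary protocol above can be sketched in plain Python. This is a minimal illustration, not the authors' evaluation code: `TO_BINARY` and `binary_eou_scores` are hypothetical names, gold and predicted labels are assumed to be already decoded to their string names, and macro-F1 is taken as the unweighted mean of the per-class F1 for EOU and Not-EOU.

```python
from typing import List, Sequence, Tuple

# Hypothetical mapping: collapse the 4 actions into EOU / Not-EOU,
# following the mapping stated in the card.
TO_BINARY = {
    "start_speaking": "EOU",
    "continue_speaking": "EOU",
    "continue_listening": "Not-EOU",
    "start_listening": "Not-EOU",
}

# Only examples with these gold labels take part in the binary evaluation.
BINARY_GOLD = {"start_speaking", "continue_listening"}


def binary_eou_scores(gold: Sequence[str], pred: Sequence[str]) -> Tuple[float, float]:
    """Return (accuracy, macro-F1) on the binary EOU task.

    Examples whose gold label is not start_speaking/continue_listening are
    skipped; predictions may use any of the 4 actions.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs: List[Tuple[str, str]] = [
        (TO_BINARY[g], TO_BINARY[p]) for g, p in zip(gold, pred) if g in BINARY_GOLD
    ]
    if not pairs:
        raise ValueError("no start_speaking/continue_listening examples")

    accuracy = sum(g == p for g, p in pairs) / len(pairs)

    # Per-class F1 = 2TP / (2TP + FP + FN); macro-F1 is the unweighted mean.
    f1_scores = []
    for label in ("EOU", "Not-EOU"):
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return accuracy, sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    gold = ["start_speaking", "continue_listening", "continue_speaking", "start_speaking"]
    pred = ["continue_speaking", "start_listening", "start_speaking", "continue_listening"]
    acc, f1 = binary_eou_scores(gold, pred)
    # The gold continue_speaking row is skipped; 2 of the remaining 3 are correct.
    print(f"accuracy={acc:.4f} macro_f1={f1:.4f}")
```

With the real data, the integer `action` column can be decoded first, e.g. `swda.features["action"].int2str(swda["action"])`. Gold `continue_speaking` rows are then skipped automatically, which is why the binary SwDA N (2,688) equals 2,497 + 191.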

| Subset | N | Accuracy | F1 (macro) |
|--------|--:|--:|--:|
| TEN | 428 | 91.82% | 91.80% |
| SwDA | 2,688 | 65.96% | 51.46% |
| Synthetic | 36 | 86.11% | 85.57% |

### Multi-class

| Subset | N | Classes | Accuracy | F1 (macro) |
|--------|--:|--------:|--:|--:|
| TEN | 428 | 2 | 91.82% | 91.80% |
| SwDA | 3,523 | 3 | 68.98% | 46.92% |
| Synthetic | 60 | 4 | 76.67% | 72.07% |

## Usage

```python
from datasets import load_dataset

# Load all subsets (a DatasetDict keyed by split name)
ds = load_dataset("anyreach-ai/semantic-turn-taking-benchmark")

# Load a specific subset
ten = load_dataset("anyreach-ai/semantic-turn-taking-benchmark", split="ten")
swda = load_dataset("anyreach-ai/semantic-turn-taking-benchmark", split="swda")
synthetic = load_dataset("anyreach-ai/semantic-turn-taking-benchmark", split="synthetic")

# `action` is stored as a ClassLabel integer; keep the feature to decode names
action_feature = swda.features["action"]

# Iterate
for example in swda:
    print(example["conversation"])
    print(f"Action: {action_feature.int2str(example['action'])}")
```

## Citation

```bibtex
@misc{semantic-turn-taking-2026,
  title={Semantic Turn-Taking Model},
  author={Shangeth Rajaa},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/anyreach-ai/semantic-turn-taking}
}
```

## Authors

- [**Shangeth Rajaa**](https://github.com/shangeth)

## License

Apache 2.0

Provider:

anyreach-ai
