DEBATE: A Dataset for Disentangling Textual Ambiguity in Mandarin Through Speech

Name: DEBATE: A Dataset for Disentangling Textual Ambiguity in Mandarin Through Speech
Creator: Zenodo
Published: 2025-08-08 06:06:04
License: No description available

Zenodo | Updated 2025-08-08 | Indexed 2026-05-26

Download link:

https://zenodo.org/doi/10.5281/zenodo.15609921

Description:

We present DEBATE, a unique public Chinese speech-text dataset designed to study how speech cues and patterns (pronunciation, pauses, stress, and intonation) can help resolve textual ambiguity and reveal a speaker's true intent. It contains 1,001 carefully selected ambiguous utterances, each recorded by 10 native speakers. All audio files were uniformly resampled to 16 kHz and checked for consistency with the corresponding text through manual spot checks and character error rate (CER) evaluation using ASR models. The audio is packaged in the DEBATE_Audio.zip archive, which contains 10 speaker-specific folders. Each speaker folder has three subfolders (polyphone, segment, and stress), which correspond to the text annotations in Task_Proun.xls, Task_Pause.xls, and Task_Stres.xls, respectively. Speaker-related information is documented in the metadata.xls file.
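
The folder layout described above can be indexed with a short script. The following is a minimal sketch, not an official loader. It assumes the archive has already been extracted to a local directory and that the recordings are `.wav` files; neither detail is stated in the description. The function name `index_audio` and the `extensions` parameter are hypothetical, and the speaker folder names are taken as found on disk. Only the three subfolder names and the three annotation file names come from the description.

```python
from collections import defaultdict
from pathlib import Path

# Task subfolder -> annotation file, as given in the dataset description.
TASK_ANNOTATIONS = {
    "polyphone": "Task_Proun.xls",
    "segment": "Task_Pause.xls",
    "stress": "Task_Stres.xls",
}


def index_audio(root, extensions=(".wav",)):
    """Group audio paths by (speaker folder, task subfolder).

    `root` is the directory holding the speaker folders after extraction.
    The `.wav` default is an assumption; pass other suffixes if needed.
    """
    index = defaultdict(list)
    root = Path(root)
    # Every directory directly under root is treated as one speaker folder.
    for speaker_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for task in TASK_ANNOTATIONS:
            task_dir = speaker_dir / task
            if not task_dir.is_dir():
                continue  # tolerate a missing subfolder instead of failing
            for path in sorted(task_dir.rglob("*")):
                if path.is_file() and path.suffix.lower() in extensions:
                    index[(speaker_dir.name, task)].append(path)
    return dict(index)
```

Calling `index_audio("DEBATE_Audio")` on an extracted copy would return a dictionary keyed by `(speaker, task)` pairs, and `TASK_ANNOTATIONS[task]` names the spreadsheet that holds the matching text. How rows in those spreadsheets map to individual audio file names is not described here, so that join is left to the reader.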

Provider:

Zenodo

Created:

2025-06-06