DEBATE: A Dataset for Disentangling Textual Ambiguity in Mandarin Through Speech
Source: Zenodo. Updated 2025-08-08; indexed 2026-05-26.
Download link:
https://zenodo.org/doi/10.5281/zenodo.15609921
Description:
We present DEBATE, a unique public Chinese speech-text dataset designed to study how speech cues and patterns—pronunciation, pause, stress and intonation—can help resolve textual ambiguity and reveal a speaker’s true intent.
It contains 1,001 carefully selected ambiguous utterances, each recorded by 10 native speakers. All audio files were uniformly resampled to 16 kHz and validated for consistency with the corresponding text through manual spot checks and CER evaluation using ASR models.
All audio files are packaged in the DEBATE_Audio.zip archive, which contains 10 speaker-specific folders. Within each speaker folder, there are three subfolders (polyphone, segment, and stress) corresponding to the text annotations in Task_Proun.xls, Task_Pause.xls, and Task_Stres.xls, respectively. Speaker-related information is documented in the metadata.xls file.
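The folder layout described above can be indexed with a short script. The following is a minimal Python sketch, not an official loader: it assumes DEBATE_Audio.zip has already been extracted and that the speaker folders sit directly under the chosen root. The function name index_debate_audio is hypothetical, and because the description does not state the speaker folder names or the audio file extension, the code simply lists every subdirectory as a speaker and every file it finds.

```python
from pathlib import Path

# Subfolder name -> annotation file, as stated in the dataset description.
TASK_ANNOTATIONS = {
    "polyphone": "Task_Proun.xls",
    "segment": "Task_Pause.xls",
    "stress": "Task_Stres.xls",
}


def index_debate_audio(root):
    """Return {speaker_folder: {task: [file paths]}} for an extracted archive.

    Assumption: `root` contains one directory per speaker, each holding the
    three task subfolders. Missing subfolders yield an empty list.
    """
    index = {}
    speaker_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for speaker_dir in speaker_dirs:
        tasks = {}
        for task in TASK_ANNOTATIONS:
            task_dir = speaker_dir / task
            if task_dir.is_dir():
                # The file extension is not documented, so keep every file.
                tasks[task] = sorted(f for f in task_dir.rglob("*") if f.is_file())
            else:
                tasks[task] = []
        index[speaker_dir.name] = tasks
    return index
```

Calling index_debate_audio on the extraction directory and comparing the per-task file counts against the rows of the matching annotation file in TASK_ANNOTATIONS is one quick way to check that a download is complete.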
Provider:
Zenodo
Created:
2025-06-06



