AUSpeech: An Audio-Ultrasound Synchronized Database of Tongue Movement for Mandarin speech
DataCite Commons · Updated 2025-04-27 · Indexed 2025-04-16
Download link:
https://www.scidb.cn/detail?dataSetId=7b6a498ee725439ba3a060985fece3cd
Resource summary:
Ultrasound imaging has been used to animate articulatory movements during speech production, particularly tongue movement. These animations provide visual feedback for articulation-disorder intervention and for speech recognition tasks. However, high-quality audio-ultrasound datasets remain scarce. This study therefore constructs a multimodal database for Mandarin speech. The dataset combines synchronized ultrasound images of tongue movement with the corresponding audio recordings and text annotations. The data were elicited from 43 healthy speakers and 11 patients with dysarthria through speech tasks (monophthong vowels, monosyllables, and sentences), for a total duration of 22.31 hours. During recording, a high-resolution ultrasound device (920×700 pixels at 60 fps) and a high-fidelity microphone captured tongue motion and audio simultaneously, with the two streams kept synchronized by the experimental setup. In addition, a customized helmet held the ultrasound probe in place, controlling for head movement, minimizing probe displacement, and keeping the images spatially stable. The database is of clear value for automatic speech recognition, silent speech interface development, and research in speech pathology and linguistics.
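The description gives a frame rate (60 fps), a frame size (920×700), and a total duration (22.31 hours), which is enough to estimate dataset scale and to align audio with ultrasound frames. The sketch below is illustrative only. It assumes both streams start at the same instant t = 0, that ultrasound runs at a constant 60 fps for the full 22.31 hours, and that frames are stored as uncompressed 8-bit grayscale. The audio sample rate is passed in by the caller because the description does not state it. The function names are hypothetical and are not part of any AUSpeech tooling.

```python
# Hypothetical helpers for an audio/ultrasound pair from a dataset like AUSpeech.
# Assumptions (not from the dataset documentation): both streams start at t = 0,
# the ultrasound frame rate is a constant 60 fps, and frames are 8-bit grayscale.

ULTRASOUND_FPS = 60                   # frame rate stated in the description
FRAME_WIDTH, FRAME_HEIGHT = 920, 700  # resolution stated in the description


def frame_index_for_time(t_seconds: float, fps: int = ULTRASOUND_FPS) -> int:
    """Return the 0-based ultrasound frame shown at time t (floor)."""
    if t_seconds < 0:
        raise ValueError("time must be non-negative")
    return int(t_seconds * fps)


def frame_index_for_sample(sample_index: int, sample_rate: int,
                           fps: int = ULTRASOUND_FPS) -> int:
    """Map an audio sample index to the ultrasound frame covering it.

    Integer arithmetic avoids floating-point drift on long recordings.
    """
    if sample_index < 0 or sample_rate <= 0:
        raise ValueError("invalid sample index or sample rate")
    return (sample_index * fps) // sample_rate


def total_frames(hours: float, fps: int = ULTRASOUND_FPS) -> int:
    """Approximate frame count if ultrasound covers the whole duration."""
    return round(hours * 3600 * fps)


def uncompressed_bytes(n_frames: int) -> int:
    """Raw size of n_frames assuming 1 byte per pixel (8-bit grayscale)."""
    return n_frames * FRAME_WIDTH * FRAME_HEIGHT
```

Under these assumptions, `total_frames(22.31)` gives 4,818,960 frames (80,316 s × 60 fps). At 644,000 bytes per raw frame, that comes to roughly 3.1 TB uncompressed, an upper bound because stored files are likely compressed. With a 16 kHz recording, `frame_index_for_sample(8000, 16000)` maps the audio sample at 0.5 s to frame 30.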
Provider:
Science Data Bank
Created:
2024-12-20
Aggregated Summary
Dataset Introduction

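Background and Challenges
Background Overview
AUSpeech is a multimodal Mandarin speech database that integrates synchronized ultrasound images of tongue movement, audio recordings, and text annotations. It is suited to automatic speech recognition, silent speech interface development, and research in speech pathology and linguistics. Data collection was strictly controlled to ensure high quality and synchronization.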
The above content was collected and summarized by 遇见数据集.



