Supplementary Material for: Plug-and-play microphones for recording speech and voice with smart devices

Source: DataCite Commons | Updated: 2023-11-15 | Indexed: 2024-08-18
Download link:
https://karger.figshare.com/articles/dataset/Supplementary_Material_for_Plug-and-play_microphones_for_recording_speech_and_voice_with_smart_devices/24566818
Description:
INTRODUCTION Smart devices are widely available and can quickly record and upload speech segments for health-related analysis. The switch from laboratory recordings with professional-grade microphone setups to remote, smart device-based recordings offers immense potential for the scalability of voice assessment. Yet a growing body of literature shows that acoustic metrics vary widely in their robustness to differences in recording devices. The addition of consumer-grade plug-and-play microphones has been proposed as a possible solution. Our aim was to assess whether adding consumer-grade plug-and-play microphones increases the acoustic measurement agreement between ultra-portable devices and a reference microphone.

METHODS Speech was recorded simultaneously by a reference high-quality microphone commonly used in research and by two configurations with plug-and-play microphones. Twelve speech-acoustic features were calculated from the recordings of each microphone to determine the agreement intervals in measurements between microphones. Agreement intervals were then compared to the deviations in speech expected in various neurological conditions. Each microphone's responses to speech and to silence were characterized through acoustic analysis to explore possible reasons for differences in acoustic measurements between microphones. The statistical differentiation of two groups, neurotypical speakers and people with multiple sclerosis, using metrics from each tested microphone was compared to that obtained with the reference microphone.

RESULTS The two consumer-grade plug-and-play microphones favoured high frequencies (mean centre of gravity difference ≥ +175.3 Hz) and recorded more noise (mean difference in signal-to-noise ratio ≤ -4.2 dB) when compared to the reference microphone. Between the consumer-grade microphones, differences in relative noise were closely related to the distance between the microphone and the speaker's mouth. Agreement intervals between the reference and consumer-grade microphones remained under disease-expected deviations only for fundamental frequency (f0, agreement interval ≤ 0.06 Hz), f0 instability (f0 CoV, agreement interval ≤ 0.05%) and tracking of second formant movement (agreement interval ≤ 1.4 Hz/millisecond). Agreement between microphones was poor for the other metrics, particularly for fine timing metrics (mean pause length and pause length variability for various tasks). The statistical difference between the two groups of speakers was smaller with the plug-and-play microphones than with the reference microphone.

CONCLUSION Measurements of f0 and F2 slope were robust to variation in recording equipment, while the other acoustic metrics were not. Thus, the tested plug-and-play microphones should not be used interchangeably with professional-grade microphones for speech analysis. Plug-and-play microphones may assist in equipment standardization within speech studies, including remote or self-recording, possibly with a small loss in accuracy and statistical power, as observed in this study.
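The METHODS text refers to agreement intervals between microphones but this description does not say how they were computed. As a minimal sketch only, assuming a Bland-Altman-style 95% limits-of-agreement calculation on paired measurements (an assumption, not the authors' confirmed method), the idea can be illustrated as follows. The function name `limits_of_agreement` and the f0 values are hypothetical and are not taken from the dataset.

```python
import numpy as np

def limits_of_agreement(reference, test, z=1.96):
    """Return (bias, lower, upper) for paired measurements.

    Assumption: agreement is summarized Bland-Altman style, as the mean
    difference (bias) plus or minus z sample standard deviations.
    """
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    if reference.shape != test.shape:
        raise ValueError("paired measurements must have the same shape")
    diff = test - reference           # per-recording difference
    bias = diff.mean()                # systematic offset between microphones
    sd = diff.std(ddof=1)             # sample standard deviation of differences
    return bias, bias - z * sd, bias + z * sd

# Hypothetical f0 values in Hz for the same utterances on two microphones.
ref_f0 = [120.0, 185.5, 210.2, 98.7, 143.1]
pnp_f0 = [120.02, 185.49, 210.25, 98.71, 143.08]

bias, lower, upper = limits_of_agreement(ref_f0, pnp_f0)
print(f"bias={bias:.3f} Hz, limits=({lower:.3f}, {upper:.3f}) Hz")
```

Under this assumption, a metric would count as robust to the recording device when the width of the interval stays below the deviation expected from the condition being studied.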
Provided by:
Karger Publishers
Created:
2023-11-15