Dataset for: Sparse periodicity-based auditory features explain human performance in a spatial multi-talker auditory scene analysis task
Source: DataCite Commons | Updated: 2020-08-29 | Indexed: 2024-07-27
Download link:
https://wiley.figshare.com/articles/Dataset_for_Sparse_periodicity-based_auditory_features_explain_human_performance_in_a_spatial_multi-talker_auditory_scene_analysis_task/6387890/1
Description:
Human listeners robustly decode speech information from a talker of interest that is embedded in a mixture of spatially distributed interferers. A relevant question is which time-frequency segments of the speech are predominantly used by a listener to solve such a complex Auditory Scene Analysis task. A recent psychoacoustic study investigated the relevance of low signal-to-noise ratio (SNR) components of a target signal on speech intelligibility in a spatial multi-talker situation. For this, a three-talker stimulus was manipulated in the spectro-temporal domain such that target speech time-frequency units below a variable SNR threshold (SNR<sub>crit</sub>) were discarded while keeping the interferers unchanged. The psychoacoustic data indicate that only target components at and above a local SNR of about 0 dB contribute to intelligibility. The present study applies an auditory scene analysis “glimpsing” model to the same manipulated stimuli. Model data are found to be similar to the human data, supporting the notion of “glimpsing”, i.e., that salient speech-related information is predominantly used by the auditory system to decode speech embedded in a mixture of sounds, at least for the tested conditions of three overlapping speech signals. This implies that perceptually relevant auditory information is sparse and may be processed with low computational effort, which is relevant for neurophysiological research of scene analysis and novelty processing in the auditory system.
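The stimulus manipulation described above (discarding target time-frequency units whose local SNR falls below SNR<sub>crit</sub> while leaving the interferers untouched) can be illustrated with a minimal sketch. It assumes the target and the summed interferers are already available as equally shaped time-frequency arrays, for example from a short-time Fourier transform; the description does not state which time-frequency representation the study used, and the function and variable names here are hypothetical.

```python
import numpy as np


def discard_low_snr_target_units(target_tf, interferer_tf, snr_crit_db):
    """Zero target T-F units whose local SNR is below snr_crit_db.

    target_tf, interferer_tf: equally shaped arrays (real or complex)
    holding one value per time-frequency unit. This layout is an
    assumption for illustration, not the study's documented format.
    Returns the manipulated mixture and the boolean keep-mask.
    """
    eps = np.finfo(float).tiny  # guards against log(0) and division by zero
    target_power = np.abs(target_tf) ** 2
    interferer_power = np.abs(interferer_tf) ** 2

    # Local SNR per time-frequency unit, in dB.
    local_snr_db = 10.0 * np.log10((target_power + eps) / (interferer_power + eps))

    # Keep only target units at or above the threshold.
    keep = local_snr_db >= snr_crit_db
    masked_target = np.where(keep, target_tf, 0.0)

    # Interferers are added back unchanged.
    return masked_target + interferer_tf, keep


# Toy example: 2 frequency bins x 2 time frames.
target = np.array([[1.0, 0.1], [2.0, 1.0]])
interferers = np.ones((2, 2))
mixture, keep = discard_low_snr_target_units(target, interferers, snr_crit_db=0.0)
```

In this toy case only the unit with a local SNR of -20 dB is removed from the target; raising `snr_crit_db` discards progressively more of the target while the interferer energy stays the same.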
Provider:
Wiley
Created:
2018-08-22



