mit-impulse-response-survey-16khz
ModelScope Community · updated 2025-11-27 · indexed 2025-03-22
Download link:
https://modelscope.cn/datasets/benjamin-paine/mit-impulse-response-survey-16khz
Resource description:
# Author's Description
> These are environmental Impulse Responses (IRs) measured in the real-world IR survey as described in [Traer and McDermott, PNAS, 2016](https://www.pnas.org/doi/full/10.1073/pnas.1612524113).
> The survey locations were selected by tracking the motions of 7 volunteers over the course of 2 weeks of daily life. We sent the volunteers 24 text messages every day at randomized times and asked the volunteers to respond with their location at the time the text was sent. We then retraced their steps and measured the acoustic impulse responses of as many spaces as possible. We recorded 271 IRs from a total of 301 unique locations. This data set therefore reflects the diversity of acoustic distortion our volunteers encounter in the course of daily life. All recordings were made with a 1.5 meter spacing between speaker and microphone to simulate a typical conversation.
>
> [James Traer and Josh H. McDermott, mcdermottlab.mit.edu](https://mcdermottlab.mit.edu/Reverb/IR_Survey.html)
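IRs like these are commonly used to add realistic reverberation to dry recordings. The reverberant signal is the linear convolution of the dry source with the IR. The following is a minimal sketch, assuming both signals are mono NumPy float arrays at the same sample rate (16 kHz for this dataset). The function name `reverberate` and the peak-matching rescale are illustrative choices and are not part of this dataset or of 🤗 Datasets.

```python
import numpy as np


def reverberate(dry: np.ndarray, ir: np.ndarray, keep_tail: bool = True) -> np.ndarray:
    """Apply an impulse response to a dry mono signal by linear convolution.

    Hypothetical helper: both arrays are assumed to be mono and to share
    one sample rate (16 kHz for this dataset).
    """
    # Full linear convolution: output length is len(dry) + len(ir) - 1.
    wet = np.convolve(dry, ir, mode="full")
    if not keep_tail:
        # Drop the reverberant tail so the output matches the input length.
        wet = wet[: len(dry)]
    # Rescale so the wet peak matches the dry peak (avoids level jumps/clipping).
    dry_peak = np.max(np.abs(dry))
    wet_peak = np.max(np.abs(wet))
    if wet_peak > 0:
        wet = wet * (dry_peak / wet_peak)
    return wet


# Usage: a unit click convolved with a short two-tap "room".
click = np.array([1.0, 0.0, 0.0])
room = np.array([0.5, 0.25])
wet_click = reverberate(click, room)
```

`np.convolve` is direct (time-domain) convolution, which is fine for short clips. For long recordings, an FFT-based routine such as `scipy.signal.fftconvolve` is usually much faster.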
# Repacking Notes
The following changes were made to repack for 🤗 Datasets / 🥐 Croissant:
- Resampled audio from `32khz` to `16khz`. For the `32khz` version, see [benjamin-paine/mit-impulse-response-survey](https://huggingface.co/datasets/benjamin-paine/mit-impulse-response-survey).
- Mapped the beginning part of the filename to *id*.
- Mapped the second part of the filename to *location* and turned it into a class label (enumeration).
- When present, mapped the third (but not final) part of the filename to *detail*.
- Mapped the final part of the filename to *hits*.
- Adjusted several filenames by correcting typos, homogenizing capitalization, and occasionally switching the order of *location* and *detail*.
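The repacking steps above can be sketched as follows. This is not the repacker's actual script. It assumes underscore-separated filenames of the form `<id>_<location>[_<detail>]_<hits>.wav`, and example names such as `h001_Bedroom_65txts.wav` are hypothetical. The resampler is a generic Hamming-windowed-sinc decimator. The card does not say which resampling method was used. `parse_ir_filename`, `location_label_map` and `downsample_by_2` are illustrative names.

```python
import numpy as np


def parse_ir_filename(filename: str) -> dict:
    """Split an IR filename into id/location/detail/hits.

    Assumed (hypothetical) layout: <id>_<location>[_<detail>]_<hits>.wav
    """
    stem = filename.rsplit(".", 1)[0]  # drop the extension
    parts = stem.split("_")
    if len(parts) < 3:
        raise ValueError(f"unexpected filename layout: {filename!r}")
    return {
        "id": parts[0],
        "location": parts[1],
        # Anything between the location and the final part is treated as detail.
        "detail": "_".join(parts[2:-1]) or None,
        "hits": parts[-1],  # kept as the raw string; no unit parsing is assumed
    }


def location_label_map(locations) -> dict:
    """Enumerate distinct locations into integer class labels (sorted order)."""
    return {name: i for i, name in enumerate(sorted(set(locations)))}


def downsample_by_2(x: np.ndarray, num_taps: int = 101) -> np.ndarray:
    """Halve the sample rate (e.g. 32 kHz -> 16 kHz).

    Low-pass with a Hamming-windowed sinc whose cutoff is the new Nyquist
    frequency, then keep every second sample. num_taps should be odd so
    the symmetric filter stays centered with mode="same".
    """
    n = np.arange(num_taps) - (num_taps - 1) / 2
    # Ideal low-pass with cutoff at a quarter of the original sample rate.
    taps = 0.5 * np.sinc(0.5 * n) * np.hamming(num_taps)
    taps /= taps.sum()  # unity gain at DC
    filtered = np.convolve(x, taps, mode="same")
    return filtered[::2]
```

Filtering before discarding samples matters: content between 8 and 16 kHz in the 32 kHz original would otherwise fold back (alias) into the 0 to 8 kHz band of the 16 kHz output.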
# License
These files are licensed under a Creative Commons Attribution license, [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/). Please cite the Traer and McDermott paper when using them, as shown below.
# Citation
```
@article{
doi:10.1073/pnas.1612524113,
author = {James Traer and Josh H. McDermott},
title = {Statistics of natural reverberation enable perceptual separation of sound and space},
journal = {Proceedings of the National Academy of Sciences},
volume = {113},
number = {48},
pages = {E7856-E7865},
year = {2016},
doi = {10.1073/pnas.1612524113},
URL = {https://www.pnas.org/doi/abs/10.1073/pnas.1612524113},
eprint = {https://www.pnas.org/doi/pdf/10.1073/pnas.1612524113},
abstract = {Sounds produced in the world reflect off surrounding surfaces on their way to our ears. Known as reverberation, these reflections distort sound but provide information about the world around us. We asked whether reverberation exhibits statistical regularities that listeners use to separate its effects from those of a sound’s source. We conducted a large-scale statistical analysis of real-world acoustics, revealing strong regularities of reverberation in natural scenes. We found that human listeners can estimate the contributions of the source and the environment from reverberant sound, but that they depend critically on whether environmental acoustics conform to the observed statistical regularities. The results suggest a separation process constrained by knowledge of environmental acoustics that is internalized over development or evolution. In everyday listening, sound reaches our ears directly from a source as well as indirectly via reflections known as reverberation. Reverberation profoundly distorts the sound from a source, yet humans can both identify sound sources and distinguish environments from the resulting sound, via mechanisms that remain unclear. The core computational challenge is that the acoustic signatures of the source and environment are combined in a single signal received by the ear. Here we ask whether our recognition of sound sources and spaces reflects an ability to separate their effects and whether any such separation is enabled by statistical regularities of real-world reverberation. To first determine whether such statistical regularities exist, we measured impulse responses (IRs) of 271 spaces sampled from the distribution encountered by humans during daily life. 
The sampled spaces were diverse, but their IRs were tightly constrained, exhibiting exponential decay at frequency-dependent rates: Mid frequencies reverberated longest whereas higher and lower frequencies decayed more rapidly, presumably due to absorptive properties of materials and air. To test whether humans leverage these regularities, we manipulated IR decay characteristics in simulated reverberant audio. Listeners could discriminate sound sources and environments from these signals, but their abilities degraded when reverberation characteristics deviated from those of real-world environments. Subjectively, atypical IRs were mistaken for sound sources. The results suggest the brain separates sound into contributions from the source and the environment, constrained by a prior on natural reverberation. This separation process may contribute to robust recognition while providing information about spaces around us.}}
```
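The abstract above reports that the measured IRs decay exponentially. A standard way to quantify that decay from a single IR is Schroeder backward integration, followed by a line fit to the energy decay curve (EDC) in dB, extrapolated to a 60 dB drop (RT60). The following is a minimal broadband sketch, assuming a mono NumPy IR. It is a common room-acoustics technique and not necessarily the analysis Traer and McDermott used. To approximate the frequency-dependent rates they describe, each IR would first be band-pass filtered (not shown). The name `estimate_rt60` and the -5 to -35 dB fit range are illustrative assumptions.

```python
import numpy as np


def estimate_rt60(ir: np.ndarray, sample_rate: int,
                  start_db: float = -5.0, end_db: float = -35.0) -> float:
    """Broadband RT60 from an IR via Schroeder backward integration.

    Fits a line to the energy decay curve between start_db and end_db
    (a "T30"-style fit) and extrapolates it to a 60 dB decay.
    """
    energy = np.asarray(ir, dtype=float) ** 2
    # Schroeder integral: energy remaining from each sample to the end of the IR.
    edc = np.cumsum(energy[::-1])[::-1]
    edc = edc / edc[0]
    edc_db = 10.0 * np.log10(np.maximum(edc, 1e-12))  # floor at -120 dB
    idx = np.nonzero((edc_db <= start_db) & (edc_db >= end_db))[0]
    if idx.size < 2:
        raise ValueError("IR does not decay far enough for the fit range")
    # Slope of the decay in dB per second (negative for a decaying IR).
    slope, _intercept = np.polyfit(idx / sample_rate, edc_db[idx], 1)
    return -60.0 / slope


# Usage: a synthetic, purely exponential IR built to have RT60 = 0.5 s at 16 kHz.
fs = 16000
decay_rate = 3 * np.log(10) / 0.5  # energy falls 60 dB in 0.5 s
synthetic_ir = np.exp(-decay_rate * np.arange(fs) / fs)
rt60 = estimate_rt60(synthetic_ir, fs)
```

Fitting a window below the first few dB skips the direct sound and early reflections. Stopping well above the noise floor avoids the bend that measurement noise introduces at the end of a real EDC.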
Provided by:
maas
Created:
2025-03-18



