
Treble10-RIR

ModelScope community. Updated 2026-01-06. Indexed 2025-11-03.
Download link:
https://modelscope.cn/datasets/treble-technologies/Treble10-RIR
## Dataset Description

- **Paper:** https://arxiv.org/abs/2510.23141
- **Point of contact:** contact@treble.tech

# **Treble10-RIR (32 kHz)**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1pSJHitYAOGIv2uzr7kGNDxqVflWvTQ2c?usp=sharing)

The **Treble10-RIR** dataset targets automatic speech recognition (ASR) and contains high-fidelity room-acoustic simulations from 10 different furnished rooms: 2 bathrooms, 2 bedrooms, 2 living rooms with hallway, 2 living rooms without hallway, and 2 meeting rooms. The room volumes range between 14 and 46 m³, resulting in reverberation times between 0.17 and 0.84 s. Illustrative plots of the rooms and the device included in this dataset may be found in the repository outside of the dataset.

This datacard provides examples of how to work with the data, explains how the data was generated, and describes the extensive metadata included in the dataset.

## Example: Convolve speech with a Treble10 RIR

```python
from datasets import load_dataset
from scipy.signal import fftconvolve, resample_poly, spectrogram
from scipy.io.wavfile import write
import matplotlib.pyplot as plt
import numpy as np

sr = 16000  # target sampling rate (LibriSpeech is 16 kHz)

# 1. Load one LibriSpeech sample
speech_ds = load_dataset("openslr/librispeech_asr", "clean", split="test[:1]")
speech = speech_ds[0]["audio"]["array"]

# 2. Load one Treble10 RIR
rir_ds = load_dataset("treble-technologies/Treble10-RIR", split="rir_mono", streaming=True)
rir_rec = next(iter(rir_ds))
rir = rir_rec["audio"]["array"]
rir_sr = rir_rec["audio"]["sampling_rate"]

# 3. Downsample the RIR to the speech sampling rate
if rir_sr != sr:
    rir = resample_poly(rir, sr, rir_sr)

# 4. Convolve and peak-normalize
rev = fftconvolve(speech, rir, mode="full")
rev /= np.max(np.abs(rev)) + 1e-12

# 5. Plot spectrogram
f, t, Sxx = spectrogram(rev, fs=sr, nperseg=512, noverlap=256)
plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-12), shading="auto")
plt.xlabel("Time [s]")
plt.ylabel("Frequency [Hz]")
plt.title("Spectrogram of Reverberated Speech")
plt.tight_layout()
plt.show()

# 6. Save as 16-bit PCM
write("audio_reverb.wav", sr, (rev * 32767).astype(np.int16))
print("✅ Saved: audio_reverb.wav")
```

## Example: Read a batch of mono RIRs from Treble10 into a PyTorch dataloader

```python
# Load a batch of Treble10 RIRs with PyTorch
import torch
from datasets import load_dataset, Audio
from torch.utils.data import DataLoader

# Load the dataset in streaming mode
rir_ds = load_dataset("treble-technologies/Treble10-RIR", split="rir_mono", streaming=True)
rir_ds = rir_ds.cast_column("audio", Audio())


def collate_fn(batch):
    """Convert the RIRs to torch tensors and pad them to the same length."""
    arrays = [torch.tensor(ex["audio"]["array"]) for ex in batch]
    # Zero-pad to the longest RIR in the batch
    max_len = max(rir.shape[0] for rir in arrays)
    padded = torch.stack(
        [torch.nn.functional.pad(rir, (0, max_len - rir.shape[0])) for rir in arrays]
    )
    sampling_rate = batch[0]["audio"]["sampling_rate"]
    return {"rirs": padded, "sampling_rate": sampling_rate}


# Set up a torch dataloader
rir_loader = DataLoader(rir_ds, batch_size=4, collate_fn=collate_fn)

# Fetch one batch
batch = next(iter(rir_loader))
rirs = batch["rirs"]  # Tensor (batch size, number of time samples)
sr = batch["sampling_rate"]
print(f"Batch shape: {rirs.shape}, Sample rate: {sr}")
```

## Example: Read a 6-channel device RIR from Treble10 and compare two of the microphone signals

```python
from datasets import load_dataset, Audio
import matplotlib.pyplot as plt
import numpy as np

ds = load_dataset(
    "treble-technologies/Treble10-RIR",
    split="rir_6ch",
    streaming=True,
)
ds = ds.cast_column("audio", Audio())

# Read the samples from the TorchCodec decoder object:
rec = next(iter(ds))
samples = rec["audio"].get_all_samples()
rir_6ch = samples.data  # Tensor (channels, time samples)
sr = samples.sample_rate
print(f"6 channel RIR has this shape: {rir_6ch.shape}, and a sampling rate of {sr} Hz.")

# We can access and compare individual channels from the 6ch device like this
rir0 = rir_6ch[0]  # mic 0
rir4 = rir_6ch[4]  # mic 4
t_axis = np.arange(rir0.shape[0]) / sr

plt.figure()
plt.plot(t_axis, rir0.numpy(), label="Microphone 0")
plt.plot(t_axis, rir4.numpy(), label="Microphone 4")
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")
plt.legend()
plt.show()
```

## Example: Read a HOA8 RIR from Treble10

```python
import io

import soundfile as sf
from datasets import load_dataset, Audio

# Load dataset in streaming mode
ds = load_dataset("treble-technologies/Treble10-RIR", split="rir_hoa8", streaming=True)

# Disable automatic decoding (we'll do it manually)
ds = ds.cast_column("audio", Audio(decode=False))

# Get one sample from the iterator
sample = next(iter(ds))

# Fetch raw audio bytes
audio_bytes = sample["audio"]["bytes"]

# Some older datasets may not have "bytes", so fall back to reading from the file path
if audio_bytes is None:
    # "path" is a plain string, so open it with the built-in open()
    with open(sample["audio"]["path"], "rb") as f:
        audio_bytes = f.read()

# Decode the HOA audio directly from memory; soundfile returns (frames, channels)
rir_hoa, sr = sf.read(io.BytesIO(audio_bytes))
print(f"Loaded HOA RIR: shape={rir_hoa.shape}, sr={sr}")
```

## Dataset Details

The dataset contains three subsets:

- **Treble10-RIR-mono**: This subset contains mono room impulse responses (RIRs). In each room, RIRs are available between 5 sound sources and several receivers. The receivers are placed along horizontal receiver grids with 0.5 m resolution at three heights (0.5 m, 1.0 m, 1.5 m). The validity of all source and receiver positions is checked to ensure that none of them intersects with the room geometry or furniture.
- **Treble10-RIR-hoa8**: This subset contains 8th-order Ambisonics RIRs. The sound sources and receivers are identical to the RIR-mono subset.
- **Treble10-RIR-6ch**: For this subset, a 6-channel cylindrical device is placed at the receiver positions from the RIR-mono subset. RIRs are then acquired between the 5 sound sources described above and each of the 6 device microphones. In other words, there is a 6-channel DeviceRIR for each source-receiver combination of the RIR-mono subset. The microphone coordinates are part of the metadata for the 6ch split.

All RIRs (mono/HOA/device) were simulated with the Treble SDK, and more details on the tool can be found in the dedicated section below. We use a hybrid simulation paradigm that combines a numerical wave-based solver (discontinuous Galerkin finite element method, DG-FEM) at low to midrange frequencies with geometrical acoustics (GA) simulations at high frequencies. For the **Treble10-RIR** dataset, the transition frequency between the wave-based and the GA simulation is set at 5 kHz. The resulting hybrid RIRs are broadband signals with a 32 kHz sampling rate, thus covering the entire frequency range of the signal and containing audio content up to 16 kHz.

A small subset of simulations from the same rooms has previously been released as part of the Generative Data Augmentation (GenDA) challenge at ICASSP 2025. The **Treble10-RIR** dataset differs from the GenDA dataset in three fundamental aspects:

1. The **Treble10-RIR** dataset contains broadband RIRs from a hybrid simulation paradigm (wave-based below 5 kHz, GA above 5 kHz), covering the entire frequency range of a 32 kHz signal. The GenDA subset only contained the wave-based portion, so the **Treble10-RIR** dataset more than doubles the usable frequency range.
2. The **Treble10-RIR** dataset consists of 6 subsets in total. While three of those subsets contain RIRs (mono, 8th-order Ambisonics, 6-channel device), the other three contain pre-convolved scenes in identical channel formats. The GenDA subset was limited to mono and 8th-order Ambisonics RIRs, and no pre-convolved scenes were provided.
3. With **Treble10-RIR**, we publish the entire dataset, containing approximately 3100 source-receiver configurations. The GenDA subset only contained a small fraction of approximately 60 randomly selected source-receiver configurations.

## Uses

Use cases such as far-field automatic speech recognition (ASR), speech enhancement, dereverberation, and source separation benefit greatly from the **Treble10-RIR** dataset.

To illustrate this, consider the contrast between near-field and far-field ASR. In near-field setups, such as smartphones or headsets, the microphone is close to the speaker, capturing a clean signal dominated by the direct sound. In far-field scenarios, as in smart speakers or conference-room devices, the microphone is several meters away, and the recorded signal becomes a complex blend of direct sound, reverberation, and background noise. This difference is not merely spatial but physical: in far-field conditions, sound waves reflect off walls, diffract around objects, and decay over time, all of which are captured by the room impulse response (RIR).

To achieve robust performance in such environments, ASR and related models must be trained on datasets that accurately represent these intricate acoustic interactions, which is precisely what **Treble10-RIR** provides. Similarly, the performance of such systems can only be reliably determined by evaluating them on data that is accurate enough to model sound propagation in complex environments.

## Dataset Structure

Each subset of **Treble10-RIR** corresponds to a different channel configuration of the simulated room impulse responses (RIRs). All subsets share the same metadata schema and organization.

| Split | Description | Channels |
|--------------|---------------------|----------|
| `rir_mono` | Single-channel mono RIRs | 1 |
| `rir_hoa8` | 8th-order Ambisonics RIRs (ACN/SN3D format) | 81 |
| `rir_6ch` | Six-channel home audio device layout | 6 |

The six-channel device has microphones positioned at the following locations relative to the center of the device:

| Channel | Position [m] |
|-------|--------|
| 0 | [0.03, 0., 0.] |
| 1 | [0.015, 0.026, 0.] |
| 2 | [-0.0145, 0.026, 0.] |
| 3 | [-0.03, 0., 0.] |
| 4 | [-0.015, -0.026, 0.] |
| 5 | [0.015, -0.026, 0.] |

### File Contents

Each `.parquet` file contains the metadata for one subset (split) of the dataset. As this set of RIRs may be used for a variety of audio machine-learning tasks, we leave the actual segmentation of the data (for example into training and test sets) to the users. The metadata links each impulse response to its corresponding audio file and includes detailed acoustic parameters.

| Column | Description |
|---------|-------------|
| **audio** | Reference to the RIR audio file. |
| **Filename** | Filename and relative path of the WAV file. |
| **Room** | Short room nickname (e.g., `Room1`, `Room5`). |
| **Room Description** | Descriptive room type (e.g., `meeting_room`, `living_room`). |
| **Room Volume [m³]** | Volume of the room in cubic meters. |
| **Direct Path Length [m]** | Distance between source and receiver. |
| **Source Label / Position** | Label and 3D coordinates of the source. |
| **Receiver Label / Position** | Label and 3D coordinates of the receiver. |
| **Receiver Type** | Receiver configuration (`mono`, `8th order`, or `6-channel`). |
| **Frequencies, EDT, T30, C50, Average Absorption** | Octave-band acoustic parameters. |
| **Avg EDT, Avg T30, Avg Absorption** | Broadband summary values. |

## Acoustic Parameters

Each RIR is accompanied by a few relevant acoustic parameters that describe the sound field as sampled by its specific source-receiver pair.
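
As a rough illustration of how parameters of this kind are typically derived from an impulse response (the subsections below define each one), the following sketch estimates broadband EDT, T30, and C50 with Schroeder backward integration. It is a minimal sketch under stated assumptions, not the procedure used to produce the dataset's metadata: the helper names (`schroeder_decay_db`, `decay_time`, `clarity_c50`) are hypothetical, the input is a synthetic exponentially decaying noise burst rather than a Treble10 RIR, and the values are broadband, whereas the metadata reports octave-band values.

```python
import numpy as np


def schroeder_decay_db(rir):
    """Backward-integrated energy decay curve in dB, 0 dB at the first sample."""
    energy = np.cumsum(rir[::-1] ** 2)[::-1]
    return 10.0 * np.log10(energy / energy[0] + 1e-30)


def decay_time(edc_db, sr, start_db, end_db):
    """Fit a line to the decay between start_db and end_db, extrapolate to -60 dB."""
    idx = np.where((edc_db <= start_db) & (edc_db >= end_db))[0]
    slope, _ = np.polyfit(idx / sr, edc_db[idx], 1)  # dB per second (negative)
    return -60.0 / slope


def clarity_c50(rir, sr):
    """Ratio of energy in the first 50 ms to the remaining energy, in dB."""
    n50 = int(round(0.050 * sr))
    return 10.0 * np.log10(np.sum(rir[:n50] ** 2) / np.sum(rir[n50:] ** 2))


# Synthetic stand-in for an RIR (assumption): noise with a 0.5 s reverberation time.
sr = 32000
true_rt60 = 0.5
t = np.arange(int(1.5 * sr)) / sr
rng = np.random.default_rng(0)
rir = rng.standard_normal(t.size) * np.exp(-np.log(1000.0) * t / true_rt60)

edc_db = schroeder_decay_db(rir)
edt = decay_time(edc_db, sr, 0.0, -10.0)   # first 10 dB of decay
t30 = decay_time(edc_db, sr, -5.0, -35.0)  # 30 dB span, extrapolated to 60 dB
c50 = clarity_c50(rir, sr)
print(f"EDT = {edt:.2f} s, T30 = {t30:.2f} s, C50 = {c50:.1f} dB")
```

For a real RIR, the array should first be trimmed so that it starts at the direct sound, since the 50 ms boundary for C50 is measured from the arrival of the direct sound.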

### T30: Reverberation Time

T30 measures how long a sound takes to fade away in a room after the sound source stops emitting, and is a key indicator of how reverberant a space is. Specifically, it is the time needed for the sound energy to drop by 60 decibels, estimated by extrapolating from a 30 dB portion of the decay curve (conventionally -5 dB to -35 dB).

A short T30 corresponds to a "dry" sounding room, like a small office or recording booth (ideally, under 0.2 s). A long T30 corresponds to a room that sounds "wet", such as a concert hall or parking garage (1.0 s or more).

### EDT: Early Decay Time

Early Decay Time is another measure of reverberation, but is calculated from the first 10 dB of energy decay. EDT is highly correlated with the psychoacoustic perception of reverberation, and can also provide information about the uniformity of the acoustic field within a space.

If EDT is approximately equal to T30, the reverberation is approximately a single-slope decay. If EDT is much shorter than T30, this indicates a double-slope energy decay, which may form when two rooms are acoustically coupled.

### C50: Clarity Index (Speech)

C50 is the ratio of the energy of the early-arriving sound (the first 50 milliseconds) to the energy of the late-arriving sound (from 50 milliseconds to the end of the RIR), expressed in decibels. C50 is typically used as a measure of the potential speech intelligibility and clarity of a room, as it quantifies how much the early sound is obscured by the room's reverberation.

High C50 values (above 0 dB) are typically considered favorable for clear and intelligible speech. Low C50 values (below 0 dB) typically indicate conditions that are difficult for speech clarity.

## More Information

More information on the dataset can be found in the corresponding blog post.

## Licensing Information

The **Treble10-RIR** dataset is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license](https://creativecommons.org/licenses/by-nc-sa/4.0/).

### Citation Information

```
@misc{mullins2025treble10highqualitydatasetfarfield,
  title={Treble10: A high-quality dataset for far-field speech recognition, dereverberation, and enhancement},
  author={Sarabeth S. Mullins and Georg G\"otz and Eric Bezzam and Steven Zheng and Daniel Gert Nielsen},
  year={2025},
  eprint={2510.23141},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  url={https://arxiv.org/abs/2510.23141},
}
```

Provider: maas
Created: 2025-10-14