L3DAS21

Name: L3DAS21
Creator: OpenDataLab
Published: 2026-05-17 06:30:10
License: No description provided

OpenDataLab · Updated 2026-05-17 · Indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/L3DAS21

Resource description:

L3DAS21 is a dataset for 3D audio signal processing. It comprises a 65-hour 3D audio corpus and is accompanied by a Python API that simplifies data handling and the result-submission stage. The dataset contains multi-source, multi-perspective B-format Ambisonics recordings.

To build it, the authors sampled the sound field of a large office room: two first-order Ambisonics microphones were placed at the center of the room, and a loudspeaker was moved through 252 fixed spatial positions, reproducing the analysis signal at each one. Using the resulting Ambisonics impulse responses (IRs), the authors augmented existing clean monophonic datasets, convolving the original sounds with these IRs to obtain synthetic 3D sound sources.

The dataset is divided into two main sections, each dedicated to one of the challenge tasks. The first section targets 3D speech enhancement and contains more than 30,000 virtual 3D audio environments, each up to 10 seconds long. In every sample, speech is always present together with other office-like background noise. The targets for this section are the clean monophonic speech signals.

The second section targets 3D sound event localization and detection (SELD) and contains 900 audio files, each 60 seconds long. Each data point is a simulated 3D office audio environment in which up to 3 acoustic events can be active simultaneously. Speech is not necessarily present in these samples. The targets for this section are, for every sound event in the data point, its onset and offset timestamps, its class, and its spatial coordinates.
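The IR-convolution step described above can be sketched as follows. This is a minimal illustration, not the authors' generation pipeline: it assumes each Ambisonics IR is held as a NumPy array of shape `(channels, taps)` (4 channels for a first-order B-format microphone), and the helper names `spatialize_mono` and `mix_scene` are hypothetical. It does not model any normalization, SNR control, sample rate, or channel-ordering convention the dataset may actually use.

```python
import numpy as np


def spatialize_mono(source: np.ndarray, ambisonic_ir: np.ndarray) -> np.ndarray:
    """Convolve a mono signal with every channel of an Ambisonics IR.

    source:       shape (n_samples,), a clean monophonic signal
    ambisonic_ir: shape (n_channels, n_taps), e.g. 4 channels for first-order B-format
    returns:      shape (n_channels, n_samples + n_taps - 1)
    """
    if source.ndim != 1:
        raise ValueError("source must be a 1-D mono signal")
    if ambisonic_ir.ndim != 2:
        raise ValueError("ambisonic_ir must be 2-D: (channels, taps)")
    # Full linear convolution, one output channel per IR channel
    return np.stack([np.convolve(source, ir_channel) for ir_channel in ambisonic_ir])


def mix_scene(sources, irs, onsets, scene_len):
    """Sum several spatialized sources into one multichannel scene buffer (hypothetical helper).

    sources: list of 1-D mono arrays
    irs:     list of (n_channels, n_taps) IR arrays, one per source position
    onsets:  list of start indices (in samples), one per source
    Convolution tails that run past scene_len are truncated.
    """
    n_channels = irs[0].shape[0]
    scene = np.zeros((n_channels, scene_len))
    for source, ir, onset in zip(sources, irs, onsets):
        if onset < 0:
            raise ValueError("onsets must be non-negative sample indices")
        if onset >= scene_len:
            continue  # source would start after the scene ends
        wet = spatialize_mono(source, ir)
        end = min(scene_len, onset + wet.shape[1])
        scene[:, onset:end] += wet[:, : end - onset]
    return scene


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mono = rng.standard_normal(16000)              # stand-in for 1 s of audio at an assumed 16 kHz
    fake_ir = rng.standard_normal((4, 512)) * 0.01  # random stand-in IR, NOT measured data
    scene = mix_scene([mono], [fake_ir], [0], 16000)
    print(scene.shape)  # (4, 16000)
```

Since each of the two microphones has its own 4-channel IR for a given loudspeaker position, calling `spatialize_mono` once per microphone with the same source gives both perspectives of that source.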
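The SELD targets (onset, offset, class, and coordinates for each event) map naturally onto a small record type. The sketch below is hypothetical: the `SoundEvent` field names, seconds as the time unit, and Cartesian `x`/`y`/`z` coordinates are assumptions, not the dataset's actual label-file format. It checks the stated constraint that at most 3 events overlap, treating each event as active on the half-open interval [onset, offset).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoundEvent:
    """One SELD target: timing, class, and position (hypothetical field layout)."""
    onset_s: float   # event start time, in seconds (assumed unit)
    offset_s: float  # event end time, in seconds (assumed unit)
    label: str       # sound class name
    x: float         # assumed Cartesian coordinates
    y: float
    z: float


def max_polyphony(events):
    """Return the largest number of events active at the same instant.

    Events are half-open intervals [onset, offset), so an event that ends
    exactly when another starts does not count as an overlap.
    """
    boundaries = []
    for ev in events:
        if ev.offset_s <= ev.onset_s:
            raise ValueError(f"event {ev.label!r} has offset <= onset")
        boundaries.append((ev.onset_s, 1))
        boundaries.append((ev.offset_s, -1))
    # At equal timestamps, (t, -1) sorts before (t, 1): close before open
    boundaries.sort()
    active = peak = 0
    for _, delta in boundaries:
        active += delta
        peak = max(peak, active)
    return peak


def active_events(events, t):
    """List the events whose [onset, offset) interval contains time t."""
    return [ev for ev in events if ev.onset_s <= t < ev.offset_s]


if __name__ == "__main__":
    # Illustrative labels and coordinates, not taken from the dataset
    scene = [
        SoundEvent(0.0, 2.0, "keyboard", 1.0, 0.5, 0.0),
        SoundEvent(1.0, 3.0, "speech", -0.5, 1.2, 0.3),
        SoundEvent(2.0, 4.0, "telephone", 0.0, -1.0, 0.2),
        SoundEvent(1.5, 2.5, "knock", 0.8, 0.8, -0.1),
    ]
    assert max_polyphony(scene) <= 3  # the overlap limit stated for the SELD section
    print(max_polyphony(scene))                            # 3
    print([ev.label for ev in active_events(scene, 2.0)])  # ['speech', 'telephone', 'knock']
```

A sweep over sorted boundaries runs in O(n log n) and avoids discretizing time into frames; a frame-based label matrix would be the natural next step if a model's output resolution is fixed.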

Provided by:

OpenDataLab

Created:

2022-05-23
