IsoNet Dataset

Name: IsoNet Dataset
Creator: IEEE DataPort
Published: 2026-01-25 13:07:31
License: No description available

DataCite Commons. Updated 2026-01-25; indexed 2026-05-03.

Download link:

https://ieee-dataport.org/documents/isonet-dataset

Resource description:

We present IsoNet, a large-scale multimodal dataset for audio-visual speaker isolation in reverberant environments using microphone arrays. The dataset comprises approximately 25,000 samples, each containing synchronized 4-channel spatial audio recordings and corresponding visual face localization data derived from the VoxCeleb2 corpus. Each sample includes mixed multi-speaker audio captured by a simulated square microphone array (7-10 cm spacing), clean target speech references, per-frame face bounding boxes, and metadata including ground-truth Direction of Arrival (DOA), Signal-to-Noise Ratio (SNR), and room reverberation time (RT60). To support curriculum learning strategies, we provide three difficulty tiers with progressively more challenging acoustic conditions: a base set with SNR from 5 to 20 dB, and two curriculum levels spanning 1 to 10 dB and -1 to 10 dB, respectively. All acoustic scenes are generated using physically accurate room impulse response simulation via pyroomacoustics, with variable room dimensions and reverberation characteristics (RT60: 0.2-0.8 s). The dataset is designed to advance research in audio-visual speech separation, beamforming algorithms, active speaker detection, and multimodal sensor fusion for robust speech processing in challenging real-world conditions. IsoNet provides researchers with a standardized benchmark for evaluating and comparing speaker isolation systems that leverage both spatial audio cues and visual information.
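The parameter ranges in the description can be made concrete with a small sketch. The code below is not the dataset's generation pipeline (which the description says uses pyroomacoustics); it is a standard-library illustration of the stated ranges. The tier names, the function names, the uniform sampling, the 0-360 degree azimuth range, and the reading of "7-10 cm spacing" as the side length of the square are all assumptions made for illustration.

```python
import math
import random

# SNR ranges in dB per difficulty tier, as stated in the description.
# The tier names themselves are hypothetical labels.
TIER_SNR_DB = {
    "base": (5.0, 20.0),
    "curriculum_1": (1.0, 10.0),
    "curriculum_2": (-1.0, 10.0),
}


def square_array(side_m, center=(0.0, 0.0)):
    """Return (x, y) positions of 4 microphones at the corners of a square.

    Assumption: the stated 7-10 cm spacing is the side length of the square.
    """
    half = side_m / 2.0
    cx, cy = center
    return [
        (cx - half, cy - half),
        (cx + half, cy - half),
        (cx + half, cy + half),
        (cx - half, cy + half),
    ]


def sample_scene(tier, rng):
    """Draw one set of scene parameters from the ranges in the description.

    Uniform sampling and the full-circle DOA range are assumptions.
    """
    snr_lo, snr_hi = TIER_SNR_DB[tier]
    return {
        "snr_db": rng.uniform(snr_lo, snr_hi),
        "rt60_s": rng.uniform(0.2, 0.8),
        "side_m": rng.uniform(0.07, 0.10),
        "doa_deg": rng.uniform(0.0, 360.0),
    }


def interference_gain(target, interference, snr_db):
    """Gain for the interference so that the target-to-interference
    power ratio of the mixture equals snr_db."""
    p_target = sum(x * x for x in target) / len(target)
    p_interf = sum(x * x for x in interference) / len(interference)
    return math.sqrt(p_target / (p_interf * 10.0 ** (snr_db / 10.0)))


if __name__ == "__main__":
    rng = random.Random(0)
    scene = sample_scene("curriculum_2", rng)
    mics = square_array(scene["side_m"])
    print(scene)
    print(mics)
```

In a pipeline along these lines, the sampled RT60 and array geometry would be handed to a room simulator, and the gain from `interference_gain` would scale the interfering speech before summing it with the target.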

Provider:

IEEE DataPort

Created:

2026-01-25
