IsoNet Dataset
Source: DataCite Commons; updated 2026-01-25; indexed 2026-05-03
Download link:
https://ieee-dataport.org/documents/isonet-dataset
Resource description:
We present IsoNet, a large-scale multimodal dataset for audio-visual speaker isolation in reverberant environments using microphone arrays. The dataset comprises approximately 25,000 samples. Each sample contains synchronized 4-channel spatial audio and corresponding visual face-localization data derived from the VoxCeleb2 corpus. Specifically, each sample includes mixed multi-speaker audio captured by a simulated square microphone array (7-10 cm spacing), clean target-speech references, per-frame face bounding boxes, and metadata covering the ground-truth Direction of Arrival (DOA), Signal-to-Noise Ratio (SNR), and room reverberation time (RT60).

To support curriculum learning, we provide three difficulty tiers with progressively harder acoustic conditions: a base set with SNR from 5 to 20 dB, and two curriculum levels spanning 1 to 10 dB and -1 to 10 dB, respectively. All acoustic scenes are generated by physically accurate room impulse response (RIR) simulation with pyroomacoustics, using variable room dimensions and reverberation times (RT60 from 0.2 to 0.8 s).

The dataset is designed to advance research in audio-visual speech separation, beamforming, active speaker detection, and multimodal sensor fusion for robust speech processing under challenging real-world conditions. IsoNet gives researchers a standardized benchmark for evaluating and comparing speaker isolation systems that exploit both spatial audio cues and visual information.
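The description does not say exactly how mixtures, array geometry, or DOA labels were produced. The following is a minimal sketch, under stated assumptions, of how the quantities it names (square 4-mic array, DOA, SNR tiers, RT60) fit together. Assumptions: "spacing" is read as the side length of the square; DOA is the horizontal azimuth from the array center, counter-clockwise from the +x axis; SNR is the target-to-interference power ratio averaged over all channels; SNRs are drawn uniformly within a tier; and the tier keys (`base`, `curriculum_1`, `curriculum_2`) are hypothetical labels, not official split names. The RT60-to-absorption step uses Sabine's formula as a standard estimate. It is not necessarily what the dataset authors used, although pyroomacoustics offers an `inverse_sabine` helper for a similar purpose.

```python
import math
import random

import numpy as np

# SNR ranges (dB) per difficulty tier, as stated in the description.
# The tier keys are hypothetical labels, not official split names.
SNR_TIERS_DB = {
    "base": (5.0, 20.0),
    "curriculum_1": (1.0, 10.0),
    "curriculum_2": (-1.0, 10.0),
}

SABINE_CONSTANT = 0.161  # s/m, Sabine's constant for air at roughly 20 C


def square_array(center_xy, spacing_m):
    """Return a (4, 2) array of mic (x, y) positions at the corners of a
    square whose side is `spacing_m` (assumed meaning of "spacing")."""
    cx, cy = center_xy
    h = spacing_m / 2.0
    return np.array([
        [cx - h, cy - h],
        [cx + h, cy - h],
        [cx + h, cy + h],
        [cx - h, cy + h],
    ])


def doa_azimuth_deg(array_center_xy, source_xy):
    """Azimuth of the source seen from the array center, in degrees in
    [0, 360), counter-clockwise from the +x axis (assumed convention)."""
    dx = source_xy[0] - array_center_xy[0]
    dy = source_xy[1] - array_center_xy[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0


def mix_at_snr(target, interference, snr_db):
    """Scale `interference` so that 10*log10(P_target / P_interference)
    equals `snr_db`, then add it to `target`. Power is averaged over all
    channels and samples. Returns (mixture, scale)."""
    p_target = np.mean(target ** 2)
    p_interf = np.mean(interference ** 2)
    if p_interf == 0:
        raise ValueError("interference signal is silent")
    scale = np.sqrt(p_target / (p_interf * 10.0 ** (snr_db / 10.0)))
    return target + scale * interference, scale


def sample_snr(tier, rng):
    """Draw an SNR uniformly from the tier's range (uniform is an assumption)."""
    lo, hi = SNR_TIERS_DB[tier]
    return rng.uniform(lo, hi)


def sabine_absorption(room_dim, rt60_s):
    """Uniform wall absorption coefficient that yields `rt60_s` in a shoebox
    room, from Sabine's formula RT60 = 0.161 * V / (S * alpha)."""
    lx, ly, lz = room_dim
    volume = lx * ly * lz
    surface = 2.0 * (lx * ly + lx * lz + ly * lz)
    alpha = SABINE_CONSTANT * volume / (surface * rt60_s)
    if not 0.0 < alpha <= 1.0:
        raise ValueError("RT60 not reachable for this room under Sabine's model")
    return alpha


if __name__ == "__main__":
    rng = random.Random(0)
    np_rng = np.random.default_rng(0)
    mics = square_array((2.0, 1.5), spacing_m=0.08)
    snr = sample_snr("curriculum_2", rng)
    # Stand-in 4-channel signals; real samples would be RIR-convolved speech.
    target = np_rng.standard_normal((4, 16000))
    interference = np_rng.standard_normal((4, 16000))
    mixture, scale = mix_at_snr(target, interference, snr)
    print(mics.mean(axis=0), round(snr, 2), mixture.shape)
    print(doa_azimuth_deg((2.0, 1.5), (3.0, 2.5)))
    print(sabine_absorption((6.0, 5.0, 3.0), rt60_s=0.5))
```

Averaging power over all four channels is only one reasonable SNR definition. A per-channel or reference-microphone definition would give a slightly different scale, so any comparison against the dataset's SNR metadata should first confirm which convention it follows.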
Provided by:
IEEE DataPort
Created:
2026-01-25



