
IsoNet Dataset

Source: DataCite Commons. Updated: 2026-01-25. Indexed: 2026-05-03.
Download link:
https://ieee-dataport.org/documents/isonet-dataset
Description:
We present IsoNet, a large-scale multimodal dataset for audio-visual speaker isolation in reverberant environments using microphone arrays. The dataset comprises approximately 25,000 samples, each containing synchronized 4-channel spatial audio recordings and corresponding visual face localization data derived from the VoxCeleb2 corpus. Each sample includes mixed multi-speaker audio captured by a simulated square microphone array (7 to 10 cm spacing), clean target speech references, per-frame face bounding boxes, and metadata including ground-truth Direction of Arrival (DOA), Signal-to-Noise Ratio (SNR), and room reverberation time (RT60). To support curriculum learning strategies, we provide three difficulty tiers with progressively more challenging acoustic conditions: a base set with SNR ranging from 5 to 20 dB, and two curriculum levels spanning 1 to 10 dB and -1 to 10 dB, respectively. All acoustic scenes are generated using physically accurate room impulse response simulation via pyroomacoustics, with variable room dimensions and reverberation characteristics (RT60: 0.2 to 0.8 s). The dataset is designed to advance research in audio-visual speech separation, beamforming algorithms, active speaker detection, and multimodal sensor fusion for robust speech processing in challenging real-world conditions. IsoNet provides researchers with a standardized benchmark for evaluating and comparing speaker isolation systems that leverage both spatial audio cues and visual information.
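The parameter ranges quoted in the description can be sketched as a small scene sampler. This is a minimal illustration only, not the authors' generation pipeline: the tier names, the dictionary keys, the uniform sampling, the 0 to 360 degree DOA range, and the array being centered at the origin are all assumptions made for the example. Only the numeric ranges (SNR per tier, RT60, microphone spacing) come from the description.

```python
import math
import random

# SNR ranges in dB for the three difficulty tiers, as quoted in the description.
# The tier names are hypothetical labels, not identifiers used by the dataset.
SNR_RANGES_DB = {
    "base": (5.0, 20.0),
    "curriculum_1": (1.0, 10.0),
    "curriculum_2": (-1.0, 10.0),
}
RT60_RANGE_S = (0.2, 0.8)           # reverberation time range from the description
MIC_SPACING_RANGE_M = (0.07, 0.10)  # square array side length, 7 to 10 cm


def square_array(center_xy, spacing_m):
    """Return the four corner positions of a square array with the given side length."""
    cx, cy = center_xy
    h = spacing_m / 2.0
    return [(cx - h, cy - h), (cx + h, cy - h), (cx + h, cy + h), (cx - h, cy + h)]


def sample_scene(tier, rng):
    """Draw one set of scene parameters.

    Uniform sampling and the full-circle DOA range are assumptions;
    the description does not state how the values were drawn.
    """
    snr_lo, snr_hi = SNR_RANGES_DB[tier]
    spacing = rng.uniform(*MIC_SPACING_RANGE_M)
    return {
        "tier": tier,
        "snr_db": rng.uniform(snr_lo, snr_hi),
        "rt60_s": rng.uniform(*RT60_RANGE_S),
        "doa_deg": rng.uniform(0.0, 360.0),
        "mic_positions_m": square_array((0.0, 0.0), spacing),
    }


rng = random.Random(0)  # fixed seed so the sketch is reproducible
scene = sample_scene("curriculum_2", rng)
```

Parameters drawn this way could then be passed to a room simulator such as pyroomacoustics to render the 4-channel mixtures; that step is omitted here because the description does not give the simulation settings.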
Provider:
IEEE DataPort
Created:
2026-01-25