
BinMov2023: Binaural Dataset for Source Position Estimation with Head Rotation and Moving Listeners

Indexed by NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/7689062
Resource description:
DESCRIPTION

BinMov2023: Binaural Dataset for Source Position Estimation with Head Rotation and Moving Listeners is a binaural dataset of synthetic single-source speech signals reverberated with simulated room impulse responses. The data supports experiments on sound source localization and sound distance estimation.

The dataset consists of three subsets, one per scenario:

- static: a static sound source and a static listener
- rotation: a static sound source and a static listener whose head rotates in the azimuth plane
- walking: a static sound source and a listener moving in space

Each sound file contains a unique combination of a simulated room and source and receiver positions. The walking scenario contains simulations of 2500 different rooms, whereas the static and rotation scenarios contain 5000 rooms.

REPORT AND REFERENCE

A detailed description of the dataset and the data generation process can be found in:

D. A. Krause, G. García-Barrios, A. Politis and A. Mesaros, "Binaural Sound Source Distance Estimation and Localization for a Moving Listener," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, doi: 10.1109/TASLP.2023.3346297.

Supplementary material describing the data simulation is also available. If you use the dataset, please consider citing the above-mentioned paper.

METADATA

Each sound file has its own metadata file. The metadata is given per frame in the following format:

[nb_frame (int)], [x (float)], [y (float)], [z (float)], [a (float)], [b (float)], [c (float)], [d (float)]

Here nb_frame is the frame number, {x, y, z} are the unnormalized Cartesian coordinates of the sound source, and {a, b, c, d} are the quaternion values describing the rotation of the listener's head.

LICENSE

The database is published under a custom **open non-commercial with attribution** license. It can be found in the `LICENSE.txt` file that accompanies the data.
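
As an illustration of the per-frame metadata format described under METADATA, the following Python sketch parses such records and derives a source distance per frame. It is only a sketch under stated assumptions: it assumes one frame per line with comma-separated fields in the order listed, and it assumes the source coordinates are expressed relative to the listener, which the description does not confirm. The `Frame` class, the `parse_metadata` function, and the sample values are hypothetical and not part of the dataset or its tooling.

```python
import csv
import io
import math
from dataclasses import dataclass


@dataclass
class Frame:
    """One metadata record: frame index, source position, head quaternion."""
    nb_frame: int
    x: float
    y: float
    z: float
    a: float
    b: float
    c: float
    d: float

    def distance(self) -> float:
        # Euclidean norm of (x, y, z). Interpreting this as the
        # source-to-listener distance is an ASSUMPTION (listener at origin).
        return math.sqrt(self.x ** 2 + self.y ** 2 + self.z ** 2)


def parse_metadata(text: str) -> list[Frame]:
    """Parse comma-separated metadata text into a list of Frame records."""
    frames = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip empty lines
        values = [field.strip() for field in row]
        if len(values) != 8:
            raise ValueError(f"expected 8 fields, got {len(values)}: {row}")
        frames.append(Frame(int(values[0]), *map(float, values[1:])))
    return frames


# Made-up sample records, used only to exercise the parser.
sample = (
    "0, 1.0, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0\n"
    "1, 3.0, 0.0, 4.0, 1.0, 0.0, 0.0, 0.0\n"
)
frames = parse_metadata(sample)
for frame in frames:
    print(frame.nb_frame, round(frame.distance(), 3))
```

For a real file, the text would be read from the metadata file that accompanies each sound file; the delimiter and any header handling should be checked against the actual files before relying on this parser.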
Created:
2024-01-02