
SoundDesc: Cleaned and Group-Filtered Splits

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/7665916
Description:
This upload contains dataset splits of SoundDesc [1] and other supporting material for our paper: Data leakage in cross-modal retrieval training: A case study [arXiv] [ieeexplore]

SoundDesc is a dataset that was automatically sourced from the BBC Sound Effects web page [2]. In our paper, we demonstrated that a data leakage problem in the previously published splits of SoundDesc leads to overly optimistic retrieval results. Using off-the-shelf audio fingerprinting software, we identified that the data leakage stems from duplicates in the dataset.

We define two new splits for the dataset: a cleaned split, which removes the leakage, and a group-filtered split, which avoids other kinds of weak contamination of the test data. The results from our paper can be reproduced using clean_split01 and group_filtered_split01.

If you use the splits, please cite our work:

Benno Weck, Xavier Serra, "Data Leakage in Cross-Modal Retrieval Training: A Case Study," ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1-5, doi: 10.1109/ICASSP49357.2023.10094617.

@INPROCEEDINGS{10094617,
  author={Weck, Benno and Serra, Xavier},
  booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={Data Leakage in Cross-Modal Retrieval Training: A Case Study},
  year={2023},
  volume={},
  number={},
  pages={1-5},
  doi={10.1109/ICASSP49357.2023.10094617}}

References:
[1] A. S. Koepke, A.-M. Oncescu, J. Henriques, Z. Akata and S. Albanie, "Audio Retrieval with Natural Language Queries: A Benchmark Study," in IEEE Transactions on Multimedia, doi: 10.1109/TMM.2022.3149712.
[2] https://sound-effects.bbcrewind.co.uk/
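The leakage check and the idea of keeping related recordings on one side of a split can be illustrated with a minimal sketch. It assumes that each recording ID has already been mapped to a duplicate-group label (for example, from audio fingerprint matches). The function names, IDs, and group labels below are hypothetical, and the grouping criterion used for the published splits is not described here, so this is an illustration rather than the authors' procedure.

```python
from collections import defaultdict


def find_leaked_items(train_ids, test_ids, group_of):
    """Return test IDs whose duplicate group also contains a training ID."""
    train_groups = {group_of[i] for i in train_ids if i in group_of}
    return sorted(i for i in test_ids if group_of.get(i) in train_groups)


def group_filtered_split(item_ids, group_of, test_groups):
    """Assign whole groups to one side so that no group spans train and test."""
    split = defaultdict(list)
    for item in item_ids:
        # Items without a known group are treated as singleton groups.
        group = group_of.get(item, item)
        split["test" if group in test_groups else "train"].append(item)
    return dict(split)


# Toy data: IDs and group labels are made up for illustration only.
group_of = {"a1": "g1", "a2": "g1", "b1": "g2", "c1": "g3"}
leaked = find_leaked_items(["a1", "b1"], ["a2", "c1"], group_of)
split = group_filtered_split(["a1", "a2", "b1", "c1"], group_of, {"g1"})
```

In the toy data, "a2" shares a group with the training item "a1", so a split that places them on opposite sides would be flagged, while the group-based assignment keeps both on the same side.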
Created:
2023-08-26