DCASE 2024 Task 9: Language-Queried Audio Source Separation | Validation Set

Indexed from NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/10886480
Description:
This is the validation set for Task 9, Language-Queried Audio Source Separation (LASS), of the DCASE 2024 Challenge. It is intended for evaluating LASS methods during the model development stage and is not meant to be used for training them.

The validation set consists of 1000 audio files sourced from Freesound [1], uploaded between April and October 2023. Each audio file has been manually annotated with three captions. The annotation guidance instructed annotators to describe the content of each audio clip in 5-20 words, similar to the caption style of the Clotho [3] and AudioCaps [4] datasets. The tags of each audio file were verified and revised according to the FSD50K [2] sound event categories. Each audio file has been chunked into a 10-second clip and downsampled to 16 kHz.

== Details ==

The audio files are in the archive lass_validation.zip, and the associated metadata (including tags and captions) is in the JSON file lass_validation.json.

Participants evaluate their LASS models on synthetic mixture data during the development stage. Specifically, given an audio clip A1 and its corresponding caption C, an additional audio clip A2 is selected to serve as background noise, and the two are combined into a mixed audio A3. Given A3 and C as inputs, the LASS system is expected to separate the A1 source. The revised tag information is used to ensure that the two audio clips in each mixture do not share overlapping sound source classes. Three thousand synthetic audio mixtures with signal-to-noise ratios (SNR) ranging from -15 dB to 15 dB will be generated for validation during LASS model development.
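The mixing step described above (A1 plus a scaled A2 gives A3 at a target SNR) can be sketched as follows. This is only an illustration under stated assumptions, not the official mixing code: the function name `mix_at_snr` is hypothetical, SNR is assumed to be the ratio of mean power over the whole clip, and the peak normalisation at the end is an added safeguard against clipping. The challenge baseline may define or apply these steps differently.

```python
import numpy as np


def mix_at_snr(source, noise, snr_db):
    """Mix `source` (A1) with `noise` (A2) at `snr_db`, returning (A3, A1, scaled A2).

    Assumption: SNR = 10 * log10(mean(source^2) / mean(scaled_noise^2)).
    """
    source = np.asarray(source, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if source.shape != noise.shape:
        raise ValueError("source and noise must have the same shape")

    source_power = np.mean(source ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0.0:
        raise ValueError("noise clip is silent; cannot scale it to a target SNR")

    # Choose a gain so that the scaled noise power equals source_power / 10^(snr_db / 10).
    gain = np.sqrt(source_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    scaled_noise = gain * noise
    mixture = source + scaled_noise

    # Assumption: rescale all three signals by one common factor if the mixture
    # would clip. A common factor leaves the SNR unchanged.
    peak = np.max(np.abs(mixture))
    if peak > 1.0:
        source, scaled_noise, mixture = source / peak, scaled_noise / peak, mixture / peak
    return mixture, source, scaled_noise


# Usage with stand-in signals: 10 seconds at 16 kHz, as in the validation clips.
rng = np.random.default_rng(0)
a1 = 0.1 * rng.standard_normal(160000)
a2 = 0.05 * rng.standard_normal(160000)
a3, a1_ref, a2_scaled = mix_at_snr(a1, a2, snr_db=-15.0)
```

Returning the rescaled A1 alongside A3 keeps the reference signal on the same scale as the mixture, which matters if a scale-sensitive separation metric is computed afterwards.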
The synthetic validation mixtures can be generated from the provided CSV file: lass_synthetic_validation.csv

The evaluation tool can be found at: https://github.com/Audio-AGI/dcase2024_task9_baseline/blob/main/dcase_evaluator.py

== References ==

[1] Fonseca E, Pons Puig J, Favory X, et al. Freesound datasets: a platform for the creation of open audio datasets. International Society for Music Information Retrieval (ISMIR), 2017.
[2] Fonseca E, Favory X, Pons J, et al. FSD50K: an open dataset of human-labeled sound events. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 30: 829-852.
[3] Drossos K, Lipping S, Virtanen T. Clotho: An audio captioning dataset. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2020: 736-740.
[4] Kim C D, Kim B, Lee H, et al. AudioCaps: Generating captions for audios in the wild. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). 2019: 119-132.
Created:
2024-03-27