SiVi-CAFE dataset - Sighted and Visually-impaired Captions for Audio in Finnish and English
Source: Mendeley Data. Updated: 2024-06-20. Indexed: 2024-06-28.
Download link:
https://zenodo.org/records/11505823
Description:
This dataset contains audio captions for audio files from the TAU Urban Acoustic Scenes 2019 development dataset (airport, public square, and park scenes) recorded in 10 cities. The files were annotated using a web-based tool presented in: Martin Morato, I., & Mesaros, A. (2021). Diversity and bias in audio captioning datasets. In F. Font, A. Mesaros, D. P.W. Ellis, E. Fonseca, M. Fuentes, & B. Elizalde (Eds.), Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2021) (pp. 90-94).

Each file is annotated by multiple annotators, each of whom provided a one-sentence description of the audio content.

Data is provided in csv files:
    sighted-EN-bias-original
    sighted-FI-bias-translated
    sighted-EN-no_bias-original
    sighted-FI-no_bias-translated
    visually_impaired-FI-original
    visually_impaired-EN-translated
    sighted-FI-original
    sighted-EN-translated

In the file names, "original" marks the original, non-translated descriptions, and "translated" marks descriptions translated with an automatic deep learning tool.

The dataset comprises three sets of annotations:
    900 annotated audio files, with Finnish audio descriptions provided by visually impaired and sighted people.
    2050 annotated audio files, with English audio descriptions provided by international students (not necessarily native English speakers).
    3930 annotated audio files, with English audio descriptions provided by international students (not necessarily native English speakers) who were biased by the provided audio tags.

The audio files can be downloaded from https://zenodo.org/record/2589280 and are covered by their own license.
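The file names listed in the description follow a hyphen-separated pattern (annotator group, language, an optional bias condition, and original or translated), so they can be decoded programmatically. The following is a minimal Python sketch of that idea, not part of the dataset's tooling: the names `CaptionFile`, `parse_name`, and `FILE_STEMS` are hypothetical, and the handling of a trailing ".csv" extension is an assumption, since the description gives the names without an extension. The column layout inside the csv files is not described here, so the sketch does not read them.

```python
from typing import NamedTuple, Optional


class CaptionFile(NamedTuple):
    """Fields encoded in a caption file name (hypothetical helper type)."""
    annotators: str           # "sighted" or "visually_impaired"
    language: str             # "EN" or "FI"
    condition: Optional[str]  # "bias", "no_bias", or None when not given
    origin: str               # "original" or "translated"


# File names exactly as listed in the dataset description.
FILE_STEMS = [
    "sighted-EN-bias-original",
    "sighted-FI-bias-translated",
    "sighted-EN-no_bias-original",
    "sighted-FI-no_bias-translated",
    "visually_impaired-FI-original",
    "visually_impaired-EN-translated",
    "sighted-FI-original",
    "sighted-EN-translated",
]


def parse_name(name: str) -> CaptionFile:
    """Split a file name into its hyphen-separated fields.

    Assumption: a trailing ".csv" may or may not be present.
    """
    stem = name[:-4] if name.endswith(".csv") else name
    parts = stem.split("-")
    if len(parts) == 4:
        annotators, language, condition, origin = parts
    elif len(parts) == 3:
        annotators, language, origin = parts
        condition = None
    else:
        raise ValueError(f"unexpected file name: {name!r}")
    return CaptionFile(annotators, language, condition, origin)


if __name__ == "__main__":
    for stem in FILE_STEMS:
        print(parse_name(stem))
```

Filtering on `origin == "original"` selects the human-written captions and leaves out the machine-translated ones, which may matter when comparing annotator groups.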
Created:
2024-06-19



