SiVi-CAFE dataset - Sighted and Visually-impaired Captions for Audio in Finnish and English

Source: Mendeley Data. Updated 2024-06-20, indexed 2024-06-28.

Download link:

https://zenodo.org/records/11505823

Description:

This dataset contains audio captions for audio files from the TAU Urban Acoustic Scenes 2019 development dataset (airport, public square, and park scenes) covering 10 cities. The files were annotated using a web-based tool presented in:

Martin Morato, I., & Mesaros, A. (2021). Diversity and bias in audio captioning datasets. In F. Font, A. Mesaros, D. P.W. Ellis, E. Fonseca, M. Fuentes, & B. Elizalde (Eds.), Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2021) (pp. 90-94).

Each file was annotated by multiple annotators, each of whom provided a one-sentence description of the audio content.

Data is provided in csv files:

sighted-EN-bias-original
sighted-FI-bias-translated
sighted-EN-no_bias-original
sighted-FI-no_bias-translated
visually_impaired-FI-original
visually_impaired-EN-translated
sighted-FI-original
sighted-EN-translated

original = original descriptions, not translated
translated = descriptions translated using an automatic deep learning tool

The annotations fall into three groups:

900 annotated audio files: Finnish audio descriptions provided by visually impaired and sighted people.
2050 annotated audio files: English audio descriptions provided by international students (not necessarily native English speakers).
3930 annotated audio files: English audio descriptions provided by international students (not necessarily native English speakers), biased by the provided audio tags.

The audio files can be downloaded from https://zenodo.org/record/2589280 and are covered by their own license.
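
The csv file names listed in the description follow a visible pattern: annotator group, language, an optional bias condition, and whether the text is original or translated. The following is a minimal Python sketch that splits those names into fields, which may help when selecting a subset. It works on the name stems only; the exact file extensions and the column layout inside each csv are not stated here, so the sketch makes no assumption about them. `CaptionFile` and `parse_stem` are hypothetical helper names introduced for illustration.

```python
from typing import NamedTuple, Optional

# File name stems exactly as listed in the dataset description.
FILE_STEMS = [
    "sighted-EN-bias-original",
    "sighted-FI-bias-translated",
    "sighted-EN-no_bias-original",
    "sighted-FI-no_bias-translated",
    "visually_impaired-FI-original",
    "visually_impaired-EN-translated",
    "sighted-FI-original",
    "sighted-EN-translated",
]


class CaptionFile(NamedTuple):
    """Hypothetical record for the fields encoded in a file name stem."""
    group: str                # "sighted" or "visually_impaired"
    language: str             # "EN" or "FI"
    condition: Optional[str]  # "bias", "no_bias", or None when absent
    version: str              # "original" or "translated"


def parse_stem(stem: str) -> CaptionFile:
    """Split a stem on '-' into group, language, optional condition, version."""
    parts = stem.split("-")
    if len(parts) == 4:
        group, language, condition, version = parts
    elif len(parts) == 3:
        group, language, version = parts
        condition = None
    else:
        raise ValueError(f"unexpected file name stem: {stem!r}")
    if version not in ("original", "translated"):
        raise ValueError(f"unexpected version field in: {stem!r}")
    return CaptionFile(group, language, condition, version)


# Example: keep only the files holding non-translated descriptions.
originals = [s for s in FILE_STEMS if parse_stem(s).version == "original"]
```

Filtering on `version == "original"` keeps the human-written descriptions and leaves out the machine-translated ones, which is likely the safer default when translation quality matters.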

Created:

2024-06-19