A Large TV Dataset for Speech and Music Activity Detection

Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/records/7025971
Description:
Automatic speech and music activity detection (SMAD) is an enabling task that can help segment, index, and pre-process audio content in radio broadcasts and TV programs. However, due to copyright concerns and the cost of manual annotation, diverse and sizeable datasets are scarce, which hinders the progress of state-of-the-art (SOTA) data-driven approaches. We address this challenge by presenting a large-scale dataset containing Mel spectrogram, VGGish, and MFCC features extracted from around 1,600 hours of professionally produced audio tracks, together with noisy labels indicating the approximate location of speech and music segments. The labels are derived from several sources, such as subtitles. A test set curated by human annotators is also included as a subset for evaluation. To the best of our knowledge, this is the first large-scale, open-source dataset that contains features extracted from professionally produced audio tracks and their corresponding frame-level speech and music annotations.
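As an illustration of what "frame-level speech and music annotations" can look like in practice, the following is a minimal sketch that turns time-stamped segments into a per-frame, two-column activity matrix. It does not reflect the dataset's actual file layout, which this description does not specify: the `(start, end, label)` tuple format, the `hop_sec` frame step, and the function name `segments_to_frames` are all assumptions made for the example. It also assumes speech and music may be active in the same frame, so each class gets its own column.

```python
from typing import List, Sequence, Tuple

# Hypothetical segment format: (start_sec, end_sec, label).
Segment = Tuple[float, float, str]


def segments_to_frames(
    segments: Sequence[Segment],
    duration_sec: float,
    hop_sec: float = 0.01,  # assumed frame step, not taken from the dataset
    classes: Sequence[str] = ("speech", "music"),
) -> List[List[int]]:
    """Return one [speech, music] 0/1 row per frame."""
    n_frames = int(round(duration_sec / hop_sec))
    frames = [[0] * len(classes) for _ in range(n_frames)]
    for start, end, label in segments:
        if label not in classes:
            continue  # ignore labels outside the two target classes
        col = list(classes).index(label)
        # Clamp the segment to the valid frame range.
        first = max(0, int(round(start / hop_sec)))
        last = min(n_frames, int(round(end / hop_sec)))
        for i in range(first, last):
            frames[i][col] = 1
    return frames


if __name__ == "__main__":
    demo = [(0.0, 0.5, "speech"), (0.3, 1.0, "music")]
    for row in segments_to_frames(demo, duration_sec=1.0, hop_sec=0.1):
        print(row)
```

Keeping one column per class, rather than a single categorical label, lets overlapping speech and music (for example, dialogue over a soundtrack) be represented without an extra "both" class.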
Created: 2022-09-18