DnR-nonverbal
Source: arXiv. Updated: 2025-06-10. Indexed: 2025-11-14.
Download link:
https://github.com/LYCorporation/DnR-nonverbal
Description:
DnR-nonverbal is a dataset for cinematic audio source separation developed by LY Corporation, focused on the challenge of modeling non-verbal sounds. It contains approximately 150 multi-track audio mixtures, each 60 seconds long, for a total duration of about 2.5 hours. The speech stem combines read speech with non-verbal sounds such as laughter and screams, while the music and sound-effects stems reuse the FSD50K and FMA resources used in DnR-v2. The number of speech segments is controlled by a zero-truncated Poisson distribution, and audio material is selected from FSD50K and Freesound using a combination of rule-based and large language model (LLM) filtering, to keep the selected sounds clean and diverse. The dataset targets the failure of existing models to separate emotionally expressive speech correctly, and is intended to support practical applications such as film audio restoration, content analysis, and copyright detection.
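The description says the number of speech segments is drawn from a zero-truncated Poisson distribution, which guarantees at least one segment. The following is a minimal sketch of such a sampler, not the dataset's actual generation code. The function name `sample_zero_truncated_poisson` and the rate `lam = 2.0` are illustrative assumptions, since the listing does not state the parameter used.

```python
import math
import random


def sample_zero_truncated_poisson(lam: float, rng: random.Random) -> int:
    """Draw k >= 1 with P(k) = lam**k * exp(-lam) / (k! * (1 - exp(-lam)))."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    u = rng.random()
    k = 1
    # P(K = 1); -expm1(-lam) equals 1 - exp(-lam) but is more accurate for small lam.
    p = lam * math.exp(-lam) / (-math.expm1(-lam))
    cdf = p
    # Inverse-transform sampling: walk the CDF until it covers u.
    # The p > 0 guard stops the loop if rounding keeps cdf just below u.
    while u > cdf and p > 0.0:
        k += 1
        p *= lam / k  # P(K = k) = P(K = k - 1) * lam / k
        cdf += p
    return k


def total_hours(num_clips: int, clip_seconds: float) -> float:
    """Total duration in hours for equally long clips."""
    return num_clips * clip_seconds / 3600.0


if __name__ == "__main__":
    rng = random.Random(0)
    lam = 2.0  # hypothetical rate, not taken from the dataset
    counts = [sample_zero_truncated_poisson(lam, rng) for _ in range(10)]
    print("speech segments per mixture:", counts)
    print("total hours for 150 clips of 60 s:", total_hours(150, 60))
```

Unlike a plain Poisson draw, this never returns zero, so every mixture would contain at least one speech segment. The helper `total_hours` only checks the stated size: 150 clips of 60 seconds come to 2.5 hours.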
Provider:
LY Corporation
Created:
2025-06-03



