AVSpeech: Audio-Visual Speech Dataset

Source: 超神经 (hyper.ai) | Updated: 2024-05-27 | Indexed: 2024-05-15

Download link:

https://hyper.ai/cn/datasets/8754

Overview:

AVSpeech is a new, large-scale audio-visual dataset consisting of speech video clips free of interfering background noise. Each clip is 3 to 10 seconds long, and the only audible sound in its soundtrack belongs to a single speaking person who is visible in the video.

Created:

2019-09-11

Collected summary

Dataset introduction

Background and challenges

Background overview

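AVSpeech is a large-scale audio-visual speech dataset containing roughly 4,700 hours of video segments drawn from about 290,000 YouTube videos and covering a wide variety of people, languages, and face poses. The clips are 3 to 10 seconds long, have low background noise, and pair the audio strictly with the speaker visible in the video, which makes the dataset well suited to audio-visual speech research and related language processing work.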

The summary above was collected and generated by 遇见数据集.
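
To give a feel for the scale implied by the figures in the summary (about 4,700 hours, about 290,000 source videos, clips of 3 to 10 seconds), the following is a rough back-of-the-envelope sketch. The function names are hypothetical and chosen only for illustration, and the results are bounds derived from the rounded numbers quoted above, not official statistics of the dataset.

```python
# Rough scale estimate using only the rounded figures quoted in the summary.
TOTAL_HOURS = 4700               # approximate total duration
SOURCE_VIDEOS = 290_000          # approximate number of source YouTube videos
MIN_CLIP_S, MAX_CLIP_S = 3, 10   # clip length range in seconds


def clip_count_bounds(total_hours, min_clip_s, max_clip_s):
    """Return (fewest, most) clips that could add up to total_hours.

    The fewest clips occur if every clip has the maximum length,
    the most if every clip has the minimum length.
    """
    total_seconds = total_hours * 3600
    fewest = total_seconds // max_clip_s
    most = total_seconds // min_clip_s
    return fewest, most


def seconds_per_source_video(total_hours, source_videos):
    """Average amount of retained footage per source video, in seconds."""
    return total_hours * 3600 / source_videos


if __name__ == "__main__":
    fewest, most = clip_count_bounds(TOTAL_HOURS, MIN_CLIP_S, MAX_CLIP_S)
    avg = seconds_per_source_video(TOTAL_HOURS, SOURCE_VIDEOS)
    print(f"Clip count is between about {fewest:,} and {most:,}")
    print(f"Average retained footage per source video: about {avg:.0f} s")
```

Under these assumptions the dataset holds somewhere between roughly 1.7 and 5.6 million clips, with a little under a minute of retained footage per source video on average.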