VoxCeleb2
OpenDataLab | Updated 2026-05-17 | Indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/VoxCeleb2
Description:
VoxCeleb2 is a large-scale speaker recognition dataset obtained automatically from open-source media. It contains over one million utterances from more than 6,000 speakers. Because the dataset was collected "in the wild", the speech segments are degraded by real-world noise, including laughter, cross-talk, channel effects, music and other sounds. The dataset is also multilingual, with speakers of 145 different nationalities covering a wide range of accents, ages, ethnicities and languages. Because it is audio-visual, it is also useful for other applications, such as visual speech synthesis, speech separation, cross-modal transfer from face to voice (and vice versa), and training face recognition models from video to complement existing face recognition datasets.
Provider:
OpenDataLab
Created:
2022-04-27
Aggregated summary
Dataset introduction

Background and challenges
Background overview
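VoxCeleb2 is a large-scale speaker recognition dataset containing over one million utterances from more than 6,000 speakers. The data were collected automatically from open-source media, include real-world noise and multilingual variety, and cover 145 nationalities. The dataset is audio-visual and multimodal, which makes it suitable for speaker recognition, visual speech synthesis, speech separation, cross-modal transfer and other applications.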
The above content was collected and summarized by 遇见数据集.
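The description gives headline counts (over one million utterances, more than 6,000 speakers). A downloaded copy could be checked against those figures with a short script. The sketch below is only illustrative: it assumes a directory layout of root/<speaker_id>/<video_id>/<utterance file> and a .m4a audio suffix, neither of which is stated on this page, and the function name summarize_corpus is hypothetical.

```python
from collections import Counter
from pathlib import Path


def summarize_corpus(root, suffix=".m4a"):
    """Count utterance files per speaker.

    Assumed layout (not stated on this page):
        root/<speaker_id>/<video_id>/<utterance><suffix>
    """
    root = Path(root)
    per_speaker = Counter()
    for path in root.rglob(f"*{suffix}"):
        rel = path.relative_to(root)
        # Count only files exactly three levels below the root, so that
        # stray files elsewhere in the tree do not inflate the totals.
        if path.is_file() and len(rel.parts) == 3:
            per_speaker[rel.parts[0]] += 1
    return per_speaker
```

Given the returned counter, `len(counts)` is the number of speakers and `sum(counts.values())` is the number of utterances. If the local copy uses a different layout or audio format, the depth check and the suffix need adjusting.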



