Casual Conversations v2 Dataset
Source: arXiv; updated 2023-03-09; indexed 2024-06-21
Download link:
https://ai.facebook.com/datasets/casual-conversations-v2-dataset
Description:
The Casual Conversations v2 dataset is a large, diverse audio/visual/speech dataset created by Meta AI to evaluate algorithmic bias and model robustness. It contains 26,467 videos of 5,567 unique paid participants from Brazil, India, Indonesia, Mexico, Vietnam, the Philippines and the United States, covering a range of geographic and demographic characteristics. Participants self-reported their age, gender, language/dialect, disability status, physical adornments and geographic location, while trained annotators labeled attributes such as skin tone and voice timbre. Beyond measuring fairness, the dataset can be used to evaluate model robustness across AI tasks ranging from computer vision and audio/speech recognition to deepfake detection.
Provided by:
Meta AI
Created:
2023-03-09



