BrunoHays/DVOICEv2.0-Darija
Source: Hugging Face. Updated 2024-12-18; indexed 2024-12-21.
Download link:
https://hf-mirror.com/datasets/BrunoHays/DVOICEv2.0-Darija
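The link above points at a Hugging Face mirror, so the dataset can probably be fetched with the `datasets` library by pointing the Hub client at that mirror. The following is a minimal sketch under stated assumptions, not a verified recipe: the helper names `dataset_url` and `load_darija` are hypothetical, no config or split name is passed because the listing does not state any, and it assumes the mirror honors the `HF_ENDPOINT` environment variable.

```python
import os

HF_MIRROR = "https://hf-mirror.com"  # mirror host taken from the download link above
REPO_ID = "BrunoHays/DVOICEv2.0-Darija"


def dataset_url(repo_id: str, endpoint: str = HF_MIRROR) -> str:
    """Build the dataset page URL for a given Hub endpoint (hypothetical helper)."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def load_darija(streaming: bool = True):
    """Load the dataset through the mirror; needs network access and `datasets`."""
    # HF_ENDPOINT has to be set before the Hub client libraries are imported.
    os.environ.setdefault("HF_ENDPOINT", HF_MIRROR)
    from datasets import load_dataset  # lazy import keeps dataset_url dependency-free

    # Assumption: the default config loads without extra arguments.
    return load_dataset(REPO_ID, streaming=streaming)


if __name__ == "__main__":
    print(dataset_url(REPO_ID))
```

Streaming is the default in this sketch so that audio files are not all downloaded before the first record can be inspected; pass `streaming=False` to materialize the data locally.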
Description:
DVoice is a community initiative that aims to provide African languages and dialects with the data and models they need to be used in voice technologies. Because data for these languages is scarce, the project collects it in two ways: through the DVoice platform, which is based on Mozilla Common Voice and gathers authentic recordings from the community, and through transfer learning techniques that label recordings automatically. The DVoice platform currently manages datasets for 7 languages: Darija (the Moroccan Arabic dialect), Wolof, Mandingo, Serere, Pular, Diola, and Soninke. This version also includes Swahili data that was labeled automatically via transfer learning from the Voxlingua107 dataset. Because the dataset is still small, the project advocates increasing the amount of data, and this version therefore contains augmented data that is easy to identify.
Provider:
BrunoHays



