hirundo-io/MASC

Name: hirundo-io/MASC
Creator: hirundo-io
Published: 2025-06-17 11:48:17
License: Not specified

Hugging Face | Updated 2025-06-17 | Indexed 2025-04-12

Download link:

https://hf-mirror.com/datasets/hirundo-io/MASC

Description:

MASC is a corpus of 1,000 hours of Arabic speech sampled at 16 kHz, crawled from more than 700 YouTube channels. It is multi-regional, multi-genre, and multi-dialect, and is intended to advance research and development in Arabic speech technology, particularly Arabic speech recognition. Alongside MASC, a pre-trained 3-gram language model and a pre-trained automatic speech recognition model have been developed and made available to interested researchers. To enhance the language model, a dataset of 12 million unique Arabic words, originally crawled from Twitter, has also been created and released.

Provider:

hirundo-io
