mesolitica/pseudolabel-malaya-speech-stt-train-whisper-large-v3-timestamp

Name: mesolitica/pseudolabel-malaya-speech-stt-train-whisper-large-v3-timestamp
Creator: mesolitica
Published: 2025-07-04 05:50:28
License: No description available

Source: Hugging Face. Updated 2025-07-04; indexed 2025-07-05.

Download link:

https://hf-mirror.com/datasets/mesolitica/pseudolabel-malaya-speech-stt-train-whisper-large-v3-timestamp


Description:

This dataset contains Malaya speech audio, split into 30-second segments and pseudolabeled with the Whisper Large V3 model. It has two features, text (new_text) and audio filename (audio_filename), and provides a training split (train).
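As a rough illustration of the 30-second segmentation mentioned in the description, the sketch below computes segment boundaries for a recording of a given length. The helper name chunk_boundaries and the handling of a shorter final segment are assumptions made for illustration; the dataset page does not document how the original pipeline treats the last segment.

```python
from typing import List, Tuple

# Segment length taken from the dataset description ("split every 30 seconds").
CHUNK_SECONDS = 30.0


def chunk_boundaries(duration: float, chunk: float = CHUNK_SECONDS) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds that cover a recording in fixed-size segments.

    Hypothetical helper, not code from the dataset's actual pipeline.
    Assumption: the final segment is kept even if it is shorter than `chunk`.
    """
    if duration <= 0 or chunk <= 0:
        return []
    bounds: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration:
        end = min(start + chunk, duration)
        bounds.append((start, end))
        start = end
    return bounds


if __name__ == "__main__":
    # A 75-second recording yields two full segments and one 15-second remainder.
    for start, end in chunk_boundaries(75.0):
        print(f"{start:.1f}s to {end:.1f}s")
```

Each resulting segment would then correspond to one row of the dataset, pairing an audio_filename with the new_text transcript produced by the model.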

Provider:

mesolitica
