sin2piusc/jgca_v2_50k
Hugging Face · updated 2024-07-09 · indexed 2024-07-22
Download link:
https://hf-mirror.com/datasets/sin2piusc/jgca_v2_50k
Description:
The dataset has two features: audio, sampled at 16000 Hz, and sentence, the corresponding transcript stored as a string. It has a single training split of 50,004 examples, about 5.64 GB in total (5,640,823,349 bytes). The language is Japanese, and the listed tasks are translation, text generation, and automatic speech recognition. According to its name, the dataset was built from Common Voice, Google FLEURS, JSUTv1.1, and JAS_v2 (joujiboi/japanese-anime-speech-v2) and processed specifically for Whisper. Special characters have not been removed and the text has not been normalized; the data has been shuffled and flattened.
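Because the transcripts are neither normalized nor stripped of special characters, and the audio is prepared for Whisper, a training or evaluation pipeline will typically filter clips and normalize text on its own. Below is a minimal Python sketch of that step. It is not part of the dataset, and it rests on several assumptions: each example is a plain dict shaped like {"audio": {"array": [...], "sampling_rate": ...}, "sentence": ...} (the decoded form the Hugging Face datasets Audio feature returns in many releases); the length limit is Whisper's 30-second input window; and the normalization (Unicode NFKC, then dropping punctuation, symbols, and whitespace) is one possible choice for scoring character error rate (CER). The helper names fits_whisper, normalize_ja, and cer are hypothetical.

```python
import unicodedata

WHISPER_SAMPLE_RATE = 16_000  # Whisper expects 16 kHz input; matches this dataset
WHISPER_MAX_SECONDS = 30.0    # Whisper's encoder consumes 30-second windows


def fits_whisper(example: dict) -> bool:
    """Return True if the clip is 16 kHz and no longer than 30 seconds.

    Assumes the hypothetical layout {"audio": {"array": [...], "sampling_rate": int}}.
    """
    audio = example["audio"]
    if audio["sampling_rate"] != WHISPER_SAMPLE_RATE:
        return False
    return len(audio["array"]) / audio["sampling_rate"] <= WHISPER_MAX_SECONDS


def normalize_ja(text: str) -> str:
    """One possible normalization (an assumption, not the dataset's own):
    NFKC folds full-width and half-width forms, then Unicode punctuation (P*),
    symbols (S*), and whitespace are dropped."""
    text = unicodedata.normalize("NFKC", text)
    return "".join(
        ch
        for ch in text
        if not unicodedata.category(ch).startswith(("P", "S")) and not ch.isspace()
    )


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance divided by reference length,
    computed on normalized text with two rolling rows of the DP table."""
    ref, hyp = normalize_ja(reference), normalize_ja(hypothesis)
    if not ref:
        raise ValueError("reference is empty after normalization")
    prev = list(range(len(hyp) + 1))  # distance from the empty reference prefix to hyp[:j]
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (free if characters match)
            ))
        prev = cur
    return prev[-1] / len(ref)


# Usage
print(normalize_ja("「ｶﾀｶﾅ！」　テスト"))  # カタカナテスト
print(cer("こんにちは。", "こんにちわ"))  # 0.2
print(fits_whisper({"audio": {"array": [0.0] * 16_000 * 31, "sampling_rate": 16_000}}))  # False
```

Dropping every punctuation and symbol category is deliberately aggressive; a pipeline that needs symbols such as "〜" or "♪" in its targets would narrow the filter.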
Provider:
sin2piusc



