sin2piusc/jgca_v2_50k_2
Source: Hugging Face | Updated: 2024-07-09 | Indexed: 2024-07-22
Download link:
https://hf-mirror.com/datasets/sin2piusc/jgca_v2_50k_2
Summary:
The dataset has two features: audio, sampled at 16,000 Hz, and sentence, a string. It has a single training split of 49,504 examples totaling 12,264,199,958.656 bytes (about 12.3 GB). It is intended mainly for Japanese automatic speech recognition, translation, and text-to-speech. The data comes from Common Voice, Google FLEURS, JSUTv1.1, and JAS_v2 (joujiboi/japanese-anime-speech-v2): 50% is anime speech and 50% comes from the other corpora. The dataset has not been shuffled or normalized.
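Because no shuffling was applied, consecutive rows are likely to come from the same source corpus (an assumption; the card only says the data is unshuffled). A minimal sketch of a seeded shuffle over row indices followed by a held-out split, using only the Python standard library. The helper name `shuffled_split`, the seed, and the 5% holdout fraction are hypothetical illustrative choices, not recommendations from the dataset authors.

```python
import random


def shuffled_split(num_examples: int, holdout_fraction: float = 0.05, seed: int = 42):
    """Hypothetical helper: return (train_indices, holdout_indices) after a seeded shuffle."""
    indices = list(range(num_examples))
    # A dedicated Random instance makes the shuffle reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(indices)
    n_holdout = int(num_examples * holdout_fraction)
    return indices[n_holdout:], indices[:n_holdout]


# 49,504 is the train-split size reported on this page.
train_idx, holdout_idx = shuffled_split(49504)
print(len(train_idx), len(holdout_idx))  # prints: 47029 2475
```

With the Hugging Face `datasets` library, `Dataset.shuffle(seed=...)` and `Dataset.train_test_split(test_size=...)` perform the same operations directly on a loaded dataset.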
Provided by:
sin2piusc
Summary of Original Information
Dataset Overview
Dataset Information
- Features:
  - audio: sampling rate 16,000 Hz
  - sentence: data type string
- Splits:
  - train:
    - Size: 12264199958.656 bytes
    - Examples: 49504
- Download size: 11879936920 bytes
- Dataset size: 12264199958.656 bytes
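The card lists sizes but no audio durations. A rough back-of-the-envelope sketch derives the average size per example and an implied duration. It assumes, purely for illustration, that audio is stored as uncompressed 16-bit mono PCM at the stated 16 kHz rate (32,000 bytes per second). The card does not state the encoding, and the byte count also includes the sentence column, so treat the durations as order-of-magnitude estimates only.

```python
# Figures copied from the dataset card (train split).
NUM_BYTES = 12_264_199_958.656
NUM_EXAMPLES = 49_504
SAMPLE_RATE_HZ = 16_000

# ASSUMPTION: 16-bit (2-byte) mono PCM. The real encoding is not stated on the
# card, so the duration figures below are illustrative estimates only.
BYTES_PER_SECOND = SAMPLE_RATE_HZ * 2

avg_bytes = NUM_BYTES / NUM_EXAMPLES
avg_seconds = avg_bytes / BYTES_PER_SECOND
total_hours = NUM_BYTES / BYTES_PER_SECOND / 3600

print(f"average bytes per example: {avg_bytes:,.0f}")
print(f"average duration if 16-bit PCM: {avg_seconds:.1f} s")
print(f"total duration if 16-bit PCM: {total_hours:.1f} h")
```

Under that assumption, each example averages about 248 KB (roughly 7.7 s of audio), or about 106 hours in total. Compressed audio would imply longer durations for the same byte count.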
Configs
- Config name: default
- Data files:
  - train: data/train-*
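The default config assigns every repository file matching the glob data/train-* to the train split. A minimal sketch of how such a pattern selects shard files, using the standard-library `fnmatch.fnmatchcase`. The file names are hypothetical, since the actual shard names and count are not listed on this page. The Hugging Face resolver may differ in details; for example, `fnmatch`'s `*` also matches `/`.

```python
from fnmatch import fnmatchcase

# Pattern declared for the train split of the "default" config.
TRAIN_PATTERN = "data/train-*"

# HYPOTHETICAL repository file names, for illustration only.
repo_files = [
    "README.md",
    "data/train-00000-of-00003.parquet",
    "data/train-00001-of-00003.parquet",
    "data/train-00002-of-00003.parquet",
]

# fnmatchcase avoids the platform-dependent case folding of fnmatch.fnmatch.
train_files = sorted(f for f in repo_files if fnmatchcase(f, TRAIN_PATTERN))
print(train_files)
```

In practice, `datasets.load_dataset("sin2piusc/jgca_v2_50k_2", split="train")` resolves this pattern automatically. Passing `streaming=True` avoids downloading the roughly 11.9 GB of files up front.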
License
- apache-2.0
Task Categories
- Automatic speech recognition
- Translation
- Text-to-speech
Language
- Japanese
Size Category
- 10K<n<100K
Sources
- Common Voice
- Google FLEURS
- JSUTv1.1
- JAS_v2 (joujiboi/japanese-anime-speech-v2)
Data Processing
- Not shuffled or normalized
- 50% anime speech, 50% other
- Other corpora fully represented
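The card does not say whether "normalized" refers to the text or the audio. Assuming it means the sentence transcripts, one common option for Japanese text is Unicode NFKC normalization followed by whitespace cleanup. This is a minimal sketch of that approach, not the authors' pipeline. The helper name `normalize_transcript` is hypothetical. NFKC also folds full-width punctuation such as "！" to "!", which may or may not suit a given task.

```python
import re
import unicodedata


def normalize_transcript(text: str) -> str:
    """Hypothetical helper: NFKC-normalize a transcript and collapse whitespace.

    NFKC folds full-width Latin letters and digits to ASCII and half-width
    katakana to full-width katakana.
    """
    text = unicodedata.normalize("NFKC", text)
    # NFKC maps the ideographic space (U+3000) to an ASCII space, so a single
    # whitespace collapse handles both kinds of spacing.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_transcript("ｺﾝﾆﾁﾊ　ＡＳＲ１２３ "))  # prints: コンニチハ ASR123
```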



