sin2piusc/jgca_v2_50k_2

Name: sin2piusc/jgca_v2_50k_2
Creator: sin2piusc
Published: 2024-07-09 18:09:22
License: apache-2.0

Source: Hugging Face. Updated: 2024-07-09. Indexed: 2024-07-22.

Download link:

https://hf-mirror.com/datasets/sin2piusc/jgca_v2_50k_2

Summary:

The dataset has two features: audio, sampled at 16,000 Hz, and sentence, stored as a string. It has a single train split of 49,504 examples with a reported size of 12264199958.656 bytes (about 12.3 GB). It is intended for automatic speech recognition, translation, and text-to-speech tasks in Japanese. The data is drawn from Common Voice, Google FLEURS, JSUTv1.1, and JAS_v2 (joujiboi/japanese-anime-speech-v2): 50% is anime speech and 50% comes from the other corpora. The dataset has not been shuffled or normalized.
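As a rough sketch of how the two features described above might be read, the snippet below streams a few transcripts with the Hugging Face `datasets` library. It assumes the library is installed, that the repository ID from the page title is reachable on the Hub, and that the columns are named `audio` and `sentence` as listed; the helper name `preview` is hypothetical and not part of the dataset.

```python
from itertools import islice

REPO_ID = "sin2piusc/jgca_v2_50k_2"  # repository ID taken from the page title


def preview(n=3):
    """Return the first n transcripts from the train split via streaming."""
    # Imported lazily so this module can be imported without the library installed.
    from datasets import Audio, load_dataset

    # Streaming avoids downloading the full dataset before reading anything.
    ds = load_dataset(REPO_ID, split="train", streaming=True)
    # Assumption: leaving audio undecoded keeps the preview free of audio backends.
    ds = ds.cast_column("audio", Audio(decode=False))
    return [example["sentence"] for example in islice(ds, n)]


if __name__ == "__main__":
    for sentence in preview():
        print(sentence)
```

Because the dataset is described as unshuffled, the first few streamed records may all come from a single source corpus.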

Provider:

sin2piusc

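Original information summary

Dataset overview

Dataset information

Features:
- audio:
  - sampling rate: 16000 Hz
- sentence:
  - data type: string
Splits:
- train:
  - bytes: 12264199958.656
  - examples: 49504
Download size: 11879936920 bytes
Dataset size: 12264199958.656 bytes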
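The reported sizes can be turned into two quick sanity figures: the mean size per example (roughly 248 kB) and the ratio of download size to dataset size. The sketch below uses only the numbers listed above; the function names are hypothetical, and no assumption is made about how the audio is encoded on disk.

```python
TRAIN_BYTES = 12264199958.656  # dataset size as reported in the metadata
DOWNLOAD_BYTES = 11879936920   # download size as reported in the metadata
NUM_EXAMPLES = 49504           # number of train examples as reported


def mean_bytes_per_example(total_bytes=TRAIN_BYTES, n=NUM_EXAMPLES):
    """Average storage per example, in bytes."""
    return total_bytes / n


def download_ratio(download=DOWNLOAD_BYTES, total=TRAIN_BYTES):
    """Download size as a fraction of the dataset size."""
    return download / total


if __name__ == "__main__":
    print(f"mean size per example: {mean_bytes_per_example() / 1e3:.1f} kB")
    print(f"download / dataset size: {download_ratio():.3f}")
```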

Configuration

Config name: default
- Data files:
  - train: data/train-*
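To illustrate what the `data/train-*` pattern selects, the sketch below filters a list of paths with Python's standard `fnmatch` module. The file names are hypothetical and chosen only for illustration; the real repository listing may differ, and the Hub's own pattern resolution is only approximated here.

```python
from fnmatch import fnmatchcase

TRAIN_PATTERN = "data/train-*"  # pattern from the default config


def train_files(paths, pattern=TRAIN_PATTERN):
    """Keep only the paths that the train pattern would select."""
    return [p for p in paths if fnmatchcase(p, pattern)]


# Hypothetical file names, for illustration only.
example_paths = [
    "data/train-00000-of-00002.parquet",
    "data/train-00001-of-00002.parquet",
    "README.md",
]

if __name__ == "__main__":
    print(train_files(example_paths))
```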

License

apache-2.0

Task categories

Automatic speech recognition
Translation
Text-to-speech

Language

Japanese

Dataset size category

10K<n<100K

Dataset sources

Common Voice
Google FLEURS
JSUTv1.1
JAS_v2 (joujiboi/japanese-anime-speech-v2)

Data processing

Not shuffled or normalized
50% anime speech, 50% other
The other corpora are fully represented
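Since the data is described as neither shuffled nor normalized, a user would likely do both before training. The page does not say whether "normalized" refers to text or audio; the sketch below assumes text, applies Unicode NFKC folding with whitespace collapsing as one possible choice, and produces a reproducible shuffle order. Both function names are hypothetical.

```python
import random
import unicodedata


def normalize_sentence(text):
    """Apply NFKC normalization, then collapse runs of whitespace."""
    # NFKC folds full-width letters, digits, and spaces to their standard forms.
    folded = unicodedata.normalize("NFKC", text)
    return " ".join(folded.split())


def shuffled_indices(n, seed=0):
    """Return a reproducible random ordering of range(n)."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order


if __name__ == "__main__":
    print(normalize_sentence("ＡＢＣ　１２３"))
    print(shuffled_indices(5, seed=0))
```

Shuffling indices rather than the records themselves keeps the audio untouched, and a fixed seed makes the ordering repeatable across runs.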
