Codec-SUPERB/speech_tokenizer_16k
Source: Hugging Face | Updated: 2023-12-07 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/Codec-SUPERB/speech_tokenizer_16k
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: test.other
    path: data/test.other-*
  - split: validation.other
    path: data/validation.other-*
  - split: train.other.500
    path: data/train.other.500-*
  - split: train.clean.100
    path: data/train.clean.100-*
  - split: test.clean
    path: data/test.clean-*
  - split: train.clean.360
    path: data/train.clean.360-*
  - split: validation.clean
    path: data/validation.clean-*
dataset_info:
  features:
  - name: text
    dtype: string
  - name: id
    dtype: string
  - name: audio_codes
    sequence:
      sequence: int64
  splits:
  - name: test.other
    num_bytes: 62049899
    num_examples: 2939
  - name: validation.other
    num_bytes: 59498714
    num_examples: 2864
  - name: train.other.500
    num_bytes: 5761561617
    num_examples: 148688
  - name: train.clean.100
    num_bytes: 1166450829
    num_examples: 28539
  - name: test.clean
    num_bytes: 62745230
    num_examples: 2620
  - name: train.clean.360
    num_bytes: 4216515060
    num_examples: 104014
  - name: validation.clean
    num_bytes: 62578176
    num_examples: 2703
  download_size: 1801683161
  dataset_size: 11391399525
---
# Dataset Card for "speech_tokenizer_16k"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
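Since the card itself gives no usage notes, here is a minimal sketch of how one split might be loaded and inspected. It assumes the Hugging Face `datasets` library is installed and that the repository id matches the download link above; `load_split` and `codes_shape` are hypothetical helper names, not part of the dataset. The card does not say which axis of `audio_codes` is codebooks and which is frames, so the helper only reports the two nested lengths.

```python
from typing import Any, Dict, List, Tuple

# Repository id taken from the download link; treat it as an assumption
# if the dataset has since been moved or renamed.
REPO_ID = "Codec-SUPERB/speech_tokenizer_16k"


def load_split(split: str = "test.clean", streaming: bool = True):
    """Load one split with the `datasets` library (requires network access)."""
    # Imported lazily so codes_shape() below can be used without `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split, streaming=streaming)


def codes_shape(example: Dict[str, Any]) -> Tuple[int, int]:
    """Return (outer_len, inner_len) of the nested `audio_codes` list.

    Assumption: every inner sequence has the same length. The meaning of
    each axis (codebook vs. frame) is not documented in the card.
    """
    codes: List[List[int]] = example["audio_codes"]
    inner_lengths = {len(row) for row in codes}
    if len(inner_lengths) > 1:
        raise ValueError(f"ragged audio_codes: {sorted(inner_lengths)}")
    return len(codes), (inner_lengths.pop() if inner_lengths else 0)


if __name__ == "__main__":
    ds = load_split("test.clean")
    first = next(iter(ds))  # streaming datasets are iterable
    print(first["id"], first["text"][:60], codes_shape(first))
```

Streaming is used here so that the first example can be inspected without downloading the full archive; pass `streaming=False` to materialize a split locally.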
Provider:
Codec-SUPERB
Summary of original information
Dataset overview
Configuration
- Default configuration:
  - Data file paths:
    - test.other: data/test.other-*
    - validation.other: data/validation.other-*
    - train.other.500: data/train.other.500-*
    - train.clean.100: data/train.clean.100-*
    - test.clean: data/test.clean-*
    - train.clean.360: data/train.clean.360-*
    - validation.clean: data/validation.clean-*
Dataset information
- Features:
  - text: string
  - id: string
  - audio_codes: sequence of int64 sequences
- Splits:
  - test.other:
    - Bytes: 62049899
    - Examples: 2939
  - validation.other:
    - Bytes: 59498714
    - Examples: 2864
  - train.other.500:
    - Bytes: 5761561617
    - Examples: 148688
  - train.clean.100:
    - Bytes: 1166450829
    - Examples: 28539
  - test.clean:
    - Bytes: 62745230
    - Examples: 2620
  - train.clean.360:
    - Bytes: 4216515060
    - Examples: 104014
  - validation.clean:
    - Bytes: 62578176
    - Examples: 2703
- Dataset size:
  - Download size: 1801683161 bytes
  - Dataset size: 11391399525 bytes
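The per-split figures above can be cross-checked against the reported totals. The following sketch copies the numbers from the card into a plain dictionary (the names `SPLITS`, `total_bytes`, and so on are illustrative, not part of the dataset) and verifies that the split byte counts add up to the stated dataset size.

```python
# (num_bytes, num_examples) per split, copied from the card metadata.
SPLITS = {
    "test.other": (62049899, 2939),
    "validation.other": (59498714, 2864),
    "train.other.500": (5761561617, 148688),
    "train.clean.100": (1166450829, 28539),
    "test.clean": (62745230, 2620),
    "train.clean.360": (4216515060, 104014),
    "validation.clean": (62578176, 2703),
}
DOWNLOAD_SIZE = 1801683161   # bytes, as reported in the card
DATASET_SIZE = 11391399525   # bytes, as reported in the card

total_bytes = sum(num_bytes for num_bytes, _ in SPLITS.values())
total_examples = sum(num_examples for _, num_examples in SPLITS.values())
# Ratio of in-memory dataset size to download size.
size_ratio = DATASET_SIZE / DOWNLOAD_SIZE

if __name__ == "__main__":
    print(f"total bytes:    {total_bytes}")
    print(f"matches card:   {total_bytes == DATASET_SIZE}")
    print(f"total examples: {total_examples}")
    print(f"dataset/download size ratio: {size_ratio:.2f}")
```

The split sizes sum exactly to the reported dataset size, and the prepared dataset is roughly six times larger than the download.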



