jq/salt-asr-data-transcriptions
Source: Hugging Face | Updated: 2024-05-03 | Indexed: 2024-06-12
Download link:
https://hf-mirror.com/datasets/jq/salt-asr-data-transcriptions
Resource description:
---
dataset_info:
- config_name: multispeaker-ach
  features:
  - name: id
    dtype: int64
  - name: text
    dtype: string
  - name: audio_language
    dtype: string
  - name: is_studio
    dtype: bool
  - name: speaker_id
    dtype: string
  - name: sample_rate
    dtype: int64
  - name: transcription
    dtype: string
  - name: edit_distance
    dtype: int64
  splits:
  - name: train
    num_bytes: 704574
    num_examples: 4811
  - name: dev
    num_bytes: 14750
    num_examples: 101
  - name: test
    num_bytes: 14497
    num_examples: 96
  download_size: 401928
  dataset_size: 733821
- config_name: multispeaker-eng
  features:
  - name: id
    dtype: int64
  - name: text
    dtype: string
  - name: audio_language
    dtype: string
  - name: is_studio
    dtype: bool
  - name: speaker_id
    dtype: string
  - name: sample_rate
    dtype: int64
  - name: transcription
    dtype: string
  - name: edit_distance
    dtype: int64
  splits:
  - name: dev
    num_bytes: 15282
    num_examples: 100
  - name: test
    num_bytes: 15194
    num_examples: 96
  - name: train
    num_bytes: 734854
    num_examples: 4797
  download_size: 402022
  dataset_size: 765330
- config_name: multispeaker-lgg
  features:
  - name: id
    dtype: int64
  - name: text
    dtype: string
  - name: audio_language
    dtype: string
  - name: is_studio
    dtype: bool
  - name: speaker_id
    dtype: string
  - name: sample_rate
    dtype: int64
  - name: transcription
    dtype: string
  - name: edit_distance
    dtype: int64
  splits:
  - name: train
    num_bytes: 704330
    num_examples: 4811
  - name: dev
    num_bytes: 14684
    num_examples: 101
  - name: test
    num_bytes: 14411
    num_examples: 96
  download_size: 406173
  dataset_size: 733425
- config_name: multispeaker-lug
  features:
  - name: id
    dtype: int64
  - name: text
    dtype: string
  - name: audio_language
    dtype: string
  - name: is_studio
    dtype: bool
  - name: speaker_id
    dtype: string
  - name: sample_rate
    dtype: int64
  - name: transcription
    dtype: string
  - name: edit_distance
    dtype: int64
  splits:
  - name: train
    num_bytes: 801734
    num_examples: 5016
  - name: dev
    num_bytes: 16421
    num_examples: 103
  - name: test
    num_bytes: 16270
    num_examples: 97
  download_size: 819770
  dataset_size: 834425
- config_name: multispeaker-nyn
  features:
  - name: id
    dtype: int64
  - name: text
    dtype: string
  - name: audio_language
    dtype: string
  - name: is_studio
    dtype: bool
  - name: speaker_id
    dtype: string
  - name: sample_rate
    dtype: int64
  - name: transcription
    dtype: string
  - name: edit_distance
    dtype: int64
  splits:
  - name: train
    num_bytes: 700078
    num_examples: 4811
  - name: dev
    num_bytes: 14574
    num_examples: 101
  - name: test
    num_bytes: 14351
    num_examples: 96
  download_size: 417568
  dataset_size: 729003
- config_name: multispeaker-teo
  features:
  - name: id
    dtype: int64
  - name: text
    dtype: string
  - name: audio_language
    dtype: string
  - name: is_studio
    dtype: bool
  - name: speaker_id
    dtype: string
  - name: sample_rate
    dtype: int64
  - name: transcription
    dtype: string
  - name: edit_distance
    dtype: int64
  splits:
  - name: train
    num_bytes: 690253
    num_examples: 4811
  - name: dev
    num_bytes: 14389
    num_examples: 101
  - name: test
    num_bytes: 14113
    num_examples: 96
  download_size: 401353
  dataset_size: 718755
configs:
- config_name: multispeaker-ach
  data_files:
  - split: train
    path: multispeaker-ach/train-*
  - split: dev
    path: multispeaker-ach/dev-*
  - split: test
    path: multispeaker-ach/test-*
- config_name: multispeaker-eng
  data_files:
  - split: dev
    path: multispeaker-eng/dev-*
  - split: test
    path: multispeaker-eng/test-*
  - split: train
    path: multispeaker-eng/train-*
- config_name: multispeaker-lgg
  data_files:
  - split: train
    path: multispeaker-lgg/train-*
  - split: dev
    path: multispeaker-lgg/dev-*
  - split: test
    path: multispeaker-lgg/test-*
- config_name: multispeaker-lug
  data_files:
  - split: train
    path: multispeaker-lug/train-*
  - split: dev
    path: multispeaker-lug/dev-*
  - split: test
    path: multispeaker-lug/test-*
- config_name: multispeaker-nyn
  data_files:
  - split: train
    path: multispeaker-nyn/train-*
  - split: dev
    path: multispeaker-nyn/dev-*
  - split: test
    path: multispeaker-nyn/test-*
- config_name: multispeaker-teo
  data_files:
  - split: train
    path: multispeaker-teo/train-*
  - split: dev
    path: multispeaker-teo/dev-*
  - split: test
    path: multispeaker-teo/test-*
---
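As a quick sanity check on the metadata above, the per-split `num_bytes` values should add up to each configuration's `dataset_size`. The sketch below copies the figures from the front matter into plain dictionaries and verifies the sums. The names `SPLITS`, `DATASET_SIZE`, and `totals` are illustrative only and are not part of the dataset.

```python
# Figures copied from the dataset_info metadata:
# config -> split -> (num_examples, num_bytes).
SPLITS = {
    "multispeaker-ach": {"train": (4811, 704574), "dev": (101, 14750), "test": (96, 14497)},
    "multispeaker-eng": {"train": (4797, 734854), "dev": (100, 15282), "test": (96, 15194)},
    "multispeaker-lgg": {"train": (4811, 704330), "dev": (101, 14684), "test": (96, 14411)},
    "multispeaker-lug": {"train": (5016, 801734), "dev": (103, 16421), "test": (97, 16270)},
    "multispeaker-nyn": {"train": (4811, 700078), "dev": (101, 14574), "test": (96, 14351)},
    "multispeaker-teo": {"train": (4811, 690253), "dev": (101, 14389), "test": (96, 14113)},
}

# Declared dataset_size per config, in bytes.
DATASET_SIZE = {
    "multispeaker-ach": 733821,
    "multispeaker-eng": 765330,
    "multispeaker-lgg": 733425,
    "multispeaker-lug": 834425,
    "multispeaker-nyn": 729003,
    "multispeaker-teo": 718755,
}


def totals(config):
    """Return (total examples, total bytes) summed over the splits of one config."""
    examples = sum(n for n, _ in SPLITS[config].values())
    size = sum(b for _, b in SPLITS[config].values())
    return examples, size


for name in SPLITS:
    examples, size = totals(name)
    status = "ok" if size == DATASET_SIZE[name] else "MISMATCH"
    print(f"{name}: {examples} examples, {size} bytes ({status})")
```

With the figures above, every configuration's split sizes sum exactly to its declared `dataset_size`.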
Provider:
jq
Summary of original information
Dataset overview
Dataset configurations and features
- multispeaker-ach
  - Features:
    - id: int64
    - text: string
    - audio_language: string
    - is_studio: bool
    - speaker_id: string
    - sample_rate: int64
    - transcription: string
    - edit_distance: int64
  - Splits:
    - train: 4811 examples, 704574 bytes
    - dev: 101 examples, 14750 bytes
    - test: 96 examples, 14497 bytes
  - Download size: 401928 bytes
  - Dataset size: 733821 bytes
- multispeaker-eng
  - Features: same as multispeaker-ach
  - Splits:
    - train: 4797 examples, 734854 bytes
    - dev: 100 examples, 15282 bytes
    - test: 96 examples, 15194 bytes
  - Download size: 402022 bytes
  - Dataset size: 765330 bytes
- multispeaker-lgg
  - Features: same as multispeaker-ach
  - Splits:
    - train: 4811 examples, 704330 bytes
    - dev: 101 examples, 14684 bytes
    - test: 96 examples, 14411 bytes
  - Download size: 406173 bytes
  - Dataset size: 733425 bytes
- multispeaker-lug
  - Features: same as multispeaker-ach
  - Splits:
    - train: 5016 examples, 801734 bytes
    - dev: 103 examples, 16421 bytes
    - test: 97 examples, 16270 bytes
  - Download size: 819770 bytes
  - Dataset size: 834425 bytes
- multispeaker-nyn
  - Features: same as multispeaker-ach
  - Splits:
    - train: 4811 examples, 700078 bytes
    - dev: 101 examples, 14574 bytes
    - test: 96 examples, 14351 bytes
  - Download size: 417568 bytes
  - Dataset size: 729003 bytes
- multispeaker-teo
  - Features: same as multispeaker-ach
  - Splits:
    - train: 4811 examples, 690253 bytes
    - dev: 101 examples, 14389 bytes
    - test: 96 examples, 14113 bytes
  - Download size: 401353 bytes
  - Dataset size: 718755 bytes
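The card does not document how the `edit_distance` column is computed. One plausible reading, and it is only an assumption, is that it holds the Levenshtein distance between the reference `text` and the `transcription`. The sketch below shows how such a distance, and a character error rate derived from it, could be computed. Whether the dataset counts characters or words, or normalizes the strings first, is not stated, and the helper names `levenshtein` and `char_error_rate` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level Levenshtein distance using a rolling row of length len(b) + 1."""
    # Keep the shorter string in the inner loop to minimise memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical row using the column names from the feature list.
row = {"text": "kitten", "transcription": "sitting"}
print(levenshtein(row["text"], row["transcription"]))      # 3
print(char_error_rate(row["text"], row["transcription"]))  # 0.5
```

Comparing a recomputed distance against the stored `edit_distance` on a few rows would be one way to confirm or reject this interpretation.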
Dataset file paths
- multispeaker-ach
  - train: multispeaker-ach/train-*
  - dev: multispeaker-ach/dev-*
  - test: multispeaker-ach/test-*
- multispeaker-eng
  - train: multispeaker-eng/train-*
  - dev: multispeaker-eng/dev-*
  - test: multispeaker-eng/test-*
- multispeaker-lgg
  - train: multispeaker-lgg/train-*
  - dev: multispeaker-lgg/dev-*
  - test: multispeaker-lgg/test-*
- multispeaker-lug
  - train: multispeaker-lug/train-*
  - dev: multispeaker-lug/dev-*
  - test: multispeaker-lug/test-*
- multispeaker-nyn
  - train: multispeaker-nyn/train-*
  - dev: multispeaker-nyn/dev-*
  - test: multispeaker-nyn/test-*
- multispeaker-teo
  - train: multispeaker-teo/train-*
  - dev: multispeaker-teo/dev-*
  - test: multispeaker-teo/test-*
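The path patterns listed above follow a regular `multispeaker-<language>/<split>-*` scheme, so they can be generated rather than typed out. The sketch below rebuilds the patterns and shows one way a split might be loaded with the Hugging Face `datasets` library. It assumes that `datasets` is installed and that the repository is reachable; the shard filename used in the example is hypothetical, and the helper names are illustrative.

```python
from fnmatch import fnmatch

REPO_ID = "jq/salt-asr-data-transcriptions"
LANGUAGES = ("ach", "eng", "lgg", "lug", "nyn", "teo")
SPLIT_NAMES = ("train", "dev", "test")


def config_name(language: str) -> str:
    """Map a language code to its config name, e.g. 'ach' -> 'multispeaker-ach'."""
    if language not in LANGUAGES:
        raise ValueError(f"unknown language code: {language!r}")
    return f"multispeaker-{language}"


def data_file_patterns(language: str) -> dict:
    """Rebuild the split -> glob pattern mapping listed for one config."""
    config = config_name(language)
    return {split: f"{config}/{split}-*" for split in SPLIT_NAMES}


def load_split(language: str, split: str = "train"):
    """Load one split from the Hub (needs `pip install datasets` and network access)."""
    from datasets import load_dataset  # imported lazily so the helpers above work offline
    return load_dataset(REPO_ID, config_name(language), split=split)


patterns = data_file_patterns("lug")
print(patterns["dev"])  # multispeaker-lug/dev-*

# Hypothetical shard name, used only to illustrate how the glob pattern matches.
shard = "multispeaker-lug/dev-00000-of-00001.parquet"
print(fnmatch(shard, patterns["dev"]))  # True
```

Calling `load_split("lug", "dev")` would then be expected to return the 103-example dev split described in the summary, provided the hosted data still matches this metadata.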



