kotoba-speech/seamless-align-enA-jaA
Source: Hugging Face. Updated 2024-06-03. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/kotoba-speech/seamless-align-enA-jaA
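The configs listed in this card (`default`, `subset_1`, `subset_10`, and so on) can each be loaded separately. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the repository ID matches the path in the link above. The helper names `load_kwargs` and `first_example` are hypothetical, introduced only for illustration. By default `datasets` fetches from the main Hugging Face Hub, not from the mirror host shown in the link.

```python
"""Sketch: load one config of this dataset with the Hugging Face `datasets` library."""

# Repository ID taken from the download link (assumed to be valid on the Hub).
REPO_ID = "kotoba-speech/seamless-align-enA-jaA"

# Column names as listed under `features` in the card metadata.
EXPECTED_COLUMNS = {
    "line_no",
    "enA.id",
    "enA.laser_score",
    "jaA.id",
    "jaA.laser_score",
    "enA.audio.tokens",
    "jaA.audio.tokens",
    "enA.audio.speaker_embedding",
    "jaA.audio.speaker_embedding",
    "enA.audio.tokens.bpe_tokens",
    "jaA.audio.tokens.bpe_tokens",
}


def load_kwargs(config_name: str, streaming: bool = True) -> dict:
    """Build keyword arguments for `datasets.load_dataset` (hypothetical helper)."""
    return {
        "path": REPO_ID,
        "name": config_name,     # e.g. "subset_1"
        "split": "train",        # the only split listed in the card
        "streaming": streaming,  # stream rows instead of downloading the full config
    }


def first_example(config_name: str = "subset_1") -> dict:
    """Return the first row of a config. Needs network access and `pip install datasets`."""
    from datasets import load_dataset  # lazy import so load_kwargs works offline

    stream = load_dataset(**load_kwargs(config_name))
    return next(iter(stream))
```

Calling `first_example("subset_1")` should return a dictionary keyed by the column names above; comparing `set(row)` against `EXPECTED_COLUMNS` is a quick way to check that a config matches the schema in the card. Streaming is used here because each config is listed at several hundred megabytes uncompressed.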
Resource description:
---
dataset_info:
- config_name: default
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
- config_name: subset_1
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 877637273
num_examples: 2073
download_size: 142222083
dataset_size: 877637273
- config_name: subset_10
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 782114145
num_examples: 1961
download_size: 130083337
dataset_size: 782114145
- config_name: subset_100
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 761507587
num_examples: 1757
download_size: 126838339
dataset_size: 761507587
- config_name: subset_101
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 807833778
num_examples: 1873
download_size: 134525727
dataset_size: 807833778
- config_name: subset_102
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 831271512
num_examples: 1868
download_size: 138251731
dataset_size: 831271512
- config_name: subset_103
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 834056054
num_examples: 1879
download_size: 138766842
dataset_size: 834056054
- config_name: subset_104
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 829351429
num_examples: 1901
download_size: 138019563
dataset_size: 829351429
- config_name: subset_105
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 818071940
num_examples: 1875
download_size: 136022792
dataset_size: 818071940
- config_name: subset_106
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 823036134
num_examples: 1880
download_size: 137005371
dataset_size: 823036134
- config_name: subset_107
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 803967685
num_examples: 1854
download_size: 133866141
dataset_size: 803967685
- config_name: subset_108
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 818062920
num_examples: 1834
download_size: 136139232
dataset_size: 818062920
- config_name: subset_109
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 779390958
num_examples: 1770
download_size: 129654530
dataset_size: 779390958
- config_name: subset_11
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 706305559
num_examples: 1779
download_size: 117572069
dataset_size: 706305559
- config_name: subset_110
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 841220779
num_examples: 1908
download_size: 139946842
dataset_size: 841220779
- config_name: subset_111
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 823104769
num_examples: 1877
download_size: 137006268
dataset_size: 823104769
- config_name: subset_112
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 847413246
num_examples: 1924
download_size: 140982040
dataset_size: 847413246
- config_name: subset_114
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 855481808
num_examples: 1940
download_size: 142173167
dataset_size: 855481808
- config_name: subset_115
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 835516839
num_examples: 1902
download_size: 138972940
dataset_size: 835516839
- config_name: subset_116
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 850628710
num_examples: 1910
download_size: 141523643
dataset_size: 850628710
- config_name: subset_117
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 832126844
num_examples: 1901
download_size: 138561710
dataset_size: 832126844
- config_name: subset_118
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 851229693
num_examples: 1911
download_size: 141695675
dataset_size: 851229693
- config_name: subset_119
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 827876201
num_examples: 1867
download_size: 137744105
dataset_size: 827876201
- config_name: subset_12
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 786491483
num_examples: 1916
download_size: 130695694
dataset_size: 786491483
- config_name: subset_120
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 785043017
num_examples: 1774
download_size: 130680090
dataset_size: 785043017
- config_name: subset_121
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 846111964
num_examples: 1895
download_size: 140766293
dataset_size: 846111964
- config_name: subset_122
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 817456683
num_examples: 1851
download_size: 135945049
dataset_size: 817456683
- config_name: subset_123
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 861656359
num_examples: 1923
download_size: 143218681
dataset_size: 861656359
- config_name: subset_124
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 840645515
num_examples: 1886
download_size: 139759918
dataset_size: 840645515
- config_name: subset_125
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 850144143
num_examples: 1928
download_size: 141491381
dataset_size: 850144143
- config_name: subset_126
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 827636279
num_examples: 1903
download_size: 137818807
dataset_size: 827636279
- config_name: subset_127
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 848475483
num_examples: 1902
download_size: 141160699
dataset_size: 848475483
- config_name: subset_128
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 828471705
num_examples: 1890
download_size: 137863858
dataset_size: 828471705
- config_name: subset_129
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 780992459
num_examples: 1752
download_size: 130019788
dataset_size: 780992459
- config_name: subset_13
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 711297545
num_examples: 1769
download_size: 118482315
dataset_size: 711297545
- config_name: subset_130
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 814544555
num_examples: 1830
download_size: 135514850
dataset_size: 814544555
- config_name: subset_131
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 840494898
num_examples: 1882
download_size: 139805967
dataset_size: 840494898
- config_name: subset_132
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 844961984
num_examples: 1918
download_size: 140576912
dataset_size: 844961984
- config_name: subset_133
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 821935205
num_examples: 1886
download_size: 136809956
dataset_size: 821935205
- config_name: subset_134
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 845607901
num_examples: 1912
download_size: 140756262
dataset_size: 845607901
- config_name: subset_135
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 839542233
num_examples: 1888
download_size: 139631732
dataset_size: 839542233
- config_name: subset_136
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 830300897
num_examples: 1875
download_size: 138167859
dataset_size: 830300897
- config_name: subset_137
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 827576308
num_examples: 1866
download_size: 137657025
dataset_size: 827576308
- config_name: subset_138
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 823802409
num_examples: 1863
download_size: 137078547
dataset_size: 823802409
- config_name: subset_139
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 821972232
num_examples: 1859
download_size: 136745195
dataset_size: 821972232
- config_name: subset_14
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 700471113
num_examples: 1734
download_size: 116522744
dataset_size: 700471113
- config_name: subset_140
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 792881980
num_examples: 1766
download_size: 131980427
dataset_size: 792881980
- config_name: subset_141
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 840429233
num_examples: 1865
download_size: 139785320
dataset_size: 840429233
- config_name: subset_142
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 835013258
num_examples: 1893
download_size: 139055733
dataset_size: 835013258
- config_name: subset_143
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 836692531
num_examples: 1894
download_size: 139143351
dataset_size: 836692531
- config_name: subset_144
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 625333519
num_examples: 1381
download_size: 104049842
dataset_size: 625333519
- config_name: subset_15
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 774287918
num_examples: 1914
download_size: 128775014
dataset_size: 774287918
- config_name: subset_16
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 736130610
num_examples: 1862
download_size: 122638014
dataset_size: 736130610
- config_name: subset_17
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 766712455
num_examples: 1875
download_size: 127629604
dataset_size: 766712455
- config_name: subset_18
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 795318722
num_examples: 1937
download_size: 132345760
dataset_size: 795318722
- config_name: subset_19
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 768795566
num_examples: 1917
download_size: 127999106
dataset_size: 768795566
- config_name: subset_2
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 799386637
num_examples: 1929
download_size: 130142022
dataset_size: 799386637
- config_name: subset_20
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 760273102
num_examples: 1877
download_size: 126530239
dataset_size: 760273102
- config_name: subset_21
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 735526859
num_examples: 1761
download_size: 122525452
dataset_size: 735526859
- config_name: subset_22
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 747850509
num_examples: 1850
download_size: 124535530
dataset_size: 747850509
- config_name: subset_23
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 732181985
num_examples: 1790
download_size: 121987244
dataset_size: 732181985
- config_name: subset_24
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 720726471
num_examples: 1758
download_size: 120145822
dataset_size: 720726471
- config_name: subset_25
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 801219744
num_examples: 1898
download_size: 133302214
dataset_size: 801219744
- config_name: subset_26
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 785072097
num_examples: 1943
download_size: 130808423
dataset_size: 785072097
- config_name: subset_27
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 776394239
num_examples: 1903
download_size: 129411709
dataset_size: 776394239
- config_name: subset_28
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 787223287
num_examples: 1912
download_size: 131107717
dataset_size: 787223287
- config_name: subset_29
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 791000327
num_examples: 1945
download_size: 131795720
dataset_size: 791000327
- config_name: subset_3
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 770443500
num_examples: 1899
download_size: 126365234
dataset_size: 770443500
- config_name: subset_30
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 792907592
num_examples: 1902
download_size: 131940647
dataset_size: 792907592
- config_name: subset_31
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 756082354
num_examples: 1805
download_size: 125930470
dataset_size: 756082354
- config_name: subset_32
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 751623202
num_examples: 1797
download_size: 125249077
dataset_size: 751623202
- config_name: subset_33
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 733886174
num_examples: 1757
download_size: 122282391
dataset_size: 733886174
- config_name: subset_34
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 784865663
num_examples: 1893
download_size: 130707346
dataset_size: 784865663
- config_name: subset_35
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 795014330
num_examples: 1928
download_size: 132244828
dataset_size: 795014330
- config_name: subset_36
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 776136455
num_examples: 1863
download_size: 129302227
dataset_size: 776136455
- config_name: subset_37
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 779120285
num_examples: 1855
download_size: 129802323
dataset_size: 779120285
- config_name: subset_38
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 794572918
num_examples: 1890
download_size: 132271795
dataset_size: 794572918
- config_name: subset_39
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 785889178
num_examples: 1899
download_size: 130872474
dataset_size: 785889178
- config_name: subset_4
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 725260894
num_examples: 1835
download_size: 119696480
dataset_size: 725260894
- config_name: subset_40
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 808106060
num_examples: 1931
download_size: 134417865
dataset_size: 808106060
- config_name: subset_41
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 754434261
num_examples: 1784
download_size: 125622736
dataset_size: 754434261
- config_name: subset_42
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 754463596
num_examples: 1797
download_size: 125773100
dataset_size: 754463596
- config_name: subset_43
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 741011476
num_examples: 1757
download_size: 123434124
dataset_size: 741011476
- config_name: subset_44
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 779832416
num_examples: 1831
download_size: 129795239
dataset_size: 779832416
- config_name: subset_45
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 783861326
num_examples: 1891
download_size: 130673619
dataset_size: 783861326
- config_name: subset_46
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 804680320
num_examples: 1897
download_size: 133966955
dataset_size: 804680320
- config_name: subset_47
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 808595138
num_examples: 1897
download_size: 134546562
dataset_size: 808595138
- config_name: subset_48
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 808941372
num_examples: 1902
download_size: 134660577
dataset_size: 808941372
- config_name: subset_49
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 801528807
num_examples: 1875
download_size: 133347028
dataset_size: 801528807
- config_name: subset_5
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 793489531
num_examples: 1987
download_size: 131290865
dataset_size: 793489531
- config_name: subset_50
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 819241240
num_examples: 1951
download_size: 136526983
dataset_size: 819241240
- config_name: subset_51
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 758391153
num_examples: 1752
download_size: 126260257
dataset_size: 758391153
- config_name: subset_52
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 763565137
num_examples: 1780
download_size: 127199057
dataset_size: 763565137
- config_name: subset_53
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 790210937
num_examples: 1846
download_size: 131610703
dataset_size: 790210937
- config_name: subset_54
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 735557222
num_examples: 1723
download_size: 122474723
dataset_size: 735557222
- config_name: subset_55
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 790325569
num_examples: 1866
download_size: 131658652
dataset_size: 790325569
- config_name: subset_56
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 815141403
num_examples: 1893
download_size: 135700961
dataset_size: 815141403
- config_name: subset_57
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 816836107
num_examples: 1924
download_size: 135892840
dataset_size: 816836107
- config_name: subset_58
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 796378477
num_examples: 1881
download_size: 132493180
dataset_size: 796378477
- config_name: subset_59
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 798701754
num_examples: 1887
download_size: 133011646
dataset_size: 798701754
- config_name: subset_6
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 728188086
num_examples: 1810
download_size: 120701489
dataset_size: 728188086
- config_name: subset_60
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 812567821
num_examples: 1909
download_size: 135357262
dataset_size: 812567821
- config_name: subset_61
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 745150676
num_examples: 1728
download_size: 124125904
dataset_size: 745150676
- config_name: subset_62
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 757832834
num_examples: 1787
download_size: 126243207
dataset_size: 757832834
- config_name: subset_63
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 773329744
num_examples: 1790
download_size: 128919920
dataset_size: 773329744
- config_name: subset_64
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 779007998
num_examples: 1812
download_size: 129727597
dataset_size: 779007998
- config_name: subset_65
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 809157988
num_examples: 1877
download_size: 134778667
dataset_size: 809157988
- config_name: subset_66
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 811744279
num_examples: 1890
download_size: 135145984
dataset_size: 811744279
- config_name: subset_67
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 807905579
num_examples: 1873
download_size: 134544873
dataset_size: 807905579
- config_name: subset_69
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 829802622
num_examples: 1916
download_size: 138126186
dataset_size: 829802622
- config_name: subset_7
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 735297018
num_examples: 1832
download_size: 121904051
dataset_size: 735297018
- config_name: subset_70
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 817723045
num_examples: 1903
download_size: 136121908
dataset_size: 817723045
- config_name: subset_71
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 750129963
num_examples: 1736
download_size: 124822707
dataset_size: 750129963
- config_name: subset_72
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 828362877
num_examples: 1887
download_size: 137893047
dataset_size: 828362877
- config_name: subset_73
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 745293129
num_examples: 1736
download_size: 124195334
dataset_size: 745293129
- config_name: subset_74
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 793510042
num_examples: 1829
download_size: 132096959
dataset_size: 793510042
- config_name: subset_75
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 808745329
num_examples: 1862
download_size: 134585547
dataset_size: 808745329
- config_name: subset_76
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 821790014
num_examples: 1914
download_size: 136860206
dataset_size: 821790014
- config_name: subset_77
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 814381644
num_examples: 1874
download_size: 135672985
dataset_size: 814381644
- config_name: subset_78
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 810432081
num_examples: 1871
download_size: 134904361
dataset_size: 810432081
- config_name: subset_79
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 815651480
num_examples: 1891
download_size: 135792562
dataset_size: 815651480
- config_name: subset_8
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 802798092
num_examples: 2009
download_size: 133298825
dataset_size: 802798092
- config_name: subset_80
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 802428791
num_examples: 1885
download_size: 133698797
dataset_size: 802428791
- config_name: subset_81
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 834709652
num_examples: 1913
download_size: 138987343
dataset_size: 834709652
- config_name: subset_82
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 834266523
num_examples: 1910
download_size: 138888504
dataset_size: 834266523
- config_name: subset_83
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 819125422
num_examples: 1887
download_size: 136302868
dataset_size: 819125422
- config_name: subset_84
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 810720434
num_examples: 1867
download_size: 134940476
dataset_size: 810720434
- config_name: subset_85
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 809271789
num_examples: 1881
download_size: 134684237
dataset_size: 809271789
- config_name: subset_86
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 808527611
num_examples: 1862
download_size: 134602856
dataset_size: 808527611
- config_name: subset_87
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 819284692
num_examples: 1897
download_size: 136414865
dataset_size: 819284692
- config_name: subset_88
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 825322623
num_examples: 1900
download_size: 137455947
dataset_size: 825322623
- config_name: subset_9
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 785342947
num_examples: 1977
download_size: 130444875
dataset_size: 785342947
- config_name: subset_90
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 824052048
num_examples: 1913
download_size: 137210506
dataset_size: 824052048
- config_name: subset_91
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 829736231
num_examples: 1913
download_size: 138090842
dataset_size: 829736231
- config_name: subset_92
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 810068202
num_examples: 1886
download_size: 134916089
dataset_size: 810068202
- config_name: subset_93
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 823832441
num_examples: 1875
download_size: 137060599
dataset_size: 823832441
- config_name: subset_94
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 826814414
num_examples: 1900
download_size: 137662596
dataset_size: 826814414
- config_name: subset_95
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 809293443
num_examples: 1867
download_size: 134723501
dataset_size: 809293443
- config_name: subset_96
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 833099009
num_examples: 1900
download_size: 138602945
dataset_size: 833099009
- config_name: subset_97
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.tokens.bpe_tokens
sequence: int64
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 833807961
num_examples: 1899
download_size: 138728234
dataset_size: 833807961
- config_name: subset_98
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 833670389
num_examples: 1904
download_size: 138725943
dataset_size: 833670389
- config_name: subset_99
features:
- name: line_no
dtype: int64
- name: enA.id
dtype: string
- name: enA.laser_score
dtype: float64
- name: jaA.id
dtype: string
- name: jaA.laser_score
dtype: float64
- name: jaA.audio.tokens
sequence:
sequence: int64
- name: enA.audio.tokens
sequence:
sequence: int64
- name: jaA.audio.speaker_embedding
sequence: float32
- name: enA.audio.speaker_embedding
sequence: float32
- name: jaA.audio.tokens.bpe_tokens
sequence: int64
- name: enA.audio.tokens.bpe_tokens
sequence: int64
splits:
- name: train
num_bytes: 838513705
num_examples: 1901
download_size: 139490746
dataset_size: 838513705
configs:
- config_name: default
data_files:
- split: train
path: subset_*/train-*
- config_name: subset_1
data_files:
- split: train
path: subset_1/train-*
- config_name: subset_10
data_files:
- split: train
path: subset_10/train-*
- config_name: subset_100
data_files:
- split: train
path: subset_100/train-*
- config_name: subset_101
data_files:
- split: train
path: subset_101/train-*
- config_name: subset_102
data_files:
- split: train
path: subset_102/train-*
- config_name: subset_103
data_files:
- split: train
path: subset_103/train-*
- config_name: subset_104
data_files:
- split: train
path: subset_104/train-*
- config_name: subset_105
data_files:
- split: train
path: subset_105/train-*
- config_name: subset_106
data_files:
- split: train
path: subset_106/train-*
- config_name: subset_107
data_files:
- split: train
path: subset_107/train-*
- config_name: subset_108
data_files:
- split: train
path: subset_108/train-*
- config_name: subset_109
data_files:
- split: train
path: subset_109/train-*
- config_name: subset_11
data_files:
- split: train
path: subset_11/train-*
- config_name: subset_110
data_files:
- split: train
path: subset_110/train-*
- config_name: subset_111
data_files:
- split: train
path: subset_111/train-*
- config_name: subset_112
data_files:
- split: train
path: subset_112/train-*
- config_name: subset_114
data_files:
- split: train
path: subset_114/train-*
- config_name: subset_115
data_files:
- split: train
path: subset_115/train-*
- config_name: subset_116
data_files:
- split: train
path: subset_116/train-*
- config_name: subset_117
data_files:
- split: train
path: subset_117/train-*
- config_name: subset_118
data_files:
- split: train
path: subset_118/train-*
- config_name: subset_119
data_files:
- split: train
path: subset_119/train-*
- config_name: subset_12
data_files:
- split: train
path: subset_12/train-*
- config_name: subset_120
data_files:
- split: train
path: subset_120/train-*
- config_name: subset_121
data_files:
- split: train
path: subset_121/train-*
- config_name: subset_122
data_files:
- split: train
path: subset_122/train-*
- config_name: subset_123
data_files:
- split: train
path: subset_123/train-*
- config_name: subset_124
data_files:
- split: train
path: subset_124/train-*
- config_name: subset_125
data_files:
- split: train
path: subset_125/train-*
- config_name: subset_126
data_files:
- split: train
path: subset_126/train-*
- config_name: subset_127
data_files:
- split: train
path: subset_127/train-*
- config_name: subset_128
data_files:
- split: train
path: subset_128/train-*
- config_name: subset_129
data_files:
- split: train
path: subset_129/train-*
- config_name: subset_13
data_files:
- split: train
path: subset_13/train-*
- config_name: subset_130
data_files:
- split: train
path: subset_130/train-*
- config_name: subset_131
data_files:
- split: train
path: subset_131/train-*
- config_name: subset_132
data_files:
- split: train
path: subset_132/train-*
- config_name: subset_133
data_files:
- split: train
path: subset_133/train-*
- config_name: subset_134
data_files:
- split: train
path: subset_134/train-*
- config_name: subset_135
data_files:
- split: train
path: subset_135/train-*
- config_name: subset_136
data_files:
- split: train
path: subset_136/train-*
- config_name: subset_137
data_files:
- split: train
path: subset_137/train-*
- config_name: subset_138
data_files:
- split: train
path: subset_138/train-*
- config_name: subset_139
data_files:
- split: train
path: subset_139/train-*
- config_name: subset_14
data_files:
- split: train
path: subset_14/train-*
- config_name: subset_140
data_files:
- split: train
path: subset_140/train-*
- config_name: subset_141
data_files:
- split: train
path: subset_141/train-*
- config_name: subset_142
data_files:
- split: train
path: subset_142/train-*
- config_name: subset_143
data_files:
- split: train
path: subset_143/train-*
- config_name: subset_144
data_files:
- split: train
path: subset_144/train-*
- config_name: subset_15
data_files:
- split: train
path: subset_15/train-*
- config_name: subset_16
data_files:
- split: train
path: subset_16/train-*
- config_name: subset_17
data_files:
- split: train
path: subset_17/train-*
- config_name: subset_18
data_files:
- split: train
path: subset_18/train-*
- config_name: subset_19
data_files:
- split: train
path: subset_19/train-*
- config_name: subset_2
data_files:
- split: train
path: subset_2/train-*
- config_name: subset_20
data_files:
- split: train
path: subset_20/train-*
- config_name: subset_21
data_files:
- split: train
path: subset_21/train-*
- config_name: subset_22
data_files:
- split: train
path: subset_22/train-*
- config_name: subset_23
data_files:
- split: train
path: subset_23/train-*
- config_name: subset_24
data_files:
- split: train
path: subset_24/train-*
- config_name: subset_25
data_files:
- split: train
path: subset_25/train-*
- config_name: subset_26
data_files:
- split: train
path: subset_26/train-*
- config_name: subset_27
data_files:
- split: train
path: subset_27/train-*
- config_name: subset_28
data_files:
- split: train
path: subset_28/train-*
- config_name: subset_29
data_files:
- split: train
path: subset_29/train-*
- config_name: subset_3
data_files:
- split: train
path: subset_3/train-*
- config_name: subset_30
data_files:
- split: train
path: subset_30/train-*
- config_name: subset_31
data_files:
- split: train
path: subset_31/train-*
- config_name: subset_32
data_files:
- split: train
path: subset_32/train-*
- config_name: subset_33
data_files:
- split: train
path: subset_33/train-*
- config_name: subset_34
data_files:
- split: train
path: subset_34/train-*
- config_name: subset_35
data_files:
- split: train
path: subset_35/train-*
- config_name: subset_36
data_files:
- split: train
path: subset_36/train-*
- config_name: subset_37
data_files:
- split: train
path: subset_37/train-*
- config_name: subset_38
data_files:
- split: train
path: subset_38/train-*
- config_name: subset_39
data_files:
- split: train
path: subset_39/train-*
- config_name: subset_4
data_files:
- split: train
path: subset_4/train-*
- config_name: subset_40
data_files:
- split: train
path: subset_40/train-*
- config_name: subset_41
data_files:
- split: train
path: subset_41/train-*
- config_name: subset_42
data_files:
- split: train
path: subset_42/train-*
- config_name: subset_43
data_files:
- split: train
path: subset_43/train-*
- config_name: subset_44
data_files:
- split: train
path: subset_44/train-*
- config_name: subset_45
data_files:
- split: train
path: subset_45/train-*
- config_name: subset_46
data_files:
- split: train
path: subset_46/train-*
- config_name: subset_47
data_files:
- split: train
path: subset_47/train-*
- config_name: subset_48
data_files:
- split: train
path: subset_48/train-*
- config_name: subset_49
data_files:
- split: train
path: subset_49/train-*
- config_name: subset_5
data_files:
- split: train
path: subset_5/train-*
- config_name: subset_50
data_files:
- split: train
path: subset_50/train-*
- config_name: subset_51
data_files:
- split: train
path: subset_51/train-*
- config_name: subset_52
data_files:
- split: train
path: subset_52/train-*
- config_name: subset_53
data_files:
- split: train
path: subset_53/train-*
- config_name: subset_54
data_files:
- split: train
path: subset_54/train-*
- config_name: subset_55
data_files:
- split: train
path: subset_55/train-*
- config_name: subset_56
data_files:
- split: train
path: subset_56/train-*
- config_name: subset_57
data_files:
- split: train
path: subset_57/train-*
- config_name: subset_58
data_files:
- split: train
path: subset_58/train-*
- config_name: subset_59
data_files:
- split: train
path: subset_59/train-*
- config_name: subset_6
data_files:
- split: train
path: subset_6/train-*
- config_name: subset_60
data_files:
- split: train
path: subset_60/train-*
- config_name: subset_61
data_files:
- split: train
path: subset_61/train-*
- config_name: subset_62
data_files:
- split: train
path: subset_62/train-*
- config_name: subset_63
data_files:
- split: train
path: subset_63/train-*
- config_name: subset_64
data_files:
- split: train
path: subset_64/train-*
- config_name: subset_65
data_files:
- split: train
path: subset_65/train-*
- config_name: subset_66
data_files:
- split: train
path: subset_66/train-*
- config_name: subset_67
data_files:
- split: train
path: subset_67/train-*
- config_name: subset_69
data_files:
- split: train
path: subset_69/train-*
- config_name: subset_7
data_files:
- split: train
path: subset_7/train-*
- config_name: subset_70
data_files:
- split: train
path: subset_70/train-*
- config_name: subset_71
data_files:
- split: train
path: subset_71/train-*
- config_name: subset_72
data_files:
- split: train
path: subset_72/train-*
- config_name: subset_73
data_files:
- split: train
path: subset_73/train-*
- config_name: subset_74
data_files:
- split: train
path: subset_74/train-*
- config_name: subset_75
data_files:
- split: train
path: subset_75/train-*
- config_name: subset_76
data_files:
- split: train
path: subset_76/train-*
- config_name: subset_77
data_files:
- split: train
path: subset_77/train-*
- config_name: subset_78
data_files:
- split: train
path: subset_78/train-*
- config_name: subset_79
data_files:
- split: train
path: subset_79/train-*
- config_name: subset_8
data_files:
- split: train
path: subset_8/train-*
- config_name: subset_80
data_files:
- split: train
path: subset_80/train-*
- config_name: subset_81
data_files:
- split: train
path: subset_81/train-*
- config_name: subset_82
data_files:
- split: train
path: subset_82/train-*
- config_name: subset_83
data_files:
- split: train
path: subset_83/train-*
- config_name: subset_84
data_files:
- split: train
path: subset_84/train-*
- config_name: subset_85
data_files:
- split: train
path: subset_85/train-*
- config_name: subset_86
data_files:
- split: train
path: subset_86/train-*
- config_name: subset_87
data_files:
- split: train
path: subset_87/train-*
- config_name: subset_88
data_files:
- split: train
path: subset_88/train-*
- config_name: subset_9
data_files:
- split: train
path: subset_9/train-*
- config_name: subset_90
data_files:
- split: train
path: subset_90/train-*
- config_name: subset_91
data_files:
- split: train
path: subset_91/train-*
- config_name: subset_92
data_files:
- split: train
path: subset_92/train-*
- config_name: subset_93
data_files:
- split: train
path: subset_93/train-*
- config_name: subset_94
data_files:
- split: train
path: subset_94/train-*
- config_name: subset_95
data_files:
- split: train
path: subset_95/train-*
- config_name: subset_96
data_files:
- split: train
path: subset_96/train-*
- config_name: subset_97
data_files:
- split: train
path: subset_97/train-*
- config_name: subset_98
data_files:
- split: train
path: subset_98/train-*
- config_name: subset_99
data_files:
- split: train
path: subset_99/train-*
---
# Dataset Card for Dataset Name
<!-- Provide a quick summary of the dataset. -->
This dataset card aims to be a base template for new datasets. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/datasetcard_template.md?plain=1).
## Dataset Details
### Dataset Description
<!-- Provide a longer summary of what this dataset is. -->
- **Curated by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
### Dataset Sources [optional]
<!-- Provide the basic links for the dataset. -->
- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]
## Uses
<!-- Address questions around how the dataset is intended to be used. -->
### Direct Use
<!-- This section describes suitable use cases for the dataset. -->
[More Information Needed]
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the dataset will not work well for. -->
[More Information Needed]
## Dataset Structure
<!-- This section provides a description of the dataset fields, and additional information about the dataset structure such as criteria used to create the splits, relationships between data points, etc. -->
[More Information Needed]
## Dataset Creation
### Curation Rationale
<!-- Motivation for the creation of this dataset. -->
[More Information Needed]
### Source Data
<!-- This section describes the source data (e.g. news text and headlines, social media posts, translated sentences, ...). -->
#### Data Collection and Processing
<!-- This section describes the data collection and processing process such as data selection criteria, filtering and normalization methods, tools and libraries used, etc. -->
[More Information Needed]
#### Who are the source data producers?
<!-- This section describes the people or systems who originally created the data. It should also include self-reported demographic or identity information for the source data creators if this information is available. -->
[More Information Needed]
### Annotations [optional]
<!-- If the dataset contains annotations which are not part of the initial data collection, use this section to describe them. -->
#### Annotation process
<!-- This section describes the annotation process such as annotation tools used in the process, the amount of data annotated, annotation guidelines provided to the annotators, interannotator statistics, annotation validation, etc. -->
[More Information Needed]
#### Who are the annotators?
<!-- This section describes the people or systems who created the annotations. -->
[More Information Needed]
#### Personal and Sensitive Information
<!-- State whether the dataset contains data that might be considered personal, sensitive, or private (e.g., data that reveals addresses, uniquely identifiable names or aliases, racial or ethnic origins, sexual orientations, religious beliefs, political opinions, financial or health data, etc.). If efforts were made to anonymize the data, describe the anonymization process. -->
[More Information Needed]
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
[More Information Needed]
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
Users should be made aware of the risks, biases and limitations of the dataset. More information needed for further recommendations.
## Citation [optional]
<!-- If there is a paper or blog post introducing the dataset, the APA and Bibtex information for that should go in this section. -->
**BibTeX:**
[More Information Needed]
**APA:**
[More Information Needed]
## Glossary [optional]
<!-- If relevant, include terms and calculations in this section that can help readers understand the dataset or dataset card. -->
[More Information Needed]
## More Information [optional]
[More Information Needed]
## Dataset Card Authors [optional]
[More Information Needed]
## Dataset Card Contact
[More Information Needed]
The dataset consists of multiple subsets (configurations), each containing paired English (`enA`) and Japanese (`jaA`) audio data. Each example carries a line number, a per-language ID and LASER score, audio tokens (including BPE tokens), and speaker embeddings. Every subset exposes a single `train` split, and the split's size in bytes and number of examples vary from subset to subset.
Provider:
kotoba-speech
Summary of original information
Dataset overview
Dataset configurations
- config_name: default, subset_1, subset_10, subset_100, subset_101, subset_102, subset_103, subset_104, subset_105, subset_106, subset_107, subset_108, subset_109, subset_11, subset_110, subset_111, subset_112, subset_114, subset_115, subset_116, subset_117, subset_118, subset_119, subset_12, subset_120, subset_121, subset_122, subset_123
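As a minimal sketch of how these configuration names relate to the files on disk: the card's YAML lists each `subset_N` configuration with a train split at the path `subset_N/train-*`. The helper names `data_files_pattern` and `load_subset` are hypothetical, the repository ID is taken from the page header, and the `default` configuration is deliberately left out because its file mapping is not shown here. Loading assumes the `datasets` library is installed and the repository is reachable.

```python
REPO_ID = "kotoba-speech/seamless-align-enA-jaA"  # repository named in the page header


def data_files_pattern(config_name: str) -> str:
    """Return the train-split glob the card's YAML lists for a subset config."""
    if not config_name.startswith("subset_"):
        # Only the subset_* entries are covered; other configs may be laid out differently.
        raise ValueError(f"expected a subset_* config name, got {config_name!r}")
    return f"{config_name}/train-*"


def load_subset(config_name: str = "subset_1"):
    """Load one configuration's train split (needs `datasets` and network access)."""
    from datasets import load_dataset  # imported lazily so the helper above works without it

    return load_dataset(REPO_ID, config_name, split="train")


print(data_files_pattern("subset_3"))  # subset_3/train-*
```

Calling `load_subset("subset_1")` would download only that configuration rather than every subset, which is likely the main practical reason to pick a configuration by name.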
Dataset features
- line_no: int64.
- enA.id: string.
- enA.laser_score: float64.
- jaA.id: string.
- jaA.laser_score: float64.
- jaA.audio.tokens: nested sequence (sequence of sequences) of int64.
- enA.audio.tokens: nested sequence (sequence of sequences) of int64.
- enA.audio.speaker_embedding: sequence of float32.
- jaA.audio.speaker_embedding: sequence of float32.
- jaA.audio.tokens.bpe_tokens: sequence of int64.
- enA.audio.tokens.bpe_tokens: sequence of int64.
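The feature list above suggests that each row can be treated as a flat mapping keyed by the dotted column names. The sketch below filters rows on the two LASER score columns. It rests on several assumptions: that higher scores indicate better-aligned pairs, that a single threshold applies to both languages, and that the toy rows (with made-up values and only a few of the columns) stand in for real data. `keep_row` and `filter_rows` are hypothetical helper names.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def keep_row(row: Row, min_score: float) -> bool:
    """True when both the English and Japanese LASER scores reach the threshold."""
    return (
        row["enA.laser_score"] >= min_score
        and row["jaA.laser_score"] >= min_score
    )


def filter_rows(rows: Iterable[Row], min_score: float) -> List[Row]:
    """Keep only the rows that pass keep_row, preserving their order."""
    return [row for row in rows if keep_row(row, min_score)]


# Toy rows with invented values; real rows also carry token and embedding columns.
toy_rows: List[Row] = [
    {"line_no": 0, "enA.id": "en-a", "jaA.id": "ja-a",
     "enA.laser_score": 1.25, "jaA.laser_score": 1.25},
    {"line_no": 1, "enA.id": "en-b", "jaA.id": "ja-b",
     "enA.laser_score": 1.02, "jaA.laser_score": 1.02},
]

kept = filter_rows(toy_rows, min_score=1.1)
print([row["line_no"] for row in kept])  # [0]
```

With a split loaded through the `datasets` library, the same predicate could be passed to `Dataset.filter`, for example `ds.filter(lambda row: keep_row(row, 1.1))`.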
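Dataset splits
- train: each configuration includes a train split, described by the following fields:
- num_bytes: storage size, e.g. 877637273 bytes for subset_1.
- num_examples: number of examples, e.g. 2073 for subset_1.
- download_size: download size, e.g. 142222083 bytes for subset_1.
- dataset_size: dataset size, e.g. 877637273 bytes for subset_1.
The information above outlines the dataset's basic structure and contents, including the features and split details of each configuration.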
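To make the split statistics more concrete, the following sketch derives two quantities from the subset_1 figures quoted above: the average stored size per example and the ratio of download size to dataset size. The function names are hypothetical, and reading the ratio as a compression factor assumes that `download_size` refers to the compressed files while `dataset_size` refers to the decoded data.

```python
def bytes_per_example(num_bytes: int, num_examples: int) -> float:
    """Average stored size of one example, in bytes."""
    if num_examples <= 0:
        raise ValueError("num_examples must be positive")
    return num_bytes / num_examples


def download_ratio(download_size: int, dataset_size: int) -> float:
    """Download size as a fraction of the dataset size."""
    if dataset_size <= 0:
        raise ValueError("dataset_size must be positive")
    return download_size / dataset_size


# Figures quoted for subset_1 in the split details.
SUBSET_1 = {
    "num_bytes": 877637273,
    "num_examples": 2073,
    "download_size": 142222083,
    "dataset_size": 877637273,
}

avg = bytes_per_example(SUBSET_1["num_bytes"], SUBSET_1["num_examples"])
ratio = download_ratio(SUBSET_1["download_size"], SUBSET_1["dataset_size"])
print(f"{avg / 1e3:.0f} kB per example, download is {ratio:.1%} of dataset size")
```

For subset_1 this works out to roughly 423 kB per example, with the download at about 16% of the dataset size.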



