
kotoba-speech/seamless-align-enA-jaA

Source: Hugging Face. Updated: 2024-06-03. Indexed: 2024-06-12.
Download link:
https://hf-mirror.com/datasets/kotoba-speech/seamless-align-enA-jaA
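The metadata for this dataset lists many configs (`default`, `subset_1`, `subset_10`, and so on), each with a single `train` split. The following is a minimal sketch of loading one config with the Hugging Face `datasets` library, assuming the library is installed and that the Hub repo id matches the path in the link above. The helper names `subset_config_name` and `load_subset` are hypothetical and exist only for illustration.

```python
# Repo id taken from the download link; assumed to be valid on the Hugging Face Hub.
REPO_ID = "kotoba-speech/seamless-align-enA-jaA"


def subset_config_name(index: int) -> str:
    """Build a config name such as 'subset_1', following the naming seen in dataset_info."""
    return f"subset_{index}"


def load_subset(index: int, split: str = "train"):
    """Load one subset config (hypothetical helper, not part of the dataset's own tooling)."""
    # Imported lazily so the naming helper can be used without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, subset_config_name(index), split=split)


if __name__ == "__main__":
    ds = load_subset(1)
    # dataset_info lists 2073 examples for subset_1; the live count may differ.
    print(ds.num_rows)
    # Expected columns include line_no, enA.id, jaA.id, and the audio token fields.
    print(ds.column_names)
```

Loading a single subset keeps the download to roughly the `download_size` listed for that config rather than pulling every config at once.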
Resource description:
--- dataset_info: - config_name: default features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train - config_name: subset_1 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 877637273 num_examples: 2073 download_size: 142222083 dataset_size: 877637273 - config_name: subset_10 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 782114145 num_examples: 1961 download_size: 130083337 dataset_size: 782114145 - config_name: subset_100 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: 
float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 761507587 num_examples: 1757 download_size: 126838339 dataset_size: 761507587 - config_name: subset_101 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 807833778 num_examples: 1873 download_size: 134525727 dataset_size: 807833778 - config_name: subset_102 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 831271512 num_examples: 1868 download_size: 138251731 dataset_size: 831271512 - config_name: subset_103 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: 
jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 834056054 num_examples: 1879 download_size: 138766842 dataset_size: 834056054 - config_name: subset_104 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 829351429 num_examples: 1901 download_size: 138019563 dataset_size: 829351429 - config_name: subset_105 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 818071940 num_examples: 1875 download_size: 136022792 dataset_size: 818071940 - config_name: subset_106 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: 
enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 823036134 num_examples: 1880 download_size: 137005371 dataset_size: 823036134 - config_name: subset_107 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 803967685 num_examples: 1854 download_size: 133866141 dataset_size: 803967685 - config_name: subset_108 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 818062920 num_examples: 1834 download_size: 136139232 dataset_size: 818062920 - config_name: subset_109 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train 
num_bytes: 779390958 num_examples: 1770 download_size: 129654530 dataset_size: 779390958 - config_name: subset_11 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 706305559 num_examples: 1779 download_size: 117572069 dataset_size: 706305559 - config_name: subset_110 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 841220779 num_examples: 1908 download_size: 139946842 dataset_size: 841220779 - config_name: subset_111 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 823104769 num_examples: 1877 download_size: 137006268 dataset_size: 823104769 - config_name: subset_112 
features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 847413246 num_examples: 1924 download_size: 140982040 dataset_size: 847413246 - config_name: subset_114 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 855481808 num_examples: 1940 download_size: 142173167 dataset_size: 855481808 - config_name: subset_115 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 835516839 num_examples: 1902 download_size: 138972940 dataset_size: 835516839 - config_name: subset_116 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id 
dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 850628710 num_examples: 1910 download_size: 141523643 dataset_size: 850628710 - config_name: subset_117 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 832126844 num_examples: 1901 download_size: 138561710 dataset_size: 832126844 - config_name: subset_118 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 851229693 num_examples: 1911 download_size: 141695675 dataset_size: 851229693 - config_name: subset_119 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: 
enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 827876201 num_examples: 1867 download_size: 137744105 dataset_size: 827876201 - config_name: subset_12 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 786491483 num_examples: 1916 download_size: 130695694 dataset_size: 786491483 - config_name: subset_120 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 785043017 num_examples: 1774 download_size: 130680090 dataset_size: 785043017 - config_name: subset_121 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: 
jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 846111964 num_examples: 1895 download_size: 140766293 dataset_size: 846111964 - config_name: subset_122 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 817456683 num_examples: 1851 download_size: 135945049 dataset_size: 817456683 - config_name: subset_123 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 861656359 num_examples: 1923 download_size: 143218681 dataset_size: 861656359 - config_name: subset_124 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: 
enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 840645515 num_examples: 1886 download_size: 139759918 dataset_size: 840645515 - config_name: subset_125 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 850144143 num_examples: 1928 download_size: 141491381 dataset_size: 850144143 - config_name: subset_126 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 827636279 num_examples: 1903 download_size: 137818807 dataset_size: 827636279 - config_name: subset_127 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 848475483 num_examples: 1902 download_size: 
141160699 dataset_size: 848475483 - config_name: subset_128 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 828471705 num_examples: 1890 download_size: 137863858 dataset_size: 828471705 - config_name: subset_129 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 780992459 num_examples: 1752 download_size: 130019788 dataset_size: 780992459 - config_name: subset_13 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 711297545 num_examples: 1769 download_size: 118482315 dataset_size: 711297545 - config_name: subset_130 features: - name: line_no dtype: int64 - name: enA.id dtype: 
string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 814544555 num_examples: 1830 download_size: 135514850 dataset_size: 814544555 - config_name: subset_131 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 840494898 num_examples: 1882 download_size: 139805967 dataset_size: 840494898 - config_name: subset_132 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 844961984 num_examples: 1918 download_size: 140576912 dataset_size: 844961984 - config_name: subset_133 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: 
jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 821935205 num_examples: 1886 download_size: 136809956 dataset_size: 821935205 - config_name: subset_134 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 845607901 num_examples: 1912 download_size: 140756262 dataset_size: 845607901 - config_name: subset_135 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 839542233 num_examples: 1888 download_size: 139631732 dataset_size: 839542233 - config_name: subset_136 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: 
enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 830300897 num_examples: 1875 download_size: 138167859 dataset_size: 830300897 - config_name: subset_137 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 827576308 num_examples: 1866 download_size: 137657025 dataset_size: 827576308 - config_name: subset_138 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 823802409 num_examples: 1863 download_size: 137078547 dataset_size: 823802409 - config_name: subset_139 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: 
jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 821972232 num_examples: 1859 download_size: 136745195 dataset_size: 821972232 - config_name: subset_14 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 700471113 num_examples: 1734 download_size: 116522744 dataset_size: 700471113 - config_name: subset_140 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 792881980 num_examples: 1766 download_size: 131980427 dataset_size: 792881980 - config_name: subset_141 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train 
num_bytes: 840429233 num_examples: 1865 download_size: 139785320 dataset_size: 840429233 - config_name: subset_142 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 835013258 num_examples: 1893 download_size: 139055733 dataset_size: 835013258 - config_name: subset_143 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 836692531 num_examples: 1894 download_size: 139143351 dataset_size: 836692531 - config_name: subset_144 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 625333519 num_examples: 1381 download_size: 104049842 dataset_size: 625333519 - config_name: subset_15 
features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 774287918 num_examples: 1914 download_size: 128775014 dataset_size: 774287918 - config_name: subset_16 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 736130610 num_examples: 1862 download_size: 122638014 dataset_size: 736130610 - config_name: subset_17 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 766712455 num_examples: 1875 download_size: 127629604 dataset_size: 766712455 - config_name: subset_18 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id 
dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 795318722 num_examples: 1937 download_size: 132345760 dataset_size: 795318722 - config_name: subset_19 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 768795566 num_examples: 1917 download_size: 127999106 dataset_size: 768795566 - config_name: subset_2 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 799386637 num_examples: 1929 download_size: 130142022 dataset_size: 799386637 - config_name: subset_20 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: 
jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 760273102 num_examples: 1877 download_size: 126530239 dataset_size: 760273102 - config_name: subset_21 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 735526859 num_examples: 1761 download_size: 122525452 dataset_size: 735526859 - config_name: subset_22 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 747850509 num_examples: 1850 download_size: 124535530 dataset_size: 747850509 - config_name: subset_23 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: 
jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 732181985 num_examples: 1790 download_size: 121987244 dataset_size: 732181985 - config_name: subset_24 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 720726471 num_examples: 1758 download_size: 120145822 dataset_size: 720726471 - config_name: subset_25 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 801219744 num_examples: 1898 download_size: 133302214 dataset_size: 801219744 - config_name: subset_26 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: 
jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 785072097 num_examples: 1943 download_size: 130808423 dataset_size: 785072097 - config_name: subset_27 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 776394239 num_examples: 1903 download_size: 129411709 dataset_size: 776394239 - config_name: subset_28 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 787223287 num_examples: 1912 download_size: 131107717 dataset_size: 787223287 - config_name: subset_29 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 791000327 num_examples: 1945 download_size: 
131795720 dataset_size: 791000327 - config_name: subset_3 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 770443500 num_examples: 1899 download_size: 126365234 dataset_size: 770443500 - config_name: subset_30 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 792907592 num_examples: 1902 download_size: 131940647 dataset_size: 792907592 - config_name: subset_31 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 756082354 num_examples: 1805 download_size: 125930470 dataset_size: 756082354 - config_name: subset_32 features: - name: line_no dtype: int64 - name: enA.id dtype: 
string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 751623202 num_examples: 1797 download_size: 125249077 dataset_size: 751623202 - config_name: subset_33 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 733886174 num_examples: 1757 download_size: 122282391 dataset_size: 733886174 - config_name: subset_34 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 784865663 num_examples: 1893 download_size: 130707346 dataset_size: 784865663 - config_name: subset_35 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: 
enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 795014330 num_examples: 1928 download_size: 132244828 dataset_size: 795014330 - config_name: subset_36 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 776136455 num_examples: 1863 download_size: 129302227 dataset_size: 776136455 - config_name: subset_37 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 779120285 num_examples: 1855 download_size: 129802323 dataset_size: 779120285 - config_name: subset_38 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: 
jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 794572918 num_examples: 1890 download_size: 132271795 dataset_size: 794572918 - config_name: subset_39 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 785889178 num_examples: 1899 download_size: 130872474 dataset_size: 785889178 - config_name: subset_4 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 725260894 num_examples: 1835 download_size: 119696480 dataset_size: 725260894 - config_name: subset_40 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: 
jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 808106060 num_examples: 1931 download_size: 134417865 dataset_size: 808106060 - config_name: subset_41 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 754434261 num_examples: 1784 download_size: 125622736 dataset_size: 754434261 - config_name: subset_42 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 754463596 num_examples: 1797 download_size: 125773100 dataset_size: 754463596 - config_name: subset_43 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train 
num_bytes: 741011476 num_examples: 1757 download_size: 123434124 dataset_size: 741011476 - config_name: subset_44 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 779832416 num_examples: 1831 download_size: 129795239 dataset_size: 779832416 - config_name: subset_45 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 783861326 num_examples: 1891 download_size: 130673619 dataset_size: 783861326 - config_name: subset_46 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 804680320 num_examples: 1897 download_size: 133966955 dataset_size: 804680320 - config_name: subset_47 features: 
- name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 808595138 num_examples: 1897 download_size: 134546562 dataset_size: 808595138 - config_name: subset_48 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 808941372 num_examples: 1902 download_size: 134660577 dataset_size: 808941372 - config_name: subset_49 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 801528807 num_examples: 1875 download_size: 133347028 dataset_size: 801528807 - config_name: subset_5 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string 
- name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 793489531 num_examples: 1987 download_size: 131290865 dataset_size: 793489531 - config_name: subset_50 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 819241240 num_examples: 1951 download_size: 136526983 dataset_size: 819241240 - config_name: subset_51 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 758391153 num_examples: 1752 download_size: 126260257 dataset_size: 758391153 - config_name: subset_52 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: 
sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 763565137 num_examples: 1780 download_size: 127199057 dataset_size: 763565137 - config_name: subset_53 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 790210937 num_examples: 1846 download_size: 131610703 dataset_size: 790210937 - config_name: subset_54 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 735557222 num_examples: 1723 download_size: 122474723 dataset_size: 735557222 - config_name: subset_55 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 
- name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 790325569 num_examples: 1866 download_size: 131658652 dataset_size: 790325569 - config_name: subset_56 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 815141403 num_examples: 1893 download_size: 135700961 dataset_size: 815141403 - config_name: subset_57 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 816836107 num_examples: 1924 download_size: 135892840 dataset_size: 816836107 - config_name: subset_58 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: 
train num_bytes: 796378477 num_examples: 1881 download_size: 132493180 dataset_size: 796378477 - config_name: subset_59 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 798701754 num_examples: 1887 download_size: 133011646 dataset_size: 798701754 - config_name: subset_6 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 728188086 num_examples: 1810 download_size: 120701489 dataset_size: 728188086 - config_name: subset_60 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 812567821 num_examples: 1909 download_size: 135357262 dataset_size: 812567821 - config_name: subset_61 
features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 745150676 num_examples: 1728 download_size: 124125904 dataset_size: 745150676 - config_name: subset_62 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 757832834 num_examples: 1787 download_size: 126243207 dataset_size: 757832834 - config_name: subset_63 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 773329744 num_examples: 1790 download_size: 128919920 dataset_size: 773329744 - config_name: subset_64 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id 
dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 779007998 num_examples: 1812 download_size: 129727597 dataset_size: 779007998 - config_name: subset_65 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 809157988 num_examples: 1877 download_size: 134778667 dataset_size: 809157988 - config_name: subset_66 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 811744279 num_examples: 1890 download_size: 135145984 dataset_size: 811744279 - config_name: subset_67 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: 
jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 807905579 num_examples: 1873 download_size: 134544873 dataset_size: 807905579 - config_name: subset_69 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 829802622 num_examples: 1916 download_size: 138126186 dataset_size: 829802622 - config_name: subset_7 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 735297018 num_examples: 1832 download_size: 121904051 dataset_size: 735297018 - config_name: subset_70 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: 
jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 817723045 num_examples: 1903 download_size: 136121908 dataset_size: 817723045 - config_name: subset_71 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 750129963 num_examples: 1736 download_size: 124822707 dataset_size: 750129963 - config_name: subset_72 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 828362877 num_examples: 1887 download_size: 137893047 dataset_size: 828362877 - config_name: subset_73 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: 
jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 745293129 num_examples: 1736 download_size: 124195334 dataset_size: 745293129 - config_name: subset_74 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 793510042 num_examples: 1829 download_size: 132096959 dataset_size: 793510042 - config_name: subset_75 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 808745329 num_examples: 1862 download_size: 134585547 dataset_size: 808745329 - config_name: subset_76 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 821790014 num_examples: 1914 download_size: 
136860206 dataset_size: 821790014 - config_name: subset_77 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 814381644 num_examples: 1874 download_size: 135672985 dataset_size: 814381644 - config_name: subset_78 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 810432081 num_examples: 1871 download_size: 134904361 dataset_size: 810432081 - config_name: subset_79 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 815651480 num_examples: 1891 download_size: 135792562 dataset_size: 815651480 - config_name: subset_8 features: - name: line_no dtype: int64 - name: enA.id dtype: 
string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 802798092 num_examples: 2009 download_size: 133298825 dataset_size: 802798092 - config_name: subset_80 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 802428791 num_examples: 1885 download_size: 133698797 dataset_size: 802428791 - config_name: subset_81 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 834709652 num_examples: 1913 download_size: 138987343 dataset_size: 834709652 - config_name: subset_82 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: 
enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 834266523 num_examples: 1910 download_size: 138888504 dataset_size: 834266523 - config_name: subset_83 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 819125422 num_examples: 1887 download_size: 136302868 dataset_size: 819125422 - config_name: subset_84 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 810720434 num_examples: 1867 download_size: 134940476 dataset_size: 810720434 - config_name: subset_85 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: 
jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 809271789 num_examples: 1881 download_size: 134684237 dataset_size: 809271789 - config_name: subset_86 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 808527611 num_examples: 1862 download_size: 134602856 dataset_size: 808527611 - config_name: subset_87 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 819284692 num_examples: 1897 download_size: 136414865 dataset_size: 819284692 - config_name: subset_88 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: 
jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 825322623 num_examples: 1900 download_size: 137455947 dataset_size: 825322623 - config_name: subset_9 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 785342947 num_examples: 1977 download_size: 130444875 dataset_size: 785342947 - config_name: subset_90 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 824052048 num_examples: 1913 download_size: 137210506 dataset_size: 824052048 - config_name: subset_91 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train 
num_bytes: 829736231 num_examples: 1913 download_size: 138090842 dataset_size: 829736231 - config_name: subset_92 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 810068202 num_examples: 1886 download_size: 134916089 dataset_size: 810068202 - config_name: subset_93 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 823832441 num_examples: 1875 download_size: 137060599 dataset_size: 823832441 - config_name: subset_94 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 826814414 num_examples: 1900 download_size: 137662596 dataset_size: 826814414 - config_name: subset_95 features: 
- name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 809293443 num_examples: 1867 download_size: 134723501 dataset_size: 809293443 - config_name: subset_96 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 833099009 num_examples: 1900 download_size: 138602945 dataset_size: 833099009 - config_name: subset_97 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.tokens.bpe_tokens sequence: int64 - name: jaA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 833807961 num_examples: 1899 download_size: 138728234 dataset_size: 833807961 - config_name: subset_98 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: 
string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 833670389 num_examples: 1904 download_size: 138725943 dataset_size: 833670389 - config_name: subset_99 features: - name: line_no dtype: int64 - name: enA.id dtype: string - name: enA.laser_score dtype: float64 - name: jaA.id dtype: string - name: jaA.laser_score dtype: float64 - name: jaA.audio.tokens sequence: sequence: int64 - name: enA.audio.tokens sequence: sequence: int64 - name: jaA.audio.speaker_embedding sequence: float32 - name: enA.audio.speaker_embedding sequence: float32 - name: jaA.audio.tokens.bpe_tokens sequence: int64 - name: enA.audio.tokens.bpe_tokens sequence: int64 splits: - name: train num_bytes: 838513705 num_examples: 1901 download_size: 139490746 dataset_size: 838513705 configs: - config_name: default data_files: - split: train path: subset_*/train-* - config_name: subset_1 data_files: - split: train path: subset_1/train-* - config_name: subset_10 data_files: - split: train path: subset_10/train-* - config_name: subset_100 data_files: - split: train path: subset_100/train-* - config_name: subset_101 data_files: - split: train path: subset_101/train-* - config_name: subset_102 data_files: - split: train path: subset_102/train-* - config_name: subset_103 data_files: - split: train path: subset_103/train-* - config_name: subset_104 data_files: - split: train path: subset_104/train-* - config_name: subset_105 data_files: - split: train path: subset_105/train-* - config_name: subset_106 data_files: - split: train path: subset_106/train-* - config_name: subset_107 data_files: - split: train path: subset_107/train-* - config_name: subset_108 data_files: - split: 
train path: subset_108/train-* - config_name: subset_109 data_files: - split: train path: subset_109/train-* - config_name: subset_11 data_files: - split: train path: subset_11/train-* - config_name: subset_110 data_files: - split: train path: subset_110/train-* - config_name: subset_111 data_files: - split: train path: subset_111/train-* - config_name: subset_112 data_files: - split: train path: subset_112/train-* - config_name: subset_114 data_files: - split: train path: subset_114/train-* - config_name: subset_115 data_files: - split: train path: subset_115/train-* - config_name: subset_116 data_files: - split: train path: subset_116/train-* - config_name: subset_117 data_files: - split: train path: subset_117/train-* - config_name: subset_118 data_files: - split: train path: subset_118/train-* - config_name: subset_119 data_files: - split: train path: subset_119/train-* - config_name: subset_12 data_files: - split: train path: subset_12/train-* - config_name: subset_120 data_files: - split: train path: subset_120/train-* - config_name: subset_121 data_files: - split: train path: subset_121/train-* - config_name: subset_122 data_files: - split: train path: subset_122/train-* - config_name: subset_123 data_files: - split: train path: subset_123/train-* - config_name: subset_124 data_files: - split: train path: subset_124/train-* - config_name: subset_125 data_files: - split: train path: subset_125/train-* - config_name: subset_126 data_files: - split: train path: subset_126/train-* - config_name: subset_127 data_files: - split: train path: subset_127/train-* - config_name: subset_128 data_files: - split: train path: subset_128/train-* - config_name: subset_129 data_files: - split: train path: subset_129/train-* - config_name: subset_13 data_files: - split: train path: subset_13/train-* - config_name: subset_130 data_files: - split: train path: subset_130/train-* - config_name: subset_131 data_files: - split: train path: subset_131/train-* - config_name: 
subset_132 data_files: - split: train path: subset_132/train-* - config_name: subset_133 data_files: - split: train path: subset_133/train-* - config_name: subset_134 data_files: - split: train path: subset_134/train-* - config_name: subset_135 data_files: - split: train path: subset_135/train-* - config_name: subset_136 data_files: - split: train path: subset_136/train-* - config_name: subset_137 data_files: - split: train path: subset_137/train-* - config_name: subset_138 data_files: - split: train path: subset_138/train-* - config_name: subset_139 data_files: - split: train path: subset_139/train-* - config_name: subset_14 data_files: - split: train path: subset_14/train-* - config_name: subset_140 data_files: - split: train path: subset_140/train-* - config_name: subset_141 data_files: - split: train path: subset_141/train-* - config_name: subset_142 data_files: - split: train path: subset_142/train-* - config_name: subset_143 data_files: - split: train path: subset_143/train-* - config_name: subset_144 data_files: - split: train path: subset_144/train-* - config_name: subset_15 data_files: - split: train path: subset_15/train-* - config_name: subset_16 data_files: - split: train path: subset_16/train-* - config_name: subset_17 data_files: - split: train path: subset_17/train-* - config_name: subset_18 data_files: - split: train path: subset_18/train-* - config_name: subset_19 data_files: - split: train path: subset_19/train-* - config_name: subset_2 data_files: - split: train path: subset_2/train-* - config_name: subset_20 data_files: - split: train path: subset_20/train-* - config_name: subset_21 data_files: - split: train path: subset_21/train-* - config_name: subset_22 data_files: - split: train path: subset_22/train-* - config_name: subset_23 data_files: - split: train path: subset_23/train-* - config_name: subset_24 data_files: - split: train path: subset_24/train-* - config_name: subset_25 data_files: - split: train path: subset_25/train-* - config_name: 
subset_26 data_files: - split: train path: subset_26/train-* - config_name: subset_27 data_files: - split: train path: subset_27/train-* - config_name: subset_28 data_files: - split: train path: subset_28/train-* - config_name: subset_29 data_files: - split: train path: subset_29/train-* - config_name: subset_3 data_files: - split: train path: subset_3/train-* - config_name: subset_30 data_files: - split: train path: subset_30/train-* - config_name: subset_31 data_files: - split: train path: subset_31/train-* - config_name: subset_32 data_files: - split: train path: subset_32/train-* - config_name: subset_33 data_files: - split: train path: subset_33/train-* - config_name: subset_34 data_files: - split: train path: subset_34/train-* - config_name: subset_35 data_files: - split: train path: subset_35/train-* - config_name: subset_36 data_files: - split: train path: subset_36/train-* - config_name: subset_37 data_files: - split: train path: subset_37/train-* - config_name: subset_38 data_files: - split: train path: subset_38/train-* - config_name: subset_39 data_files: - split: train path: subset_39/train-* - config_name: subset_4 data_files: - split: train path: subset_4/train-* - config_name: subset_40 data_files: - split: train path: subset_40/train-* - config_name: subset_41 data_files: - split: train path: subset_41/train-* - config_name: subset_42 data_files: - split: train path: subset_42/train-* - config_name: subset_43 data_files: - split: train path: subset_43/train-* - config_name: subset_44 data_files: - split: train path: subset_44/train-* - config_name: subset_45 data_files: - split: train path: subset_45/train-* - config_name: subset_46 data_files: - split: train path: subset_46/train-* - config_name: subset_47 data_files: - split: train path: subset_47/train-* - config_name: subset_48 data_files: - split: train path: subset_48/train-* - config_name: subset_49 data_files: - split: train path: subset_49/train-* - config_name: subset_5 data_files: - 
split: train path: subset_5/train-* - config_name: subset_50 data_files: - split: train path: subset_50/train-* - config_name: subset_51 data_files: - split: train path: subset_51/train-* - config_name: subset_52 data_files: - split: train path: subset_52/train-* - config_name: subset_53 data_files: - split: train path: subset_53/train-* - config_name: subset_54 data_files: - split: train path: subset_54/train-* - config_name: subset_55 data_files: - split: train path: subset_55/train-* - config_name: subset_56 data_files: - split: train path: subset_56/train-* - config_name: subset_57 data_files: - split: train path: subset_57/train-* - config_name: subset_58 data_files: - split: train path: subset_58/train-* - config_name: subset_59 data_files: - split: train path: subset_59/train-* - config_name: subset_6 data_files: - split: train path: subset_6/train-* - config_name: subset_60 data_files: - split: train path: subset_60/train-* - config_name: subset_61 data_files: - split: train path: subset_61/train-* - config_name: subset_62 data_files: - split: train path: subset_62/train-* - config_name: subset_63 data_files: - split: train path: subset_63/train-* - config_name: subset_64 data_files: - split: train path: subset_64/train-* - config_name: subset_65 data_files: - split: train path: subset_65/train-* - config_name: subset_66 data_files: - split: train path: subset_66/train-* - config_name: subset_67 data_files: - split: train path: subset_67/train-* - config_name: subset_69 data_files: - split: train path: subset_69/train-* - config_name: subset_7 data_files: - split: train path: subset_7/train-* - config_name: subset_70 data_files: - split: train path: subset_70/train-* - config_name: subset_71 data_files: - split: train path: subset_71/train-* - config_name: subset_72 data_files: - split: train path: subset_72/train-* - config_name: subset_73 data_files: - split: train path: subset_73/train-* - config_name: subset_74 data_files: - split: train path: 
subset_74/train-* - config_name: subset_75 data_files: - split: train path: subset_75/train-* - config_name: subset_76 data_files: - split: train path: subset_76/train-* - config_name: subset_77 data_files: - split: train path: subset_77/train-* - config_name: subset_78 data_files: - split: train path: subset_78/train-* - config_name: subset_79 data_files: - split: train path: subset_79/train-* - config_name: subset_8 data_files: - split: train path: subset_8/train-* - config_name: subset_80 data_files: - split: train path: subset_80/train-* - config_name: subset_81 data_files: - split: train path: subset_81/train-* - config_name: subset_82 data_files: - split: train path: subset_82/train-* - config_name: subset_83 data_files: - split: train path: subset_83/train-* - config_name: subset_84 data_files: - split: train path: subset_84/train-* - config_name: subset_85 data_files: - split: train path: subset_85/train-* - config_name: subset_86 data_files: - split: train path: subset_86/train-* - config_name: subset_87 data_files: - split: train path: subset_87/train-* - config_name: subset_88 data_files: - split: train path: subset_88/train-* - config_name: subset_9 data_files: - split: train path: subset_9/train-* - config_name: subset_90 data_files: - split: train path: subset_90/train-* - config_name: subset_91 data_files: - split: train path: subset_91/train-* - config_name: subset_92 data_files: - split: train path: subset_92/train-* - config_name: subset_93 data_files: - split: train path: subset_93/train-* - config_name: subset_94 data_files: - split: train path: subset_94/train-* - config_name: subset_95 data_files: - split: train path: subset_95/train-* - config_name: subset_96 data_files: - split: train path: subset_96/train-* - config_name: subset_97 data_files: - split: train path: subset_97/train-* - config_name: subset_98 data_files: - split: train path: subset_98/train-* - config_name: subset_99 data_files: - split: train path: subset_99/train-* --- # 
Dataset Card for Dataset Name

<!-- Provide a quick summary of the dataset. -->

This dataset card aims to be a base template for new datasets. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/datasetcard_template.md?plain=1).

## Dataset Details

### Dataset Description

<!-- Provide a longer summary of what this dataset is. -->

- **Curated by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]

### Dataset Sources [optional]

<!-- Provide the basic links for the dataset. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the dataset is intended to be used. -->

### Direct Use

<!-- This section describes suitable use cases for the dataset. -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the dataset will not work well for. -->

[More Information Needed]

## Dataset Structure

<!-- This section provides a description of the dataset fields, and additional information about the dataset structure such as criteria used to create the splits, relationships between data points, etc. -->

[More Information Needed]

## Dataset Creation

### Curation Rationale

<!-- Motivation for the creation of this dataset. -->

[More Information Needed]

### Source Data

<!-- This section describes the source data (e.g. news text and headlines, social media posts, translated sentences, ...). -->

#### Data Collection and Processing

<!-- This section describes the data collection and processing process such as data selection criteria, filtering and normalization methods, tools and libraries used, etc. -->

[More Information Needed]

#### Who are the source data producers?

<!-- This section describes the people or systems who originally created the data. It should also include self-reported demographic or identity information for the source data creators if this information is available. -->

[More Information Needed]

### Annotations [optional]

<!-- If the dataset contains annotations which are not part of the initial data collection, use this section to describe them. -->

#### Annotation process

<!-- This section describes the annotation process such as annotation tools used in the process, the amount of data annotated, annotation guidelines provided to the annotators, interannotator statistics, annotation validation, etc. -->

[More Information Needed]

#### Who are the annotators?

<!-- This section describes the people or systems who created the annotations. -->

[More Information Needed]

#### Personal and Sensitive Information

<!-- State whether the dataset contains data that might be considered personal, sensitive, or private (e.g., data that reveals addresses, uniquely identifiable names or aliases, racial or ethnic origins, sexual orientations, religious beliefs, political opinions, financial or health data, etc.). If efforts were made to anonymize the data, describe the anonymization process. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users should be made aware of the risks, biases and limitations of the dataset. More information needed for further recommendations.

## Citation [optional]

<!-- If there is a paper or blog post introducing the dataset, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the dataset or dataset card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Dataset Card Authors [optional]

[More Information Needed]

## Dataset Card Contact

[More Information Needed]

The dataset consists of multiple subsets, each pairing English and Japanese audio. Features include the line number, per-language IDs and LASER scores, audio tokens, BPE tokens, and speaker embeddings. Every subset provides a single train split, and the byte size and number of examples differ from subset to subset.
Provided by:
kotoba-speech
Summary of original information

Dataset overview

Dataset configurations

  • config_name: default, subset_1, subset_10, subset_100, subset_101, subset_102, subset_103, subset_104, subset_105, subset_106, subset_107, subset_108, subset_109, subset_11, subset_110, subset_111, subset_112, subset_114, subset_115, subset_116, subset_117, subset_118, subset_119, subset_12, subset_120, subset_121, subset_122, subset_123
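Each configuration is selected by name when the dataset is loaded. The following is a minimal sketch, assuming the third-party `datasets` package is installed and the Hub is reachable; `REPO_ID` comes from the page header, while `LISTED_CONFIGS`, `validate_config`, and `load_subset` are hypothetical helper names introduced here for illustration.

```python
REPO_ID = "kotoba-speech/seamless-align-enA-jaA"

# A few configuration names copied from the list above (not exhaustive).
# Subset numbering is not guaranteed to be contiguous, so names are
# checked against an explicit list instead of being generated from a range.
LISTED_CONFIGS = ("default", "subset_1", "subset_10", "subset_100")


def validate_config(name: str) -> str:
    """Return the name unchanged if it is one of the listed configurations."""
    if name not in LISTED_CONFIGS:
        raise ValueError(f"configuration not in the local list: {name!r}")
    return name


def load_subset(name: str = "subset_1", streaming: bool = True):
    """Load the train split of one configuration (requires network access)."""
    # Imported lazily so the helpers above work without the package installed.
    from datasets import load_dataset

    return load_dataset(
        REPO_ID, validate_config(name), split="train", streaming=streaming
    )
```

Calling `load_subset("subset_1")` and reading `next(iter(...))` from the result would fetch one record without downloading the whole subset, because streaming mode is the default in this sketch.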

Dataset features

  • line_no: int64.
  • enA.id: string.
  • enA.laser_score: float64.
  • jaA.id: string.
  • jaA.laser_score: float64.
  • jaA.audio.tokens: sequence of sequences of int64.
  • enA.audio.tokens: sequence of sequences of int64.
  • enA.audio.speaker_embedding: sequence of float32.
  • jaA.audio.speaker_embedding: sequence of float32.
  • jaA.audio.tokens.bpe_tokens: sequence of int64.
  • enA.audio.tokens.bpe_tokens: sequence of int64.
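The feature list can be written down as a small schema and used to sanity-check a record. This is a sketch only: the field names and dtypes come from the card metadata (where the `audio.tokens` fields are declared as `sequence: sequence: int64`, i.e. nested), while `SCHEMA`, `matches`, and `schema_errors` are hypothetical names, and plain Python lists stand in for whatever array type a loader actually returns.

```python
from typing import Any, List

# A one-element list [T] means "sequence of T"; [[int]] is a nested sequence.
SCHEMA = {
    "line_no": int,
    "enA.id": str,
    "enA.laser_score": float,
    "jaA.id": str,
    "jaA.laser_score": float,
    "jaA.audio.tokens": [[int]],
    "enA.audio.tokens": [[int]],
    "enA.audio.speaker_embedding": [float],
    "jaA.audio.speaker_embedding": [float],
    "jaA.audio.tokens.bpe_tokens": [int],
    "enA.audio.tokens.bpe_tokens": [int],
}


def matches(value: Any, spec: Any) -> bool:
    """Check one value against a scalar type or a [inner] sequence spec."""
    if isinstance(spec, list):
        return isinstance(value, (list, tuple)) and all(
            matches(item, spec[0]) for item in value
        )
    if isinstance(value, bool):  # bool is a subclass of int; reject it
        return False
    if spec is float:
        return isinstance(value, (int, float))
    return isinstance(value, spec)


def schema_errors(record: dict) -> List[str]:
    """Return the names of fields that are missing or have the wrong shape."""
    return [
        name
        for name, spec in SCHEMA.items()
        if name not in record or not matches(record[name], spec)
    ]
```

An empty list from `schema_errors` means every field is present with the expected nesting; a flat list passed where `audio.tokens` expects a nested one is reported by field name.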

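Dataset splits

  • train: every configuration includes a training split, with details as follows:
    • num_bytes: storage size, e.g. 877637273 bytes for subset_1.
    • num_examples: number of examples, e.g. 2073 examples for subset_1.
    • download_size: download size, e.g. 142222083 bytes for subset_1.
    • dataset_size: dataset size, e.g. 877637273 bytes for subset_1.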
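The split metadata supports some quick arithmetic, such as the average record size and how much smaller the download is than the prepared dataset. The sketch below uses the subset_1 and subset_10 figures from the card metadata; the helper names `bytes_per_example` and `download_ratio` are hypothetical.

```python
# Split metadata for two configurations, copied from the card metadata.
SPLIT_INFO = {
    "subset_1": {
        "num_bytes": 877_637_273,
        "num_examples": 2_073,
        "download_size": 142_222_083,
        "dataset_size": 877_637_273,
    },
    "subset_10": {
        "num_bytes": 782_114_145,
        "num_examples": 1_961,
        "download_size": 130_083_337,
        "dataset_size": 782_114_145,
    },
}


def bytes_per_example(info: dict) -> float:
    """Average stored size of one example in bytes."""
    return info["num_bytes"] / info["num_examples"]


def download_ratio(info: dict) -> float:
    """Download size as a fraction of the prepared dataset size."""
    return info["download_size"] / info["dataset_size"]


for name, info in SPLIT_INFO.items():
    print(
        f"{name}: {bytes_per_example(info) / 1e3:.0f} kB per example, "
        f"download is {download_ratio(info):.1%} of dataset_size"
    )
```

For both subsets the download is roughly one sixth of `dataset_size`, and each example averages around 400 kB once prepared.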

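The information above summarizes the basic structure and content of the dataset, including the features and split details across configurations.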
