DigitalLearningGmbH/tatoeba_mt_parquet
Hugging Face, updated 2026-04-21, indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/DigitalLearningGmbH/tatoeba_mt_parquet
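The metadata for this dataset lists one config per language pair (for example `afr-deu` or `ara-ber_Latn`), each with a `test` split and, for some pairs, a `validation` split. The following is a minimal sketch of how one pair might be loaded, assuming the Hugging Face `datasets` library is installed and the repository is reachable. The helper names `config_name`, `split_config`, and `load_pair` are hypothetical and exist only for this illustration.

```python
from typing import Tuple

# Repository ID taken from the page title.
REPO_ID = "DigitalLearningGmbH/tatoeba_mt_parquet"


def config_name(source: str, target: str) -> str:
    """Join two language codes into a config name such as 'afr-deu'.

    Hypothetical helper: it only formats the string and does not check
    that the pair actually exists in the dataset.
    """
    return f"{source}-{target}"


def split_config(name: str) -> Tuple[str, str]:
    """Split a config name into (source, target) at the first hyphen.

    Script suffixes use an underscore (e.g. 'ber_Latn'), so they stay
    attached to their language code.
    """
    source, target = name.split("-", 1)
    return source, target


def load_pair(source: str, target: str, split: str = "test"):
    """Load one split of one language pair (requires network access).

    Assumption: the `datasets` library is installed. Several configs
    list only a `test` split, so 'validation' may not be available.
    """
    from datasets import load_dataset  # imported lazily; optional dependency

    return load_dataset(REPO_ID, config_name(source, target), split=split)
```

A call such as `load_pair("afr", "deu")` would be expected to return rows with the four string columns listed in the metadata: `sourceLang`, `targetlang`, `sourceString`, and `targetString`. Note that `targetlang` is lowercase in the listed features, unlike `sourceLang`.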
Resource description:
---
language_creators:
- crowdsourced
language:
- af
- ar
- az
- be
- bg
- bn
- br
- bs
- ca
- ch
- cs
- cv
- cy
- da
- de
- el
- en
- eo
- es
- et
- eu
- fa
- fi
- fo
- fr
- fy
- ga
- gd
- gl
- gn
- he
- hi
- hr
- hu
- hy
- ia
- id
- ie
- io
- is
- it
- ja
- jv
- ka
- kk
- km
- ko
- ku
- kw
- la
- lb
- lt
- lv
- mi
- mk
- ml
- mn
- mr
- ms
- mt
- my
- nb
- nl
- nn
- 'no'
- oc
- pl
- pt
- qu
- rn
- ro
- ru
- sh
- sl
- sq
- sr
- sv
- sw
- ta
- te
- th
- tk
- tl
- tr
- tt
- ug
- uk
- ur
- uz
- vi
- vo
- yi
- zh
license:
- cc-by-2.0
multilinguality:
- translation
pretty_name: The Tatoeba Translation Challenge
source_datasets:
- Helsinki-NLP/tatoeba_mt
task_categories:
- text-generation
- translation
dataset_info:
- config_name: afr-deu
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 118869
num_examples: 1582
download_size: 65914
dataset_size: 118869
- config_name: afr-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 112767
num_examples: 1373
- name: validation
num_bytes: 81872
num_examples: 1006
download_size: 105739
dataset_size: 194639
- config_name: afr-epo
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 94719
num_examples: 1115
download_size: 50877
dataset_size: 94719
- config_name: afr-nld
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 80656
num_examples: 1055
download_size: 44599
dataset_size: 80656
- config_name: afr-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 19782
num_examples: 227
download_size: 12448
dataset_size: 19782
- config_name: afr-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 34352
num_examples: 447
download_size: 20392
dataset_size: 34352
- config_name: ain-fin
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 16599
num_examples: 205
download_size: 8611
dataset_size: 16599
- config_name: ara-ber
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 237570
num_examples: 2316
- name: validation
num_bytes: 107328
num_examples: 1031
download_size: 172171
dataset_size: 344898
- config_name: ara-ber_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 235939
num_examples: 2298
- name: validation
num_bytes: 106592
num_examples: 1024
download_size: 170778
dataset_size: 342531
- config_name: ara-deu
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 117082
num_examples: 1208
- name: validation
num_bytes: 99845
num_examples: 1028
download_size: 117851
dataset_size: 216927
- config_name: ara-ell
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 37867
num_examples: 425
download_size: 18275
dataset_size: 37867
- config_name: ara-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1061565
num_examples: 10304
- name: validation
num_bytes: 2011932
num_examples: 19528
download_size: 1504481
dataset_size: 3073497
- config_name: ara-epo
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 68034
num_examples: 752
- name: validation
num_bytes: 8881
num_examples: 93
download_size: 41462
dataset_size: 76915
- config_name: ara-fra
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 152919
num_examples: 1568
- name: validation
num_bytes: 3876
num_examples: 35
download_size: 84050
dataset_size: 156795
- config_name: ara-heb
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 125166
num_examples: 1207
- name: validation
num_bytes: 9942
num_examples: 90
download_size: 61848
dataset_size: 135108
- config_name: ara-ita
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 21525
num_examples: 234
download_size: 14866
dataset_size: 21525
- config_name: ara-jpn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 152326
num_examples: 1335
download_size: 69625
dataset_size: 152326
- config_name: ara-jpn_Hira
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 124101
num_examples: 1091
download_size: 57183
dataset_size: 124101
- config_name: ara-pol
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 18625
num_examples: 206
download_size: 12101
dataset_size: 18625
- config_name: ara-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 411321
num_examples: 3714
- name: validation
num_bytes: 137107
num_examples: 1208
download_size: 258067
dataset_size: 548428
- config_name: ara-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 136907
num_examples: 1510
- name: validation
num_bytes: 94209
num_examples: 1030
download_size: 123987
dataset_size: 231116
- config_name: ara-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 124338
num_examples: 1262
- name: validation
num_bytes: 5192
num_examples: 51
download_size: 73967
dataset_size: 129530
- config_name: arq-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 40862
num_examples: 404
- name: validation
num_bytes: 77063
num_examples: 735
download_size: 66370
dataset_size: 117925
- config_name: avk-fra
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 122566
num_examples: 1243
- name: validation
num_bytes: 3632
num_examples: 42
download_size: 75212
dataset_size: 126198
- config_name: avk-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 21703
num_examples: 274
download_size: 14456
dataset_size: 21703
- config_name: awa-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 23967
num_examples: 278
download_size: 10739
dataset_size: 23967
- config_name: aze-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 217279
num_examples: 2658
- name: validation
num_bytes: 83428
num_examples: 1011
download_size: 152993
dataset_size: 300707
- config_name: aze-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 16435
num_examples: 209
download_size: 10124
dataset_size: 16435
- config_name: aze-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 95278
num_examples: 1149
download_size: 51761
dataset_size: 95278
- config_name: aze_Latn-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 94499
num_examples: 1140
download_size: 51214
dataset_size: 94499
- config_name: bel-deu
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 63441
num_examples: 550
download_size: 36220
dataset_size: 63441
- config_name: bel-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 273466
num_examples: 2499
- name: validation
num_bytes: 471243
num_examples: 4264
download_size: 378527
dataset_size: 744709
- config_name: bel-epo
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 84415
num_examples: 734
download_size: 45529
dataset_size: 84415
- config_name: bel-fra
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 30143
num_examples: 282
download_size: 18702
dataset_size: 30143
- config_name: bel-ita
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 25476
num_examples: 263
download_size: 14548
dataset_size: 25476
- config_name: bel-lat
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 21928
num_examples: 221
download_size: 12906
dataset_size: 21928
- config_name: bel-nld
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 69844
num_examples: 605
download_size: 39407
dataset_size: 69844
- config_name: bel-pol
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 29581
num_examples: 286
download_size: 18853
dataset_size: 29581
- config_name: bel-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 394748
num_examples: 2499
- name: validation
num_bytes: 437140
num_examples: 2753
download_size: 424908
dataset_size: 831888
- config_name: bel-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 21243
num_examples: 204
download_size: 13942
dataset_size: 21243
- config_name: bel-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 312720
num_examples: 2354
- name: validation
num_bytes: 129835
num_examples: 1020
download_size: 218883
dataset_size: 442555
- config_name: bel-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 33111
num_examples: 324
download_size: 18724
dataset_size: 33111
- config_name: ben-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 264408
num_examples: 2499
- name: validation
num_bytes: 280510
num_examples: 2647
download_size: 202248
dataset_size: 544918
- config_name: ber-deu
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 63776
num_examples: 660
download_size: 32279
dataset_size: 63776
- config_name: ber-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1028063
num_examples: 10963
- name: validation
num_bytes: 10194009
num_examples: 108421
download_size: 4199708
dataset_size: 11222072
- config_name: ber-epo
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 151437
num_examples: 1575
download_size: 67850
dataset_size: 151437
- config_name: ber-fra
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 879737
num_examples: 10143
- name: validation
num_bytes: 3004766
num_examples: 34704
download_size: 1478032
dataset_size: 3884503
- config_name: ber-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 927625
num_examples: 10015
- name: validation
num_bytes: 1299082
num_examples: 14043
download_size: 1003062
dataset_size: 2226707
- config_name: ber_Latn-deu
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 63499
num_examples: 658
download_size: 32045
dataset_size: 63499
- config_name: ber_Latn-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1027874
num_examples: 10961
- name: validation
num_bytes: 10191650
num_examples: 108399
download_size: 4199153
dataset_size: 11219524
- config_name: ber_Latn-epo
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 151277
num_examples: 1574
download_size: 67604
dataset_size: 151277
- config_name: ber_Latn-fra
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 879352
num_examples: 10139
- name: validation
num_bytes: 3004095
num_examples: 34697
download_size: 1477107
dataset_size: 3883447
- config_name: bre-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 25950
num_examples: 382
download_size: 14563
dataset_size: 25950
- config_name: bre-fra
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 175074
num_examples: 2493
- name: validation
num_bytes: 219576
num_examples: 3059
download_size: 184284
dataset_size: 394650
- config_name: bua-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 84023
num_examples: 805
download_size: 37254
dataset_size: 84023
- config_name: bua_Cyrl-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 83989
num_examples: 804
download_size: 37241
dataset_size: 83989
- config_name: bul-bul
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 26317
num_examples: 218
download_size: 13765
dataset_size: 26317
- config_name: bul-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 34830
num_examples: 277
download_size: 20329
dataset_size: 34830
- config_name: bul-deu
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 33936
num_examples: 313
download_size: 20291
dataset_size: 33936
- config_name: bul-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1084142
num_examples: 9999
- name: validation
num_bytes: 839170
num_examples: 7799
download_size: 846278
dataset_size: 1923312
- config_name: bul-epo
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 63586
num_examples: 569
download_size: 34152
dataset_size: 63586
- config_name: bul-fra
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 53950
num_examples: 445
download_size: 29941
dataset_size: 53950
- config_name: bul-ita
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 271156
num_examples: 2499
- name: validation
num_bytes: 437184
num_examples: 4005
download_size: 325323
dataset_size: 708340
- config_name: bul-jpn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 43551
num_examples: 320
download_size: 24199
dataset_size: 43551
- config_name: bul-jpn_Hira
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 36791
num_examples: 263
download_size: 20835
dataset_size: 36791
- config_name: bul-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 157433
num_examples: 1246
- name: validation
num_bytes: 123953
num_examples: 999
download_size: 132261
dataset_size: 281386
- config_name: bul-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 27878
num_examples: 285
download_size: 16599
dataset_size: 27878
- config_name: bul-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 98572
num_examples: 833
download_size: 49498
dataset_size: 98572
- config_name: bul-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 100400
num_examples: 1019
download_size: 44328
dataset_size: 100400
- config_name: bul-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 50020
num_examples: 415
download_size: 28061
dataset_size: 50020
- config_name: cat-deu
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 68540
num_examples: 722
download_size: 41879
dataset_size: 68540
- config_name: cat-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 146100
num_examples: 1630
- name: validation
num_bytes: 26029
num_examples: 227
download_size: 101766
dataset_size: 172129
- config_name: cat-epo
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 70561
num_examples: 771
download_size: 41296
dataset_size: 70561
- config_name: cat-fra
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 67131
num_examples: 699
download_size: 41209
dataset_size: 67131
- config_name: cat-ita
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 25945
num_examples: 297
download_size: 17792
dataset_size: 25945
- config_name: cat-nld
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 48725
num_examples: 577
download_size: 30139
dataset_size: 48725
- config_name: cat-por
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 74541
num_examples: 746
download_size: 44473
dataset_size: 74541
- config_name: cat-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 149550
num_examples: 1533
- name: validation
num_bytes: 130044
num_examples: 1293
download_size: 168736
dataset_size: 279594
- config_name: cat-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 40603
num_examples: 455
download_size: 22471
dataset_size: 40603
- config_name: cbk-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 128209
num_examples: 1497
- name: validation
num_bytes: 85439
num_examples: 1000
download_size: 101501
dataset_size: 213648
- config_name: ceb-deu
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 86979
num_examples: 902
download_size: 49829
dataset_size: 86979
- config_name: ceb-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 27661
num_examples: 377
download_size: 17150
dataset_size: 27661
- config_name: ces-deu
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 324334
num_examples: 3489
- name: validation
num_bytes: 107095
num_examples: 1126
download_size: 249963
dataset_size: 431429
- config_name: ces-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1208777
num_examples: 13823
- name: validation
num_bytes: 1406567
num_examples: 16188
download_size: 1371882
dataset_size: 2615344
- config_name: ces-epo
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 304288
num_examples: 3470
- name: validation
num_bytes: 90934
num_examples: 1057
download_size: 232623
dataset_size: 395222
- config_name: ces-fra
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 39317
num_examples: 437
download_size: 26267
dataset_size: 39317
- config_name: ces-hun
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 144509
num_examples: 1910
- name: validation
num_bytes: 7995
num_examples: 113
download_size: 86499
dataset_size: 152504
- config_name: ces-ita
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 98215
num_examples: 1098
download_size: 62249
dataset_size: 98215
- config_name: ces-lat
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 26405
num_examples: 322
download_size: 16842
dataset_size: 26405
- config_name: ces-pol
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 47119
num_examples: 567
download_size: 32463
dataset_size: 47119
- config_name: ces-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 289216
num_examples: 2933
- name: validation
num_bytes: 540064
num_examples: 5465
download_size: 412625
dataset_size: 829280
- config_name: ces-slv
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 34502
num_examples: 488
download_size: 22656
dataset_size: 34502
- config_name: ces-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 148657
num_examples: 1786
- name: validation
num_bytes: 94319
num_examples: 1113
download_size: 122379
dataset_size: 242976
- config_name: cha-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 14322
num_examples: 226
download_size: 8820
dataset_size: 14322
- config_name: chm-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 414650
num_examples: 2749
- name: validation
num_bytes: 145873
num_examples: 999
download_size: 280544
dataset_size: 560523
- config_name: chv-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 32634
num_examples: 335
download_size: 18901
dataset_size: 32634
- config_name: chv-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 45397
num_examples: 381
download_size: 24053
dataset_size: 45397
- config_name: chv-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 50662
num_examples: 507
download_size: 27156
dataset_size: 50662
- config_name: cmn_Hans-wuu
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 83509
num_examples: 814
- name: validation
num_bytes: 184233
num_examples: 1833
download_size: 160147
dataset_size: 267742
- config_name: cor-deu
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 57500
num_examples: 820
download_size: 25536
dataset_size: 57500
- config_name: cor-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 202063
num_examples: 3197
- name: validation
num_bytes: 63311
num_examples: 999
download_size: 104772
dataset_size: 265374
- config_name: cor-epo
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 44825
num_examples: 662
download_size: 20025
dataset_size: 44825
- config_name: cor-fra
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 38316
num_examples: 554
download_size: 18077
dataset_size: 38316
- config_name: cor-ita
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 17926
num_examples: 286
download_size: 9237
dataset_size: 17926
- config_name: cor-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 17034
num_examples: 217
download_size: 9056
dataset_size: 17034
- config_name: cor-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 13929
num_examples: 205
download_size: 8528
dataset_size: 13929
- config_name: crh-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 16909
num_examples: 207
download_size: 10899
dataset_size: 16909
- config_name: cym-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 67303
num_examples: 817
download_size: 36462
dataset_size: 67303
- config_name: dan-dan
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 20508
num_examples: 211
download_size: 13056
dataset_size: 20508
- config_name: dan-deu
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 939346
num_examples: 9997
- name: validation
num_bytes: 653108
num_examples: 6920
download_size: 822653
dataset_size: 1592454
- config_name: dan-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 937157
num_examples: 10794
- name: validation
num_bytes: 1731687
num_examples: 20088
download_size: 1309642
dataset_size: 2668844
- config_name: dan-fin
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 210769
num_examples: 2664
- name: validation
num_bytes: 146073
num_examples: 1742
download_size: 139564
dataset_size: 356842
- config_name: dan-fra
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 141621
num_examples: 1730
- name: validation
num_bytes: 5094
num_examples: 44
download_size: 78641
dataset_size: 146715
- config_name: dan-ita
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 27065
num_examples: 283
download_size: 17488
dataset_size: 27065
- config_name: dan-jpn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 85475
num_examples: 903
download_size: 40686
dataset_size: 85475
- config_name: dan-jpn_Hira
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 68510
num_examples: 715
download_size: 32860
dataset_size: 68510
- config_name: dan-nld
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 128850
num_examples: 1642
- name: validation
num_bytes: 2811
num_examples: 35
download_size: 73101
dataset_size: 131661
- config_name: dan-nob
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 117135
num_examples: 1298
- name: validation
num_bytes: 91147
num_examples: 1011
download_size: 120282
dataset_size: 208282
- config_name: dan-nor
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 117924
num_examples: 1310
- name: validation
num_bytes: 91819
num_examples: 1016
download_size: 121240
dataset_size: 209743
- config_name: dan-por
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 65095
num_examples: 872
download_size: 34791
dataset_size: 65095
- config_name: dan-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 176873
num_examples: 1712
- name: validation
num_bytes: 104636
num_examples: 1019
download_size: 143203
dataset_size: 281509
- config_name: dan-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 437841
num_examples: 4999
- name: validation
num_bytes: 451220
num_examples: 5135
download_size: 462053
dataset_size: 889061
- config_name: dan-swe
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 125774
num_examples: 1548
- name: validation
num_bytes: 115669
num_examples: 1439
download_size: 129907
dataset_size: 241443
- config_name: dan-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 53781
num_examples: 757
download_size: 30139
dataset_size: 53781
- config_name: deu-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 258851
num_examples: 2608
- name: validation
num_bytes: 437881
num_examples: 4365
download_size: 369151
dataset_size: 696732
- config_name: deu-cmn_Hant
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 107987
num_examples: 1241
- name: validation
num_bytes: 199067
num_examples: 2250
download_size: 159365
dataset_size: 307054
- config_name: deu-deu
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 267826
num_examples: 2499
- name: validation
num_bytes: 223119
num_examples: 2114
download_size: 259164
dataset_size: 490945
- config_name: deu-dsb
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 55716
num_examples: 639
download_size: 32913
dataset_size: 55716
- config_name: ces-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 153794
num_examples: 1462
- name: validation
num_bytes: 10429
num_examples: 133
download_size: 94349
dataset_size: 164223
- config_name: dan-epo
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 975819
num_examples: 11068
- name: validation
num_bytes: 1706235
num_examples: 19221
download_size: 1303739
dataset_size: 2682054
- config_name: deu-ell
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 268771
num_examples: 2499
- name: validation
num_bytes: 306772
num_examples: 2800
download_size: 278036
dataset_size: 575543
- config_name: deu-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1805627
num_examples: 17564
- name: validation
num_bytes: 28610541
num_examples: 289748
download_size: 13423417
dataset_size: 30416168
- config_name: deu-est
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 19903
num_examples: 243
download_size: 13037
dataset_size: 19903
- config_name: deu-eus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 41039
num_examples: 455
download_size: 25696
dataset_size: 41039
- config_name: deu-fas
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 400446
num_examples: 3184
- name: validation
num_bytes: 129569
num_examples: 1024
download_size: 276027
dataset_size: 530015
- config_name: deu-fin
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 239646
num_examples: 2646
- name: validation
num_bytes: 642456
num_examples: 7141
download_size: 462553
dataset_size: 882102
- config_name: deu-fra
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1281879
num_examples: 12417
- name: validation
num_bytes: 10120004
num_examples: 98157
download_size: 5431932
dataset_size: 11401883
- config_name: deu-frr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 22655
num_examples: 277
download_size: 15560
dataset_size: 22655
- config_name: deu-gos
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 14640
num_examples: 206
download_size: 10515
dataset_size: 14640
- config_name: deu-hbs
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 193417
num_examples: 1958
download_size: 109664
dataset_size: 193417
- config_name: deu-heb
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 342033
num_examples: 3089
- name: validation
num_bytes: 125730
num_examples: 1124
download_size: 225334
dataset_size: 467763
- config_name: deu-hrv
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 67567
num_examples: 781
download_size: 40162
dataset_size: 67567
- config_name: deu-hrx
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 37662
num_examples: 470
download_size: 20237
dataset_size: 37662
- config_name: deu-hsb
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 58625
num_examples: 665
download_size: 34827
dataset_size: 58625
- config_name: deu-hun
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1534998
num_examples: 15341
- name: validation
num_bytes: 5346859
num_examples: 54082
download_size: 3523593
dataset_size: 6881857
- config_name: deu-ido
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 69192
num_examples: 870
- name: validation
num_bytes: 7031
num_examples: 95
download_size: 40976
dataset_size: 76223
- config_name: deu-ile
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 160935
num_examples: 2002
- name: validation
num_bytes: 107825
num_examples: 1371
download_size: 132108
dataset_size: 268760
- config_name: deu-ina
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 152094
num_examples: 1256
- name: validation
num_bytes: 9464
num_examples: 114
download_size: 90647
dataset_size: 161558
- config_name: deu-ind
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 47760
num_examples: 496
download_size: 28126
dataset_size: 47760
- config_name: deu-isl
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 79958
num_examples: 968
download_size: 42419
dataset_size: 79958
- config_name: deu-ita
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 992180
num_examples: 10093
- name: validation
num_bytes: 1196094
num_examples: 12197
download_size: 1126832
dataset_size: 2188274
- config_name: deu-jbo
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 131489
num_examples: 1448
download_size: 65705
dataset_size: 131489
- config_name: deu-jpn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1370543
num_examples: 11427
- name: validation
num_bytes: 3591001
num_examples: 29867
download_size: 2331243
dataset_size: 4961544
- config_name: deu-jpn_Hani
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 144613
num_examples: 1289
- name: validation
num_bytes: 399245
num_examples: 3631
download_size: 279320
dataset_size: 543858
- config_name: deu-jpn_Hira
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1187608
num_examples: 9809
- name: validation
num_bytes: 3092140
num_examples: 25378
download_size: 1993699
dataset_size: 4279748
- config_name: deu-jpn_Kana
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 37111
num_examples: 320
- name: validation
num_bytes: 96454
num_examples: 829
download_size: 68455
dataset_size: 133565
- config_name: deu-kab
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 30016
num_examples: 372
download_size: 17845
dataset_size: 30016
- config_name: deu-kor
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 123242
num_examples: 1103
download_size: 68033
dataset_size: 123242
- config_name: deu-kor_Hang
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 120181
num_examples: 1069
download_size: 66892
dataset_size: 120181
- config_name: deu-kur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 18833
num_examples: 236
download_size: 11349
dataset_size: 18833
- config_name: deu-kur_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 17483
num_examples: 222
download_size: 10468
dataset_size: 17483
- config_name: deu-lad
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 17057
num_examples: 219
- name: validation
num_bytes: 2253
num_examples: 27
download_size: 12097
dataset_size: 19310
- config_name: deu-lat
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 166757
num_examples: 2015
- name: validation
num_bytes: 94276
num_examples: 1101
download_size: 135876
dataset_size: 261033
- config_name: deu-lfn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 48086
num_examples: 417
- name: validation
num_bytes: 5806
num_examples: 53
download_size: 30628
dataset_size: 53892
- config_name: deu-lfn_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 29221
num_examples: 290
- name: validation
num_bytes: 2821
num_examples: 31
download_size: 22388
dataset_size: 32042
- config_name: deu-lit
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 105831
num_examples: 1114
- name: validation
num_bytes: 29448
num_examples: 354
download_size: 78897
dataset_size: 135279
- config_name: deu-ltz
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 27871
num_examples: 346
download_size: 14535
dataset_size: 27871
- config_name: deu-msa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 53011
num_examples: 543
download_size: 31039
dataset_size: 53011
- config_name: deu-nds
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 916195
num_examples: 9998
- name: validation
num_bytes: 724703
num_examples: 7842
download_size: 846482
dataset_size: 1640898
- config_name: deu-nld
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 934152
num_examples: 10217
- name: validation
num_bytes: 2407739
num_examples: 26386
download_size: 1674495
dataset_size: 3341891
- config_name: deu-nob
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 405671
num_examples: 3524
- name: validation
num_bytes: 109248
num_examples: 965
download_size: 283830
dataset_size: 514919
- config_name: deu-nor
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 417281
num_examples: 3650
- name: validation
num_bytes: 112350
num_examples: 1000
download_size: 292704
dataset_size: 529631
- config_name: deu-pol
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 465291
num_examples: 4999
- name: validation
num_bytes: 529225
num_examples: 5700
download_size: 553141
dataset_size: 994516
- config_name: deu-por
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1013946
num_examples: 9999
- name: validation
num_bytes: 736203
num_examples: 7044
download_size: 924126
dataset_size: 1750149
- config_name: deu-ron
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 98189
num_examples: 1140
- name: validation
num_bytes: 86238
num_examples: 1012
download_size: 105500
dataset_size: 184427
- config_name: deu-run
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 121394
num_examples: 1751
- name: validation
num_bytes: 88582
num_examples: 1190
download_size: 102436
dataset_size: 209976
- config_name: deu-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1499216
num_examples: 12799
- name: validation
num_bytes: 11573090
num_examples: 100273
download_size: 5403337
dataset_size: 13072306
- config_name: deu-slv
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 35983
num_examples: 491
download_size: 21747
dataset_size: 35983
- config_name: deu-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1056787
num_examples: 10520
- name: validation
num_bytes: 7477370
num_examples: 74985
download_size: 4207104
dataset_size: 8534157
- config_name: deu-srp_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 105768
num_examples: 985
download_size: 61261
dataset_size: 105768
- config_name: deu-swe
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 295272
num_examples: 3409
- name: validation
num_bytes: 97264
num_examples: 1125
download_size: 217263
dataset_size: 392536
- config_name: deu-swg
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 184947
num_examples: 1522
- name: validation
num_bytes: 17972
num_examples: 161
download_size: 127693
dataset_size: 202919
- config_name: deu-tat
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 401633
num_examples: 2500
- name: validation
num_bytes: 552680
num_examples: 3464
download_size: 517051
dataset_size: 954313
- config_name: deu-tgl
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 32289
num_examples: 325
download_size: 19727
dataset_size: 32289
- config_name: deu-tlh
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 91283
num_examples: 1098
- name: validation
num_bytes: 84607
num_examples: 1034
download_size: 93834
dataset_size: 175890
- config_name: deu-toki
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1066355
num_examples: 10113
- name: validation
num_bytes: 1443886
num_examples: 13816
download_size: 1089456
dataset_size: 2510241
- config_name: deu-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 469878
num_examples: 4999
- name: validation
num_bytes: 681055
num_examples: 7276
download_size: 610454
dataset_size: 1150933
- config_name: deu-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 988007
num_examples: 10318
- name: validation
num_bytes: 1112017
num_examples: 11720
download_size: 922708
dataset_size: 2100024
- config_name: deu-vie
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 42346
num_examples: 313
download_size: 26799
dataset_size: 42346
- config_name: deu-vol
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 16426
num_examples: 205
download_size: 10419
dataset_size: 16426
- config_name: deu-yid
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 89530
num_examples: 852
- name: validation
num_bytes: 14302
num_examples: 120
download_size: 48347
dataset_size: 103832
- config_name: deu-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 374528
num_examples: 3943
- name: validation
num_bytes: 656083
num_examples: 6837
download_size: 536791
dataset_size: 1030611
- config_name: dsb-hsb
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 19291
num_examples: 330
download_size: 10518
dataset_size: 19291
- config_name: dsb-slv
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 20142
num_examples: 329
download_size: 10459
dataset_size: 20142
- config_name: dtp-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 164362
num_examples: 1926
- name: validation
num_bytes: 89064
num_examples: 1011
download_size: 143844
dataset_size: 253426
- config_name: dtp-jpn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 24238
num_examples: 250
download_size: 15028
dataset_size: 24238
- config_name: dtp-jpn_Hira
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 22811
num_examples: 235
download_size: 14286
dataset_size: 22811
- config_name: dtp-msa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 59791
num_examples: 516
download_size: 34102
dataset_size: 59791
- config_name: dtp-zsm_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 58996
num_examples: 508
download_size: 33799
dataset_size: 58996
- config_name: egl-ita
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 13042
num_examples: 201
download_size: 7794
dataset_size: 13042
- config_name: ell-ell
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 64052
num_examples: 532
download_size: 27002
dataset_size: 64052
- config_name: ell-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1093958
num_examples: 10898
- name: validation
num_bytes: 1252537
num_examples: 12919
download_size: 921557
dataset_size: 2346495
- config_name: ell-epo
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 65871
num_examples: 603
download_size: 35659
dataset_size: 65871
- config_name: ell-fra
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 153819
num_examples: 1505
download_size: 73106
dataset_size: 153819
- config_name: ell-ita
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 41030
num_examples: 423
download_size: 20769
dataset_size: 41030
- config_name: ell-nld
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 132368
num_examples: 921
- name: validation
num_bytes: 9869
num_examples: 66
download_size: 80141
dataset_size: 142237
- config_name: ell-por
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 82594
num_examples: 884
- name: validation
num_bytes: 2043
num_examples: 22
download_size: 41322
dataset_size: 84637
- config_name: ell-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 312405
num_examples: 2499
- name: validation
num_bytes: 315095
num_examples: 2531
download_size: 287204
dataset_size: 627500
- config_name: ell-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 177528
num_examples: 1828
- name: validation
num_bytes: 97334
num_examples: 1004
download_size: 136484
dataset_size: 274862
- config_name: ell-swe
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 24701
num_examples: 252
download_size: 13735
dataset_size: 24701
- config_name: ell-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 150047
num_examples: 1468
- name: validation
num_bytes: 10316
num_examples: 108
download_size: 79339
dataset_size: 160363
- config_name: eng-bos_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 22596
num_examples: 300
- name: validation
num_bytes: 14959
num_examples: 199
download_size: 24366
dataset_size: 37555
- config_name: eng-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 438854
num_examples: 4449
- name: validation
num_bytes: 1777020
num_examples: 17966
download_size: 1162711
dataset_size: 2215874
- config_name: eng-cmn_Hant
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 414294
num_examples: 4475
- name: validation
num_bytes: 1812557
num_examples: 19463
download_size: 1128183
dataset_size: 2226851
- config_name: eng-eng
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1269164
num_examples: 12061
- name: validation
num_bytes: 10195524
num_examples: 96607
download_size: 3516603
dataset_size: 11464688
- config_name: eng-est
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 104016
num_examples: 1358
- name: validation
num_bytes: 85232
num_examples: 1095
download_size: 110161
dataset_size: 189248
- config_name: eng-eus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 96502
num_examples: 1059
- name: validation
num_bytes: 91908
num_examples: 1000
download_size: 107230
dataset_size: 188410
- config_name: eng-fao
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 23587
num_examples: 293
download_size: 15976
dataset_size: 23587
- config_name: eng-fas
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 455271
num_examples: 3761
- name: validation
num_bytes: 124374
num_examples: 1030
download_size: 300492
dataset_size: 579645
- config_name: eng-fin
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 971336
num_examples: 10689
- name: validation
num_bytes: 6265583
num_examples: 69895
download_size: 3296109
dataset_size: 7236919
- config_name: eng-fra
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1257949
num_examples: 12680
- name: validation
num_bytes: 24063290
num_examples: 251748
download_size: 10576625
dataset_size: 25321239
- config_name: eng-fry
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 18261
num_examples: 219
download_size: 12827
dataset_size: 18261
- config_name: eng-gla
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 80003
num_examples: 954
download_size: 39209
dataset_size: 80003
- config_name: eng-gle
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 141399
num_examples: 1912
- name: validation
num_bytes: 2295
num_examples: 26
download_size: 73403
dataset_size: 143694
- config_name: eng-glg
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 97097
num_examples: 1014
download_size: 55991
dataset_size: 97097
- config_name: eng-gos
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 70177
num_examples: 1153
- name: validation
num_bytes: 7364
num_examples: 95
download_size: 41819
dataset_size: 77541
- config_name: eng-got
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 22693
num_examples: 201
download_size: 11634
dataset_size: 22693
- config_name: eng-grc
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 77943
num_examples: 613
download_size: 41733
dataset_size: 77943
- config_name: eng-gsw
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 12398
num_examples: 204
download_size: 8014
dataset_size: 12398
- config_name: eng-hbs
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 877077
num_examples: 10016
- name: validation
num_bytes: 1437995
num_examples: 14205
download_size: 970160
dataset_size: 2315072
- config_name: eng-heb
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1029187
num_examples: 10518
- name: validation
num_bytes: 15016495
num_examples: 153502
download_size: 6534814
dataset_size: 16045682
- config_name: eng-hin
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 632165
num_examples: 4999
- name: validation
num_bytes: 750572
num_examples: 5943
download_size: 549131
dataset_size: 1382737
- config_name: eng-hoc
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 44534
num_examples: 659
download_size: 16568
dataset_size: 44534
- config_name: eng-hoc_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 44505
num_examples: 658
download_size: 16594
dataset_size: 44505
- config_name: eng-hrv
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 122085
num_examples: 1479
- name: validation
num_bytes: 79066
num_examples: 948
download_size: 119777
dataset_size: 201151
- config_name: eng-hrx
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 16515
num_examples: 220
download_size: 10338
dataset_size: 16515
- config_name: eng-hun
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1115409
num_examples: 13036
- name: validation
num_bytes: 8114616
num_examples: 97143
download_size: 4410369
dataset_size: 9230025
- config_name: eng-hye
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 86763
num_examples: 1120
- name: validation
num_bytes: 78516
num_examples: 999
download_size: 75105
dataset_size: 165279
- config_name: eng-ido
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 172535
num_examples: 1959
- name: validation
num_bytes: 126935
num_examples: 1482
download_size: 157198
dataset_size: 299470
- config_name: eng-ido_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 172449
num_examples: 1958
download_size: 90120
dataset_size: 172449
- config_name: eng-ile
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 132999
num_examples: 1710
- name: validation
num_bytes: 86077
num_examples: 1107
download_size: 112481
dataset_size: 219076
- config_name: eng-ilo
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 91354
num_examples: 1092
- name: validation
num_bytes: 83565
num_examples: 999
download_size: 97325
dataset_size: 174919
- config_name: eng-ina
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 533554
num_examples: 4987
- name: validation
num_bytes: 649383
num_examples: 6233
download_size: 603107
dataset_size: 1182937
- config_name: eng-ind
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 380922
num_examples: 4288
- name: validation
num_bytes: 520374
num_examples: 5808
download_size: 461005
dataset_size: 901296
- config_name: eng-isl
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 240876
num_examples: 2502
- name: validation
num_bytes: 662112
num_examples: 6937
download_size: 467385
dataset_size: 902988
- config_name: eng-ita
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1441207
num_examples: 17319
- name: validation
num_bytes: 38409070
num_examples: 472973
download_size: 11676126
dataset_size: 39850277
- config_name: eng-jav
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 22298
num_examples: 261
download_size: 14199
dataset_size: 22298
- config_name: eng-jbo
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 423796
num_examples: 4995
- name: validation
num_bytes: 593343
num_examples: 6939
download_size: 498266
dataset_size: 1017139
- config_name: eng-jbo_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 423519
num_examples: 4991
- name: validation
num_bytes: 593200
num_examples: 6936
download_size: 497925
dataset_size: 1016719
- config_name: eng-jpn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1657514
num_examples: 13861
- name: validation
num_bytes: 23343978
num_examples: 194063
download_size: 10922866
dataset_size: 25001492
- config_name: eng-jpn_Hani
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 165303
num_examples: 1499
- name: validation
num_bytes: 2221340
num_examples: 19904
download_size: 1156474
dataset_size: 2386643
- config_name: eng-jpn_Hira
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1431624
num_examples: 11830
- name: validation
num_bytes: 20729481
num_examples: 170845
download_size: 9553590
dataset_size: 22161105
- config_name: eng-jpn_Kana
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 57468
num_examples: 502
- name: validation
num_bytes: 383407
num_examples: 3228
download_size: 214818
dataset_size: 440875
- config_name: eng-kab
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 958325
num_examples: 12141
- name: validation
num_bytes: 1099879
num_examples: 14438
download_size: 898677
dataset_size: 2058204
- config_name: eng-kat
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 106255
num_examples: 983
download_size: 42929
dataset_size: 106255
- config_name: eng-kaz
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 42142
num_examples: 402
download_size: 22786
dataset_size: 42142
- config_name: eng-kaz_Cyrl
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 41491
num_examples: 395
download_size: 22306
dataset_size: 41491
- config_name: eng-kha
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 99741
num_examples: 1313
- name: validation
num_bytes: 1779
num_examples: 19
download_size: 50322
dataset_size: 101520
- config_name: eng-khm
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 79812
num_examples: 725
download_size: 33939
dataset_size: 79812
- config_name: eng-kor
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 236727
num_examples: 2399
- name: validation
num_bytes: 104433
num_examples: 1040
download_size: 179764
dataset_size: 341160
- config_name: eng-kor_Hang
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 234873
num_examples: 2375
- name: validation
num_bytes: 103713
num_examples: 1031
download_size: 178345
dataset_size: 338586
- config_name: eng-kur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 36063
num_examples: 441
- name: validation
num_bytes: 6689
num_examples: 79
download_size: 26817
dataset_size: 42752
- config_name: eng-kur_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 22071
num_examples: 289
download_size: 13276
dataset_size: 22071
- config_name: eng-kzj
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 121039
num_examples: 1168
download_size: 67990
dataset_size: 121039
- config_name: eng-lad
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 57498
num_examples: 767
- name: validation
num_bytes: 4378
num_examples: 57
download_size: 26724
dataset_size: 61876
- config_name: eng-lad_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 48604
num_examples: 671
- name: validation
num_bytes: 2554
num_examples: 36
download_size: 23543
dataset_size: 51158
- config_name: eng-lat
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1089275
num_examples: 10193
- name: validation
num_bytes: 1492502
num_examples: 14001
download_size: 1336363
dataset_size: 2581777
- config_name: eng-lav
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 133637
num_examples: 1630
download_size: 75998
dataset_size: 133637
- config_name: eng-lfn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 302601
num_examples: 3296
- name: validation
num_bytes: 378841
num_examples: 3705
download_size: 326423
dataset_size: 681442
- config_name: eng-lfn_Cyrl
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 89601
num_examples: 846
- name: validation
num_bytes: 150969
num_examples: 1219
download_size: 111777
dataset_size: 240570
- config_name: eng-lfn_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 212876
num_examples: 2449
- name: validation
num_bytes: 227779
num_examples: 2485
download_size: 219777
dataset_size: 440655
- config_name: eng-lit
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 210416
num_examples: 2527
- name: validation
num_bytes: 468022
num_examples: 5642
download_size: 358493
dataset_size: 678438
- config_name: eng-ltz
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 22109
num_examples: 292
download_size: 12793
dataset_size: 22109
- config_name: eng-mal
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 122845
num_examples: 801
download_size: 52038
dataset_size: 122845
- config_name: eng-mar
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1263570
num_examples: 10395
- name: validation
num_bytes: 5203641
num_examples: 43057
download_size: 2177593
dataset_size: 6467211
- config_name: eng-mkd
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 983537
num_examples: 10009
- name: validation
num_bytes: 6916962
num_examples: 70318
download_size: 3252466
dataset_size: 7900499
- config_name: eng-mlt
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 14036
num_examples: 202
download_size: 9489
dataset_size: 14036
- config_name: eng-mon
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 51132
num_examples: 414
- name: validation
num_bytes: 2954
num_examples: 22
download_size: 32987
dataset_size: 54086
- config_name: eng-mri
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 35222
num_examples: 365
download_size: 21751
dataset_size: 35222
- config_name: eng-msa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 452195
num_examples: 4999
- name: validation
num_bytes: 633153
num_examples: 6892
download_size: 557473
dataset_size: 1085348
- config_name: eng-mya
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 36681
num_examples: 215
download_size: 15537
dataset_size: 36681
- config_name: eng-nds
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 208894
num_examples: 2499
- name: validation
num_bytes: 270713
num_examples: 3220
download_size: 259707
dataset_size: 479607
- config_name: eng-nld
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1088704
num_examples: 12695
- name: validation
num_bytes: 5293182
num_examples: 62386
download_size: 2971500
dataset_size: 6381886
- config_name: eng-nno
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 40301
num_examples: 459
- name: validation
num_bytes: 44105
num_examples: 504
download_size: 52863
dataset_size: 84406
- config_name: eng-nob
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 424638
num_examples: 4538
- name: validation
num_bytes: 482450
num_examples: 5201
download_size: 494770
dataset_size: 907088
- config_name: eng-nor
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 465177
num_examples: 4999
- name: validation
num_bytes: 526696
num_examples: 5706
download_size: 541002
dataset_size: 991873
- config_name: eng-nov
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 17413
num_examples: 216
download_size: 11444
dataset_size: 17413
- config_name: eng-nst
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 70980
num_examples: 804
download_size: 29710
dataset_size: 70980
- config_name: eng-oci
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 64323
num_examples: 840
download_size: 35159
dataset_size: 64323
- config_name: eng-orv
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 30357
num_examples: 321
download_size: 15683
dataset_size: 30357
- config_name: eng-ota
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 66860
num_examples: 687
download_size: 27433
dataset_size: 66860
- config_name: eng-ota_Arab
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 38076
num_examples: 370
download_size: 19282
dataset_size: 38076
- config_name: eng-ota_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 28693
num_examples: 316
download_size: 16607
dataset_size: 28693
- config_name: eng-pam
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 86922
num_examples: 999
- name: validation
num_bytes: 43649
num_examples: 493
download_size: 67979
dataset_size: 130571
- config_name: eng-pes
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 454026
num_examples: 3756
download_size: 232007
dataset_size: 454026
- config_name: eng-pms
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 24232
num_examples: 268
download_size: 16311
dataset_size: 24232
- config_name: eng-pol
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 901788
num_examples: 10098
- name: validation
num_bytes: 3941704
num_examples: 44188
download_size: 2447148
dataset_size: 4843492
- config_name: eng-por
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1243469
num_examples: 13221
- name: validation
num_bytes: 18217294
num_examples: 204461
download_size: 8127961
dataset_size: 19460763
- config_name: eng-prg
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 21520
num_examples: 212
download_size: 14491
dataset_size: 21520
- config_name: eng-que
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 18477
num_examples: 256
download_size: 11724
dataset_size: 18477
- config_name: eng-rom
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 56689
num_examples: 705
download_size: 23862
dataset_size: 56689
- config_name: eng-ron
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 495218
num_examples: 5507
- name: validation
num_bytes: 893923
num_examples: 9660
download_size: 752574
dataset_size: 1389141
- config_name: eng-run
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 110686
num_examples: 1702
download_size: 47908
dataset_size: 110686
- config_name: eng-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 2180204
num_examples: 19424
- name: validation
num_bytes: 54715027
num_examples: 504173
download_size: 19305124
dataset_size: 56895231
- config_name: eng-slv
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 196477
num_examples: 2494
- name: validation
num_bytes: 129115
num_examples: 1609
download_size: 187799
dataset_size: 325592
- config_name: eng-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1619357
num_examples: 16582
- name: validation
num_bytes: 18576364
num_examples: 197298
download_size: 9383030
dataset_size: 20195721
- config_name: eng-sqi
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 98881
num_examples: 1108
download_size: 57489
dataset_size: 98881
- config_name: eng-srp_Cyrl
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 155339
num_examples: 1579
- name: validation
num_bytes: 981407
num_examples: 8816
download_size: 399356
dataset_size: 1136746
- config_name: eng-srp_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 576558
num_examples: 6655
- name: validation
num_bytes: 362157
num_examples: 4239
download_size: 438125
dataset_size: 938715
- config_name: eng-swa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 27342
num_examples: 386
download_size: 15781
dataset_size: 27342
- config_name: eng-swe
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 821251
num_examples: 10361
- name: validation
num_bytes: 1230707
num_examples: 15557
download_size: 994551
dataset_size: 2051958
- config_name: eng-tam
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 45497
num_examples: 310
download_size: 20885
dataset_size: 45497
- config_name: eng-tat
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 161072
num_examples: 1450
download_size: 80076
dataset_size: 161072
- config_name: eng-tel
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 31925
num_examples: 260
download_size: 16355
dataset_size: 31925
- config_name: eng-tgl
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 231824
num_examples: 2499
- name: validation
num_bytes: 445138
num_examples: 4795
download_size: 345932
dataset_size: 676962
- config_name: eng-tha
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 134201
num_examples: 1153
download_size: 60955
dataset_size: 134201
- config_name: eng-tlh
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 381052
num_examples: 4999
- name: validation
num_bytes: 641737
num_examples: 8419
download_size: 491865
dataset_size: 1022789
- config_name: eng-toki
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 461097
num_examples: 4989
- name: validation
num_bytes: 813506
num_examples: 8701
download_size: 529925
dataset_size: 1274603
- config_name: eng-tuk
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 243317
num_examples: 2499
- name: validation
num_bytes: 379489
num_examples: 3865
download_size: 307706
dataset_size: 622806
- config_name: eng-tuk_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 243289
num_examples: 2498
download_size: 123509
dataset_size: 243289
- config_name: eng-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1298131
num_examples: 13741
- name: validation
num_bytes: 60806843
num_examples: 658174
download_size: 25564718
dataset_size: 62104974
- config_name: eng-tzl
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 12738
num_examples: 200
- name: validation
num_bytes: 1495
num_examples: 19
download_size: 10240
dataset_size: 14233
- config_name: eng-tzl_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 12706
num_examples: 199
download_size: 7751
dataset_size: 12706
- config_name: eng-uig
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 350875
num_examples: 3023
- name: validation
num_bytes: 117229
num_examples: 1005
download_size: 207212
dataset_size: 468104
- config_name: eng-uig_Arab
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 350480
num_examples: 3020
- name: validation
num_bytes: 117153
num_examples: 1004
download_size: 206993
dataset_size: 467633
- config_name: eng-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1323157
num_examples: 13126
- name: validation
num_bytes: 15731337
num_examples: 159486
download_size: 5968292
dataset_size: 17054494
- config_name: eng-urd
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 180640
num_examples: 1662
download_size: 81930
dataset_size: 180640
- config_name: eng-uzb
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 36607
num_examples: 457
download_size: 18767
dataset_size: 36607
- config_name: eng-uzb_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 21475
num_examples: 300
download_size: 12341
dataset_size: 21475
- config_name: eng-vie
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 264072
num_examples: 2499
- name: validation
num_bytes: 330959
num_examples: 2742
download_size: 307514
dataset_size: 595031
- config_name: eng-vol
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 118924
num_examples: 1540
- name: validation
num_bytes: 95967
num_examples: 1257
download_size: 105856
dataset_size: 214891
- config_name: eng-war
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 135969
num_examples: 1511
download_size: 71531
dataset_size: 135969
- config_name: eng-xal
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 28327
num_examples: 280
download_size: 17604
dataset_size: 28327
- config_name: eng-yid
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 261728
num_examples: 2482
- name: validation
num_bytes: 202614
num_examples: 1891
download_size: 194772
dataset_size: 464342
- config_name: eng-yue_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 71707
num_examples: 676
- name: validation
num_bytes: 291333
num_examples: 2719
download_size: 201895
dataset_size: 363040
- config_name: eng-yue_Hant
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 40508
num_examples: 453
- name: validation
num_bytes: 131436
num_examples: 1521
download_size: 93730
dataset_size: 171944
- config_name: eng-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 996210
num_examples: 10389
- name: validation
num_bytes: 4138758
num_examples: 43074
download_size: 2665689
dataset_size: 5134968
- config_name: eng-zsm_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 55659
num_examples: 535
- name: validation
num_bytes: 91235
num_examples: 844
download_size: 83025
dataset_size: 146894
- config_name: eng-zza
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 35981
num_examples: 528
download_size: 21157
dataset_size: 35981
- config_name: epo-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 128088
num_examples: 1403
- name: validation
num_bytes: 55397
num_examples: 607
download_size: 103066
dataset_size: 183485
- config_name: epo-cmn_Hant
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 70643
num_examples: 830
- name: validation
num_bytes: 34256
num_examples: 402
download_size: 59759
dataset_size: 104899
- config_name: epo-epo
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 917005
num_examples: 9999
- name: validation
num_bytes: 896691
num_examples: 9675
download_size: 927651
dataset_size: 1813696
- config_name: epo-fas
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 298965
num_examples: 2513
- name: validation
num_bytes: 874743
num_examples: 7429
download_size: 559082
dataset_size: 1173708
- config_name: epo-fin
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 240896
num_examples: 2861
- name: validation
num_bytes: 87997
num_examples: 1046
download_size: 166693
dataset_size: 328893
- config_name: epo-fra
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1230182
num_examples: 12167
- name: validation
num_bytes: 28854303
num_examples: 287076
download_size: 13338711
dataset_size: 30084485
- config_name: epo-glg
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 28349
num_examples: 328
download_size: 19455
dataset_size: 28349
- config_name: epo-hbs
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 288245
num_examples: 2499
- name: validation
num_bytes: 321762
num_examples: 2771
download_size: 319944
dataset_size: 610007
- config_name: epo-heb
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1112244
num_examples: 10367
- name: validation
num_bytes: 2115699
num_examples: 19606
download_size: 1476715
dataset_size: 3227943
- config_name: epo-hrv
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 38256
num_examples: 422
- name: validation
num_bytes: 41597
num_examples: 474
download_size: 53172
dataset_size: 79853
- config_name: epo-hun
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 964243
num_examples: 10093
- name: validation
num_bytes: 2750708
num_examples: 28754
download_size: 1977781
dataset_size: 3714951
- config_name: epo-ido
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 101408
num_examples: 1182
- name: validation
num_bytes: 31558
num_examples: 416
download_size: 69083
dataset_size: 132966
- config_name: epo-ile
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 23197
num_examples: 315
- name: validation
num_bytes: 1556
num_examples: 22
download_size: 16768
dataset_size: 24753
- config_name: epo-ile_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 23116
num_examples: 314
download_size: 13913
dataset_size: 23116
- config_name: epo-ina
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 282400
num_examples: 2643
- name: validation
num_bytes: 143460
num_examples: 1447
download_size: 222582
dataset_size: 425860
- config_name: epo-isl
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 19877
num_examples: 232
download_size: 12758
dataset_size: 19877
- config_name: epo-ita
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 888636
num_examples: 10302
- name: validation
num_bytes: 3453735
num_examples: 40618
download_size: 1983283
dataset_size: 4342371
- config_name: epo-jbo
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 97045
num_examples: 1166
download_size: 48311
dataset_size: 97045
- config_name: epo-jpn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 376590
num_examples: 3476
- name: validation
num_bytes: 690918
num_examples: 6411
download_size: 505968
dataset_size: 1067508
- config_name: epo-jpn_Hani
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 35103
num_examples: 327
- name: validation
num_bytes: 65995
num_examples: 657
download_size: 56387
dataset_size: 101098
- config_name: epo-jpn_Hira
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 325335
num_examples: 2985
- name: validation
num_bytes: 605714
num_examples: 5574
download_size: 437358
dataset_size: 931049
- config_name: epo-lad
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 29891
num_examples: 365
- name: validation
num_bytes: 4076
num_examples: 51
download_size: 17361
dataset_size: 33967
- config_name: epo-lad_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 20217
num_examples: 263
- name: validation
num_bytes: 2146
num_examples: 30
download_size: 13699
dataset_size: 22363
- config_name: epo-lat
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 428425
num_examples: 2799
- name: validation
num_bytes: 954895
num_examples: 5952
download_size: 797601
dataset_size: 1383320
- config_name: epo-lfn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 85462
num_examples: 996
- name: validation
num_bytes: 25319
num_examples: 262
download_size: 51940
dataset_size: 110781
- config_name: epo-lfn_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 70595
num_examples: 868
- name: validation
num_bytes: 17383
num_examples: 197
download_size: 44060
dataset_size: 87978
- config_name: epo-lit
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1017798
num_examples: 11510
- name: validation
num_bytes: 897443
num_examples: 10428
download_size: 957411
dataset_size: 1915241
- config_name: epo-nds
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 214103
num_examples: 2535
- name: validation
num_bytes: 100624
num_examples: 1180
download_size: 170026
dataset_size: 314727
- config_name: epo-nld
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1178268
num_examples: 12807
- name: validation
num_bytes: 7452669
num_examples: 80836
download_size: 4054685
dataset_size: 8630937
- config_name: epo-nob
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 345244
num_examples: 3091
- name: validation
num_bytes: 110134
num_examples: 986
download_size: 253811
dataset_size: 455378
- config_name: epo-nor
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 350833
num_examples: 3157
- name: validation
num_bytes: 112233
num_examples: 1010
download_size: 260333
dataset_size: 463066
- config_name: epo-oci
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 61803
num_examples: 587
download_size: 39875
dataset_size: 61803
- config_name: epo-pol
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 220838
num_examples: 2512
- name: validation
num_bytes: 508444
num_examples: 5785
download_size: 404321
dataset_size: 729282
- config_name: epo-ron
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 325498
num_examples: 3819
- name: validation
num_bytes: 111685
num_examples: 1318
download_size: 227905
dataset_size: 437183
- config_name: epo-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1388231
num_examples: 11673
- name: validation
num_bytes: 5467312
num_examples: 45959
download_size: 3085306
dataset_size: 6855543
- config_name: epo-slv
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 26709
num_examples: 301
download_size: 20072
dataset_size: 26709
- config_name: epo-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1022772
num_examples: 10734
- name: validation
num_bytes: 6222407
num_examples: 65018
download_size: 3579864
dataset_size: 7245179
- config_name: epo-srp_Cyrl
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 224873
num_examples: 1794
- name: validation
num_bytes: 253691
num_examples: 1996
download_size: 243276
dataset_size: 478564
- config_name: epo-srp_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 24078
num_examples: 269
- name: validation
num_bytes: 24818
num_examples: 277
download_size: 29793
dataset_size: 48896
- config_name: epo-swe
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 140701
num_examples: 1756
- name: validation
num_bytes: 6011
num_examples: 67
download_size: 78411
dataset_size: 146712
- config_name: epo-tgl
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 115164
num_examples: 1108
download_size: 64569
dataset_size: 115164
- config_name: epo-tlh
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 150783
num_examples: 1929
- name: validation
num_bytes: 11493
num_examples: 163
download_size: 81951
dataset_size: 162276
- config_name: epo-toki
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 267658
num_examples: 2732
- name: validation
num_bytes: 126955
num_examples: 1294
download_size: 175555
dataset_size: 394613
- config_name: epo-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 419480
num_examples: 4999
- name: validation
num_bytes: 643726
num_examples: 7649
download_size: 537663
dataset_size: 1063206
- config_name: epo-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 231908
num_examples: 2499
- name: validation
num_bytes: 423910
num_examples: 4558
download_size: 308499
dataset_size: 655818
- config_name: epo-vie
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 194820
num_examples: 1789
download_size: 106504
dataset_size: 194820
- config_name: epo-vol
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 64771
num_examples: 818
- name: validation
num_bytes: 2690
num_examples: 36
download_size: 35300
dataset_size: 67461
- config_name: epo-yid
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 106126
num_examples: 992
- name: validation
num_bytes: 221020
num_examples: 2023
download_size: 120758
dataset_size: 327146
- config_name: epo-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 199388
num_examples: 2240
- name: validation
num_bytes: 90049
num_examples: 1013
download_size: 160108
dataset_size: 289437
- config_name: est-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 60148
num_examples: 612
download_size: 32065
dataset_size: 60148
- config_name: eus-jpn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 17497
num_examples: 236
download_size: 10476
dataset_size: 17497
- config_name: eus-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 41158
num_examples: 484
download_size: 23219
dataset_size: 41158
- config_name: eus-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 156461
num_examples: 1849
- name: validation
num_bytes: 85575
num_examples: 1002
download_size: 142363
dataset_size: 242036
- config_name: fas-fra
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 49856
num_examples: 375
download_size: 29339
dataset_size: 49856
- config_name: fin-fin
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 78494
num_examples: 999
- name: validation
num_bytes: 28304
num_examples: 348
download_size: 58170
dataset_size: 106798
- config_name: fin-fkv
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 36942
num_examples: 388
- name: validation
num_bytes: 6025
num_examples: 67
download_size: 26460
dataset_size: 42967
- config_name: fin-fra
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 154714
num_examples: 1919
- name: validation
num_bytes: 84473
num_examples: 1055
download_size: 118989
dataset_size: 239187
- config_name: fin-heb
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 22299
num_examples: 211
download_size: 14503
dataset_size: 22299
- config_name: fin-hun
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 96479
num_examples: 1296
download_size: 50162
dataset_size: 96479
- config_name: fin-ita
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 85183
num_examples: 1038
- name: validation
num_bytes: 81712
num_examples: 1002
download_size: 91368
dataset_size: 166895
- config_name: fin-jpn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1057712
num_examples: 9933
- name: validation
num_bytes: 1169077
num_examples: 10916
download_size: 981892
dataset_size: 2226789
- config_name: fin-jpn_Hani
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 86086
num_examples: 975
- name: validation
num_bytes: 103970
num_examples: 1107
download_size: 94922
dataset_size: 190056
- config_name: fin-jpn_Hira
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 946165
num_examples: 8714
- name: validation
num_bytes: 1026133
num_examples: 9437
download_size: 861481
dataset_size: 1972298
- config_name: fin-jpn_Kana
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 23503
num_examples: 225
- name: validation
num_bytes: 35948
num_examples: 341
download_size: 33085
dataset_size: 59451
- config_name: fin-kor
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 38867
num_examples: 421
download_size: 19418
dataset_size: 38867
- config_name: fin-kor_Hang
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 37733
num_examples: 410
download_size: 18965
dataset_size: 37733
- config_name: fin-kur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 23726
num_examples: 258
download_size: 16435
dataset_size: 23726
- config_name: fin-lat
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 22133
num_examples: 293
download_size: 13259
dataset_size: 22133
- config_name: fin-nld
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 25055
num_examples: 316
download_size: 16414
dataset_size: 25055
- config_name: fin-nno
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 60606
num_examples: 815
- name: validation
num_bytes: 30940
num_examples: 400
download_size: 41255
dataset_size: 91546
- config_name: fin-nob
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 129727
num_examples: 1671
- name: validation
num_bytes: 58412
num_examples: 746
download_size: 79422
dataset_size: 188139
- config_name: fin-nor
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 190395
num_examples: 2487
- name: validation
num_bytes: 89440
num_examples: 1147
download_size: 118809
dataset_size: 279835
- config_name: fin-pol
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 50430
num_examples: 608
download_size: 30459
dataset_size: 50430
- config_name: fin-por
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 37567
num_examples: 476
download_size: 23384
dataset_size: 37567
- config_name: fin-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 367235
num_examples: 3642
- name: validation
num_bytes: 102901
num_examples: 1008
download_size: 226112
dataset_size: 470136
- config_name: fin-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 216404
num_examples: 2512
- name: validation
num_bytes: 639283
num_examples: 7398
download_size: 451519
dataset_size: 855687
- config_name: fin-swe
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 242995
num_examples: 2840
- name: validation
num_bytes: 610373
num_examples: 7283
download_size: 434244
dataset_size: 853368
- config_name: fin-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 161803
num_examples: 1747
download_size: 100428
dataset_size: 161803
- config_name: fin-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 29525
num_examples: 381
download_size: 16206
dataset_size: 29525
- config_name: fra-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 714693
num_examples: 6824
- name: validation
num_bytes: 539737
num_examples: 5146
download_size: 669458
dataset_size: 1254430
- config_name: fra-cmn_Hant
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 200372
num_examples: 2131
- name: validation
num_bytes: 159759
num_examples: 1716
download_size: 195175
dataset_size: 360131
- config_name: fra-fra
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 95853
num_examples: 999
- name: validation
num_bytes: 200167
num_examples: 2117
download_size: 162861
dataset_size: 296020
- config_name: fra-gcf
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 83594
num_examples: 1163
download_size: 36863
dataset_size: 83594
- config_name: fra-hbs
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 43454
num_examples: 473
download_size: 25778
dataset_size: 43454
- config_name: fra-heb
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 351140
num_examples: 3280
- name: validation
num_bytes: 112878
num_examples: 1063
download_size: 224259
dataset_size: 464018
- config_name: fra-hrv
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 22475
num_examples: 257
download_size: 16028
dataset_size: 22475
- config_name: fra-hun
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 206436
num_examples: 2493
- name: validation
num_bytes: 403973
num_examples: 4847
download_size: 325750
dataset_size: 610409
- config_name: fra-ido
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 24221
num_examples: 295
- name: validation
num_bytes: 1966
num_examples: 25
download_size: 18277
dataset_size: 26187
- config_name: fra-ile
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 31510
num_examples: 393
download_size: 18670
dataset_size: 31510
- config_name: fra-ina
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 119331
num_examples: 1176
- name: validation
num_bytes: 6770
num_examples: 84
download_size: 70202
dataset_size: 126101
- config_name: fra-ind
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 85936
num_examples: 904
download_size: 42819
dataset_size: 85936
- config_name: fra-ita
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 839485
num_examples: 10090
- name: validation
num_bytes: 5720450
num_examples: 68867
download_size: 2551887
dataset_size: 6559935
- config_name: fra-jbo
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 107711
num_examples: 1113
download_size: 54579
dataset_size: 107711
- config_name: fra-jpn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1224719
num_examples: 10168
- name: validation
num_bytes: 3438636
num_examples: 28593
download_size: 2222126
dataset_size: 4663355
- config_name: fra-jpn_Hani
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 130113
num_examples: 1178
- name: validation
num_bytes: 364029
num_examples: 3263
download_size: 258314
dataset_size: 494142
- config_name: fra-jpn_Hira
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1070788
num_examples: 8790
- name: validation
num_bytes: 3017108
num_examples: 24841
download_size: 1926735
dataset_size: 4087896
- config_name: fra-kab
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1199219
num_examples: 12345
- name: validation
num_bytes: 1731934
num_examples: 18362
download_size: 1573483
dataset_size: 2931153
- config_name: fra-kor
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 35629
num_examples: 320
download_size: 21764
dataset_size: 35629
- config_name: fra-kor_Hang
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 35253
num_examples: 315
download_size: 21576
dataset_size: 35253
- config_name: fra-lat
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 398183
num_examples: 2914
- name: validation
num_bytes: 150151
num_examples: 1132
download_size: 325398
dataset_size: 548334
- config_name: fra-lfn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 61775
num_examples: 522
- name: validation
num_bytes: 17732
num_examples: 120
download_size: 42790
dataset_size: 79507
- config_name: fra-lfn_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 36842
num_examples: 360
- name: validation
num_bytes: 7190
num_examples: 59
download_size: 29938
dataset_size: 44032
- config_name: fra-msa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 97273
num_examples: 1002
download_size: 49651
dataset_size: 97273
- config_name: fra-nds
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 69733
num_examples: 856
download_size: 37545
dataset_size: 69733
- config_name: fra-nld
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1032379
num_examples: 11547
- name: validation
num_bytes: 1434058
num_examples: 16734
download_size: 1225080
dataset_size: 2466437
- config_name: fra-nob
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 28098
num_examples: 322
download_size: 18839
dataset_size: 28098
- config_name: fra-nor
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 39314
num_examples: 476
download_size: 23735
dataset_size: 39314
- config_name: fra-oci
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 77697
num_examples: 805
download_size: 47306
dataset_size: 77697
- config_name: fra-pcd
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 21120
num_examples: 265
download_size: 14284
dataset_size: 21120
- config_name: fra-pol
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 292987
num_examples: 3086
- name: validation
num_bytes: 97700
num_examples: 1004
download_size: 227030
dataset_size: 390687
- config_name: fra-por
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 984313
num_examples: 10517
- name: validation
num_bytes: 1557910
num_examples: 17061
download_size: 1297547
dataset_size: 2542223
- config_name: fra-ron
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 163669
num_examples: 1924
download_size: 86874
dataset_size: 163669
- config_name: fra-run
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 87929
num_examples: 1273
download_size: 39989
dataset_size: 87929
- config_name: fra-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1211147
num_examples: 11489
- name: validation
num_bytes: 19406105
num_examples: 183049
download_size: 7524789
dataset_size: 20617252
- config_name: fra-slv
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 43487
num_examples: 447
download_size: 30108
dataset_size: 43487
- config_name: fra-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1006273
num_examples: 10282
- name: validation
num_bytes: 3969459
num_examples: 40594
download_size: 2523610
dataset_size: 4975732
- config_name: fra-swe
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 118006
num_examples: 1406
- name: validation
num_bytes: 93025
num_examples: 1126
download_size: 116631
dataset_size: 211031
- config_name: fra-tat
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 29358
num_examples: 304
download_size: 17031
dataset_size: 29358
- config_name: fra-tgl
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 79571
num_examples: 835
download_size: 42990
dataset_size: 79571
- config_name: fra-tlh
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 48938
num_examples: 619
download_size: 26025
dataset_size: 48938
- config_name: fra-tlh_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 48870
num_examples: 618
download_size: 26013
dataset_size: 48870
- config_name: fra-toki
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 54839
num_examples: 561
- name: validation
num_bytes: 3517
num_examples: 25
download_size: 31300
dataset_size: 58356
- config_name: fra-toki_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 54741
num_examples: 560
download_size: 26716
dataset_size: 54741
- config_name: fra-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 239792
num_examples: 2581
- name: validation
num_bytes: 623793
num_examples: 6717
download_size: 470964
dataset_size: 863585
- config_name: fra-uig
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 75067
num_examples: 693
download_size: 35562
dataset_size: 75067
- config_name: fra-uig_Arab
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 74877
num_examples: 692
download_size: 35414
dataset_size: 74877
- config_name: fra-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 956739
num_examples: 10034
- name: validation
num_bytes: 1740948
num_examples: 18251
download_size: 1168887
dataset_size: 2697687
- config_name: fra-vie
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 148176
num_examples: 1026
download_size: 84386
dataset_size: 148176
- config_name: fra-wuu
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 74665
num_examples: 744
- name: validation
num_bytes: 47382
num_examples: 493
download_size: 74331
dataset_size: 122047
- config_name: fra-yid
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 40020
num_examples: 383
- name: validation
num_bytes: 6722
num_examples: 60
download_size: 24894
dataset_size: 46742
- config_name: fra-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1016230
num_examples: 9993
- name: validation
num_bytes: 765427
num_examples: 7557
download_size: 954496
dataset_size: 1781657
- config_name: fry-nld
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 21584
num_examples: 259
download_size: 15175
dataset_size: 21584
- config_name: gcf-gcf
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 17072
num_examples: 232
download_size: 8382
dataset_size: 17072
- config_name: gla-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 22545
num_examples: 288
download_size: 13378
dataset_size: 22545
- config_name: glg-por
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 38555
num_examples: 432
download_size: 25288
dataset_size: 38555
- config_name: glg-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 209239
num_examples: 2120
- name: validation
num_bytes: 98399
num_examples: 1011
download_size: 188047
dataset_size: 307638
- config_name: gos-nld
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 122569
num_examples: 1851
- name: validation
num_bytes: 31135
num_examples: 426
download_size: 80741
dataset_size: 153704
- config_name: grn-por
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 20197
num_examples: 223
- name: validation
num_bytes: 6613
num_examples: 73
download_size: 20489
dataset_size: 26810
- config_name: grn-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 53426
num_examples: 658
- name: validation
num_bytes: 12618
num_examples: 129
download_size: 41100
dataset_size: 66044
- config_name: hbs-ita
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 43433
num_examples: 533
download_size: 24080
dataset_size: 43433
- config_name: hbs-jpn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 40363
num_examples: 399
download_size: 23550
dataset_size: 40363
- config_name: hbs-nor
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 541133
num_examples: 4999
- name: validation
num_bytes: 648554
num_examples: 6148
download_size: 656217
dataset_size: 1189687
- config_name: hbs-pol
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 38040
num_examples: 416
download_size: 24618
dataset_size: 38040
- config_name: hbs-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 266963
num_examples: 2499
- name: validation
num_bytes: 447457
num_examples: 4176
download_size: 340887
dataset_size: 714420
- config_name: hbs-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 50110
num_examples: 606
download_size: 27900
dataset_size: 50110
- config_name: hbs-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 87516
num_examples: 941
download_size: 43358
dataset_size: 87516
- config_name: hbs-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 20146
num_examples: 235
download_size: 11983
dataset_size: 20146
- config_name: heb-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 26063
num_examples: 284
download_size: 14508
dataset_size: 26063
- config_name: heb-cmn_Hant
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 27168
num_examples: 328
download_size: 13810
dataset_size: 27168
- config_name: heb-heb
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 108260
num_examples: 999
- name: validation
num_bytes: 80048
num_examples: 731
download_size: 92997
dataset_size: 188308
- config_name: heb-hun
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 37276
num_examples: 416
download_size: 21085
dataset_size: 37276
- config_name: heb-ina
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 21868
num_examples: 216
- name: validation
num_bytes: 3739
num_examples: 40
download_size: 14968
dataset_size: 25607
- config_name: heb-ita
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 163344
num_examples: 1705
- name: validation
num_bytes: 2500
num_examples: 29
download_size: 79221
dataset_size: 165844
- config_name: heb-jpn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 30285
num_examples: 240
download_size: 17375
dataset_size: 30285
- config_name: heb-jpn_Hira
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 25879
num_examples: 200
download_size: 15647
dataset_size: 25879
- config_name: heb-lad
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 21415
num_examples: 217
- name: validation
num_bytes: 3554
num_examples: 40
download_size: 12598
dataset_size: 24969
- config_name: heb-lat
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 26822
num_examples: 270
- name: validation
num_bytes: 5540
num_examples: 57
download_size: 19894
dataset_size: 32362
- config_name: heb-lfn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 27303
num_examples: 279
- name: validation
num_bytes: 7178
num_examples: 71
download_size: 18710
dataset_size: 34481
- config_name: heb-lfn_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 19775
num_examples: 211
- name: validation
num_bytes: 3816
num_examples: 40
download_size: 15274
dataset_size: 23591
- config_name: heb-nld
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 266639
num_examples: 2499
- name: validation
num_bytes: 468326
num_examples: 4356
download_size: 386648
dataset_size: 734965
- config_name: heb-pol
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 497202
num_examples: 4999
- name: validation
num_bytes: 774983
num_examples: 7756
download_size: 643636
dataset_size: 1272185
- config_name: heb-por
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 72677
num_examples: 718
- name: validation
num_bytes: 1747
num_examples: 23
download_size: 37307
dataset_size: 74424
- config_name: heb-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 302724
num_examples: 2499
- name: validation
num_bytes: 428629
num_examples: 3567
download_size: 338004
dataset_size: 731353
- config_name: heb-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 196688
num_examples: 1848
- name: validation
num_bytes: 111724
num_examples: 1076
download_size: 156609
dataset_size: 308412
- config_name: heb-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 121159
num_examples: 1376
- name: validation
num_bytes: 2254
num_examples: 29
download_size: 53530
dataset_size: 123413
- config_name: heb-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 94997
num_examples: 965
download_size: 42585
dataset_size: 94997
- config_name: heb-yid
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 109288
num_examples: 900
- name: validation
num_bytes: 23281
num_examples: 195
download_size: 55057
dataset_size: 132569
- config_name: heb-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 60713
num_examples: 708
download_size: 28884
dataset_size: 60713
- config_name: hin-urd
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 36470
num_examples: 239
download_size: 18339
dataset_size: 36470
- config_name: hin-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 48351
num_examples: 323
download_size: 23182
dataset_size: 48351
- config_name: hrv-jpn_Hira
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 28149
num_examples: 275
download_size: 17536
dataset_size: 28149
- config_name: hrv-pol
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 22635
num_examples: 270
download_size: 16347
dataset_size: 22635
- config_name: hrv-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 19896
num_examples: 253
download_size: 14172
dataset_size: 19896
- config_name: hrv-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 36838
num_examples: 388
download_size: 22903
dataset_size: 36838
- config_name: hsb-slv
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 44372
num_examples: 742
download_size: 20741
dataset_size: 44372
- config_name: hun-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 22749
num_examples: 246
download_size: 15718
dataset_size: 22749
- config_name: hun-hun
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 97219
num_examples: 1063
download_size: 60723
dataset_size: 97219
- config_name: hun-ita
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 389121
num_examples: 4999
- name: validation
num_bytes: 554593
num_examples: 7158
download_size: 455662
dataset_size: 943714
- config_name: hun-jpn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 311959
num_examples: 2498
- name: validation
num_bytes: 453496
num_examples: 3659
download_size: 429779
dataset_size: 765455
- config_name: hun-jpn_Hani
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 41402
num_examples: 350
- name: validation
num_bytes: 68853
num_examples: 578
download_size: 69906
dataset_size: 110255
- config_name: hun-jpn_Hira
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 267931
num_examples: 2126
- name: validation
num_bytes: 379193
num_examples: 3035
download_size: 361985
dataset_size: 647124
- config_name: hun-kor
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 25888
num_examples: 270
download_size: 15849
dataset_size: 25888
- config_name: hun-kor_Hang
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 25608
num_examples: 267
download_size: 15747
dataset_size: 25608
- config_name: hun-lat
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 50223
num_examples: 606
download_size: 28512
dataset_size: 50223
- config_name: hun-nld
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 133308
num_examples: 1628
- name: validation
num_bytes: 2404
num_examples: 34
download_size: 76656
dataset_size: 135712
- config_name: hun-pol
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 163206
num_examples: 1933
download_size: 93818
dataset_size: 163206
- config_name: hun-por
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 205901
num_examples: 2499
- name: validation
num_bytes: 290673
num_examples: 3516
download_size: 280795
dataset_size: 496574
- config_name: hun-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 284982
num_examples: 2686
- name: validation
num_bytes: 638282
num_examples: 6097
download_size: 459837
dataset_size: 923264
- config_name: hun-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 209578
num_examples: 2499
- name: validation
num_bytes: 353678
num_examples: 4193
download_size: 318984
dataset_size: 563256
- config_name: hun-swe
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 137226
num_examples: 1613
- name: validation
num_bytes: 42072
num_examples: 524
download_size: 100142
dataset_size: 179298
- config_name: hun-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 75382
num_examples: 1001
download_size: 43401
dataset_size: 75382
- config_name: hun-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 43349
num_examples: 472
download_size: 24776
dataset_size: 43349
- config_name: hun-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 38349
num_examples: 433
download_size: 24587
dataset_size: 38349
- config_name: hye-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 32430
num_examples: 226
download_size: 19500
dataset_size: 32430
- config_name: ido-ina
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 24488
num_examples: 315
- name: validation
num_bytes: 4132
num_examples: 51
download_size: 15761
dataset_size: 28620
- config_name: ido-ita
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 131629
num_examples: 1459
- name: validation
num_bytes: 2824
num_examples: 38
download_size: 75968
dataset_size: 134453
- config_name: ido-lfn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 27869
num_examples: 347
- name: validation
num_bytes: 4414
num_examples: 54
download_size: 17583
dataset_size: 32283
- config_name: ido-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 42932
num_examples: 498
download_size: 24884
dataset_size: 42932
- config_name: ido-yid
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 42711
num_examples: 425
- name: validation
num_bytes: 6060
num_examples: 57
download_size: 22532
dataset_size: 48771
- config_name: ido_Latn-lfn_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 23808
num_examples: 308
- name: validation
num_bytes: 2844
num_examples: 37
download_size: 15374
dataset_size: 26652
- config_name: ina-ita
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 90877
num_examples: 911
download_size: 47258
dataset_size: 90877
- config_name: ina-lad
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 27197
num_examples: 319
- name: validation
num_bytes: 3814
num_examples: 45
download_size: 15003
dataset_size: 31011
- config_name: ina-lat
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 83249
num_examples: 911
- name: validation
num_bytes: 5396
num_examples: 69
download_size: 46162
dataset_size: 88645
- config_name: ina-lfn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 65050
num_examples: 768
- name: validation
num_bytes: 10419
num_examples: 121
download_size: 32655
dataset_size: 75469
- config_name: ina-nld
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 103264
num_examples: 1032
download_size: 57615
dataset_size: 103264
- config_name: ina-por
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 305959
num_examples: 2536
- name: validation
num_bytes: 868638
num_examples: 7079
download_size: 630752
dataset_size: 1174597
- config_name: ina-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 193271
num_examples: 1452
- name: validation
num_bytes: 3159
num_examples: 38
download_size: 104351
dataset_size: 196430
- config_name: ina-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 184789
num_examples: 1612
- name: validation
num_bytes: 9064
num_examples: 108
download_size: 112003
dataset_size: 193853
- config_name: ina-tlh
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 14888
num_examples: 215
- name: validation
num_bytes: 2395
num_examples: 33
download_size: 10654
dataset_size: 17283
- config_name: ina-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 24080
num_examples: 316
download_size: 12943
dataset_size: 24080
- config_name: ina-yid
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 82845
num_examples: 787
- name: validation
num_bytes: 13703
num_examples: 119
download_size: 37542
dataset_size: 96548
- config_name: ina_Latn-lad_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 18914
num_examples: 231
- name: validation
num_bytes: 2130
num_examples: 28
download_size: 12137
dataset_size: 21044
- config_name: ina_Latn-lfn_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 56715
num_examples: 691
- name: validation
num_bytes: 7857
num_examples: 95
download_size: 29436
dataset_size: 64572
- config_name: ina_Latn-tlh_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 14835
num_examples: 214
download_size: 7397
dataset_size: 14835
- config_name: ind-zsm_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 25864
num_examples: 224
download_size: 17176
dataset_size: 25864
- config_name: isl-ita
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 18908
num_examples: 235
download_size: 13096
dataset_size: 18908
- config_name: isl-jpn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 24848
num_examples: 250
download_size: 14856
dataset_size: 24848
- config_name: isl-jpn_Hira
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 20683
num_examples: 210
download_size: 12763
dataset_size: 20683
- config_name: isl-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 16580
num_examples: 237
download_size: 11275
dataset_size: 16580
- config_name: ita-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 160755
num_examples: 1826
- name: validation
num_bytes: 51932
num_examples: 580
download_size: 109218
dataset_size: 212687
- config_name: ita-cmn_Hant
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 84647
num_examples: 1036
- name: validation
num_bytes: 33399
num_examples: 397
download_size: 62080
dataset_size: 118046
- config_name: ita-ind
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 31398
num_examples: 367
download_size: 17766
dataset_size: 31398
- config_name: ita-ita
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 91878
num_examples: 999
- name: validation
num_bytes: 31945
num_examples: 349
download_size: 68215
dataset_size: 123823
- config_name: ita-jpn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 305457
num_examples: 2659
- name: validation
num_bytes: 118091
num_examples: 1004
download_size: 225897
dataset_size: 423548
- config_name: ita-jpn_Hani
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 30518
num_examples: 287
- name: validation
num_bytes: 11399
num_examples: 104
download_size: 29023
dataset_size: 41917
- config_name: ita-jpn_Hira
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 269258
num_examples: 2325
- name: validation
num_bytes: 103941
num_examples: 879
download_size: 197961
dataset_size: 373199
- config_name: ita-lat
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 153385
num_examples: 1715
- name: validation
num_bytes: 1501
num_examples: 20
download_size: 85739
dataset_size: 154886
- config_name: ita-lit
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 19131
num_examples: 223
download_size: 14660
dataset_size: 19131
- config_name: ita-msa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 37328
num_examples: 425
download_size: 21126
dataset_size: 37328
- config_name: ita-nds
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 25021
num_examples: 312
download_size: 16148
dataset_size: 25021
- config_name: ita-nld
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 203972
num_examples: 2577
- name: validation
num_bytes: 552453
num_examples: 6939
download_size: 380577
dataset_size: 756425
- config_name: ita-nor
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 113673
num_examples: 938
download_size: 64121
dataset_size: 113673
- config_name: ita-pms
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 21573
num_examples: 231
download_size: 15301
dataset_size: 21573
- config_name: ita-pol
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 109444
num_examples: 1293
- name: validation
num_bytes: 85500
num_examples: 1001
download_size: 118611
dataset_size: 194944
- config_name: ita-por
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 310707
num_examples: 3065
- name: validation
num_bytes: 590134
num_examples: 6370
download_size: 485946
dataset_size: 900841
- config_name: ita-ron
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 80875
num_examples: 1004
download_size: 43545
dataset_size: 80875
- config_name: ita-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1114704
num_examples: 10044
- name: validation
num_bytes: 7351576
num_examples: 66132
download_size: 3655171
dataset_size: 8466280
- config_name: ita-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 441378
num_examples: 4980
- name: validation
num_bytes: 816444
num_examples: 9246
download_size: 682089
dataset_size: 1257822
- config_name: ita-swe
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 59844
num_examples: 714
download_size: 34751
dataset_size: 59844
- config_name: ita-toki
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 19371
num_examples: 199
download_size: 11797
dataset_size: 19371
- config_name: ita-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 998370
num_examples: 9999
- name: validation
num_bytes: 564058
num_examples: 5702
download_size: 466198
dataset_size: 1562428
- config_name: ita-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 433759
num_examples: 4999
- name: validation
num_bytes: 760632
num_examples: 8774
download_size: 509763
dataset_size: 1194391
- config_name: ita-vie
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 24470
num_examples: 249
download_size: 16510
dataset_size: 24470
- config_name: ita-yid
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 19240
num_examples: 205
- name: validation
num_bytes: 3948
num_examples: 36
download_size: 14049
dataset_size: 23188
- config_name: ita-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 252327
num_examples: 2942
- name: validation
num_bytes: 87841
num_examples: 1003
download_size: 171973
dataset_size: 340168
- config_name: jbo-jpn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 92236
num_examples: 920
download_size: 42342
dataset_size: 92236
- config_name: jbo-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 123960
num_examples: 1198
download_size: 60175
dataset_size: 123960
- config_name: jbo-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 125512
num_examples: 1506
download_size: 61472
dataset_size: 125512
- config_name: jbo-swe
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 17865
num_examples: 241
download_size: 11688
dataset_size: 17865
- config_name: jbo-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 42406
num_examples: 517
- name: validation
num_bytes: 1870
num_examples: 22
download_size: 23837
dataset_size: 44276
- config_name: jbo_Latn-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 23379
num_examples: 280
download_size: 13137
dataset_size: 23379
- config_name: jbo_Latn-cmn_Hant
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 18379
num_examples: 230
- name: validation
num_bytes: 1702
num_examples: 20
download_size: 13358
dataset_size: 20081
- config_name: jbo_Latn-jpn_Hira
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 82695
num_examples: 825
download_size: 38109
dataset_size: 82695
- config_name: jpn-jpn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 81644
num_examples: 583
download_size: 42091
dataset_size: 81644
- config_name: jpn-kor
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 71655
num_examples: 621
download_size: 37810
dataset_size: 71655
- config_name: jpn-lit
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 24465
num_examples: 245
download_size: 15257
dataset_size: 24465
- config_name: jpn-mar
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 47030
num_examples: 339
download_size: 19843
dataset_size: 47030
- config_name: jpn-msa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 274231
num_examples: 2615
- name: validation
num_bytes: 110792
num_examples: 1059
download_size: 192103
dataset_size: 385023
- config_name: jpn-nds
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 39337
num_examples: 405
- name: validation
num_bytes: 5535
num_examples: 52
download_size: 25417
dataset_size: 44872
- config_name: jpn-nld
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 352873
num_examples: 3429
- name: validation
num_bytes: 108309
num_examples: 1056
download_size: 229719
dataset_size: 461182
- config_name: jpn-nor
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 117048
num_examples: 1046
download_size: 63350
dataset_size: 117048
- config_name: jpn-pol
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1164653
num_examples: 9998
- name: validation
num_bytes: 1730980
num_examples: 14808
download_size: 1512836
dataset_size: 2895633
- config_name: jpn-por
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 216426
num_examples: 1939
- name: validation
num_bytes: 127225
num_examples: 1132
download_size: 185258
dataset_size: 343651
- config_name: jpn-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1380936
num_examples: 10141
- name: validation
num_bytes: 2134502
num_examples: 15674
download_size: 1640265
dataset_size: 3515438
- config_name: jpn-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1194028
num_examples: 10741
- name: validation
num_bytes: 2614778
num_examples: 23419
download_size: 1846070
dataset_size: 3808806
- config_name: jpn-swe
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 93655
num_examples: 895
download_size: 47852
dataset_size: 93655
- config_name: jpn-tlh
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 73820
num_examples: 673
download_size: 36035
dataset_size: 73820
- config_name: jpn-toki
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 23708
num_examples: 241
download_size: 11441
dataset_size: 23708
- config_name: jpn-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 88452
num_examples: 810
download_size: 48739
dataset_size: 88452
- config_name: jpn-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 54028
num_examples: 456
- name: validation
num_bytes: 2151
num_examples: 20
download_size: 31854
dataset_size: 56179
- config_name: jpn-vie
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 135321
num_examples: 1054
- name: validation
num_bytes: 76557
num_examples: 584
download_size: 103531
dataset_size: 211878
- config_name: jpn-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 274897
num_examples: 2496
- name: validation
num_bytes: 318480
num_examples: 2893
download_size: 302048
dataset_size: 593377
- config_name: jpn_Hani-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 21359
num_examples: 215
- name: validation
num_bytes: 22317
num_examples: 228
download_size: 27964
dataset_size: 43676
- config_name: jpn_Hani-nld
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 40240
num_examples: 425
- name: validation
num_bytes: 10865
num_examples: 118
download_size: 31212
dataset_size: 51105
- config_name: jpn_Hani-pol
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 117055
num_examples: 1080
- name: validation
num_bytes: 169999
num_examples: 1570
download_size: 162734
dataset_size: 287054
- config_name: jpn_Hani-por
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 34559
num_examples: 335
- name: validation
num_bytes: 16346
num_examples: 159
download_size: 34122
dataset_size: 50905
- config_name: jpn_Hani-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 110549
num_examples: 823
- name: validation
num_bytes: 178500
num_examples: 1325
download_size: 147019
dataset_size: 289049
- config_name: jpn_Hani-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 115105
num_examples: 1098
- name: validation
num_bytes: 255898
num_examples: 2464
download_size: 192880
dataset_size: 371003
- config_name: jpn_Hira-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 139892
num_examples: 1192
- name: validation
num_bytes: 165983
num_examples: 1439
download_size: 159085
dataset_size: 305875
- config_name: jpn_Hira-cmn_Hant
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 73955
num_examples: 681
- name: validation
num_bytes: 84628
num_examples: 778
download_size: 81587
dataset_size: 158583
- config_name: jpn_Hira-ind
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 236214
num_examples: 2255
- name: validation
num_bytes: 91241
num_examples: 873
download_size: 161763
dataset_size: 327455
- config_name: jpn_Hira-jpn_Hira
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 65397
num_examples: 440
download_size: 34178
dataset_size: 65397
- config_name: jpn_Hira-kor_Hang
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 59682
num_examples: 511
download_size: 32220
dataset_size: 59682
- config_name: jpn_Hira-lit
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 20604
num_examples: 204
download_size: 13134
dataset_size: 20604
- config_name: jpn_Hira-mar
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 34541
num_examples: 253
download_size: 14813
dataset_size: 34541
- config_name: jpn_Hira-nds
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 33481
num_examples: 346
- name: validation
num_bytes: 5212
num_examples: 49
download_size: 22969
dataset_size: 38693
- config_name: jpn_Hira-nld
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 304533
num_examples: 2924
- name: validation
num_bytes: 93200
num_examples: 894
download_size: 195182
dataset_size: 397733
- config_name: jpn_Hira-nob
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 93077
num_examples: 804
download_size: 49978
dataset_size: 93077
- config_name: jpn_Hira-pol
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1029980
num_examples: 8764
- name: validation
num_bytes: 1534624
num_examples: 13009
download_size: 1330532
dataset_size: 2564604
- config_name: jpn_Hira-por
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 175561
num_examples: 1539
- name: validation
num_bytes: 103935
num_examples: 907
download_size: 149727
dataset_size: 279496
- config_name: jpn_Hira-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1226928
num_examples: 9007
- name: validation
num_bytes: 1881431
num_examples: 13786
download_size: 1439873
dataset_size: 3108359
- config_name: jpn_Hira-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1050339
num_examples: 9378
- name: validation
num_bytes: 2307088
num_examples: 20504
download_size: 1615553
dataset_size: 3357427
- config_name: jpn_Hira-swe
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 79203
num_examples: 728
download_size: 40909
dataset_size: 79203
- config_name: jpn_Hira-tlh_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 70052
num_examples: 637
download_size: 34176
dataset_size: 70052
- config_name: jpn_Hira-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 80294
num_examples: 731
download_size: 44455
dataset_size: 80294
- config_name: jpn_Hira-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 46097
num_examples: 382
download_size: 24917
dataset_size: 46097
- config_name: jpn_Hira-vie
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 117705
num_examples: 885
- name: validation
num_bytes: 65151
num_examples: 502
download_size: 88516
dataset_size: 182856
- config_name: jpn_Kana-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 41427
num_examples: 299
- name: validation
num_bytes: 71221
num_examples: 536
download_size: 60105
dataset_size: 112648
- config_name: jpn_Kana-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 27485
num_examples: 253
- name: validation
num_bytes: 50396
num_examples: 438
download_size: 45648
dataset_size: 77881
- config_name: kab-kab
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 104613
num_examples: 992
- name: validation
num_bytes: 28067
num_examples: 308
download_size: 55056
dataset_size: 132680
- config_name: kab-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 44470
num_examples: 422
download_size: 21560
dataset_size: 44470
- config_name: kab-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 81054
num_examples: 882
download_size: 45672
dataset_size: 81054
- config_name: kat-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 99104
num_examples: 641
download_size: 40773
dataset_size: 99104
- config_name: kaz-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 354063
num_examples: 1399
- name: validation
num_bytes: 253814
num_examples: 1015
download_size: 307583
dataset_size: 607877
- config_name: kaz_Cyrl-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 353875
num_examples: 1397
download_size: 178055
dataset_size: 353875
- config_name: khm-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 164312
num_examples: 1447
download_size: 67906
dataset_size: 164312
- config_name: kor-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 25610
num_examples: 220
download_size: 15434
dataset_size: 25610
- config_name: kor-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 94260
num_examples: 939
download_size: 51021
dataset_size: 94260
- config_name: kor-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 41990
num_examples: 410
download_size: 24333
dataset_size: 41990
- config_name: kor_Hang-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 21351
num_examples: 206
download_size: 14006
dataset_size: 21351
- config_name: kor_Hang-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 25515
num_examples: 219
download_size: 15362
dataset_size: 25515
- config_name: kor_Hang-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 92835
num_examples: 920
download_size: 50406
dataset_size: 92835
- config_name: kzj-msa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 50098
num_examples: 369
download_size: 28983
dataset_size: 50098
- config_name: kzj_Latn-zsm_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 47082
num_examples: 346
download_size: 27670
dataset_size: 47082
- config_name: lad-lat
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 17154
num_examples: 213
- name: validation
num_bytes: 2587
num_examples: 32
download_size: 11583
dataset_size: 19741
- config_name: lad-lfn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 27893
num_examples: 334
- name: validation
num_bytes: 5940
num_examples: 67
download_size: 15923
dataset_size: 33833
- config_name: lad-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 20488
num_examples: 275
- name: validation
num_bytes: 2172
num_examples: 24
download_size: 14475
dataset_size: 22660
- config_name: lad-yid
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 62268
num_examples: 603
- name: validation
num_bytes: 9505
num_examples: 83
download_size: 25397
dataset_size: 71773
- config_name: lad_Latn-lfn_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 15873
num_examples: 210
- name: validation
num_bytes: 2184
num_examples: 28
download_size: 11464
dataset_size: 18057
- config_name: lad_Latn-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 17010
num_examples: 238
download_size: 9753
dataset_size: 17010
- config_name: lad_Latn-yid
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 43136
num_examples: 437
- name: validation
num_bytes: 4887
num_examples: 48
download_size: 20509
dataset_size: 48023
- config_name: lat-lat
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 22899
num_examples: 234
download_size: 14686
dataset_size: 22899
- config_name: lat-lfn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 26928
num_examples: 347
- name: validation
num_bytes: 5876
num_examples: 70
download_size: 17350
dataset_size: 32804
- config_name: lat-nld
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 29211
num_examples: 365
download_size: 17700
dataset_size: 29211
- config_name: lat-nor
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 27542
num_examples: 332
download_size: 17575
dataset_size: 27542
- config_name: lat-pol
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 71935
num_examples: 914
download_size: 37216
dataset_size: 71935
- config_name: lat-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 107946
num_examples: 1040
- name: validation
num_bytes: 108118
num_examples: 1034
download_size: 109064
dataset_size: 216064
- config_name: lat-tlh
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 16836
num_examples: 232
- name: validation
num_bytes: 2051
num_examples: 29
download_size: 12516
dataset_size: 18887
- config_name: lat-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 34475
num_examples: 381
download_size: 19303
dataset_size: 34475
- config_name: lat-yid
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 46691
num_examples: 457
- name: validation
num_bytes: 7502
num_examples: 77
download_size: 23684
dataset_size: 54193
- config_name: lat_Latn-lfn_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 22616
num_examples: 298
- name: validation
num_bytes: 3792
num_examples: 49
download_size: 15381
dataset_size: 26408
- config_name: lav-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 25437
num_examples: 273
download_size: 16042
dataset_size: 25437
- config_name: lfn-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 28109
num_examples: 224
- name: validation
num_bytes: 3674
num_examples: 25
download_size: 18848
dataset_size: 31783
- config_name: lfn-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 24218
num_examples: 263
- name: validation
num_bytes: 5599
num_examples: 45
download_size: 19620
dataset_size: 29817
- config_name: lfn-yid
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 105328
num_examples: 992
- name: validation
num_bytes: 27764
num_examples: 243
download_size: 51210
dataset_size: 133092
- config_name: lfn_Cyrl-por
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 30664
num_examples: 219
- name: validation
num_bytes: 71458
num_examples: 445
download_size: 54913
dataset_size: 102122
- config_name: lfn_Latn-yid
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 92455
num_examples: 897
- name: validation
num_bytes: 20263
num_examples: 189
download_size: 45418
dataset_size: 112718
- config_name: lit-pol
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 147577
num_examples: 1786
download_size: 83686
dataset_size: 147577
- config_name: lit-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 371707
num_examples: 3597
- name: validation
num_bytes: 544976
num_examples: 5229
download_size: 455649
dataset_size: 916683
- config_name: lit-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 34497
num_examples: 453
download_size: 21210
dataset_size: 34497
- config_name: lit-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 119992
num_examples: 1471
download_size: 62438
dataset_size: 119992
- config_name: ltz-nld
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 20518
num_examples: 291
download_size: 12009
dataset_size: 20518
- config_name: mkd-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 17481
num_examples: 216
download_size: 11101
dataset_size: 17481
- config_name: msa-msa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 92783
num_examples: 957
download_size: 47311
dataset_size: 92783
- config_name: msa-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 19231
num_examples: 231
download_size: 13270
dataset_size: 19231
- config_name: msa-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 38156
num_examples: 371
download_size: 23906
dataset_size: 38156
- config_name: nds-nld
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 137259
num_examples: 1656
- name: validation
num_bytes: 82442
num_examples: 1012
download_size: 123250
dataset_size: 219701
- config_name: nds-por
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 15384
num_examples: 206
download_size: 10929
dataset_size: 15384
- config_name: nds-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 89629
num_examples: 924
- name: validation
num_bytes: 3700
num_examples: 35
download_size: 45741
dataset_size: 93329
- config_name: nds-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 70857
num_examples: 922
download_size: 38462
dataset_size: 70857
- config_name: nld-cmn_Hant
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 148501
num_examples: 1512
download_size: 84767
dataset_size: 148501
- config_name: nld-nld
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 86698
num_examples: 999
- name: validation
num_bytes: 86091
num_examples: 1030
download_size: 99089
dataset_size: 172789
- config_name: nld-nor
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 16856
num_examples: 202
download_size: 12796
dataset_size: 16856
- config_name: nld-pol
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 101951
num_examples: 1192
- name: validation
num_bytes: 85008
num_examples: 1005
download_size: 112425
dataset_size: 186959
- config_name: nld-por
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 215122
num_examples: 2499
- name: validation
num_bytes: 423649
num_examples: 4881
download_size: 344564
dataset_size: 638771
- config_name: nld-ron
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 198420
num_examples: 2268
- name: validation
num_bytes: 89982
num_examples: 1047
download_size: 164417
dataset_size: 288402
- config_name: nld-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 272894
num_examples: 2549
- name: validation
num_bytes: 691407
num_examples: 6525
download_size: 471874
dataset_size: 964301
- config_name: nld-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 936602
num_examples: 10112
- name: validation
num_bytes: 1664140
num_examples: 17830
download_size: 1349764
dataset_size: 2600742
- config_name: nld-toki
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 51558
num_examples: 667
download_size: 19156
dataset_size: 51558
- config_name: nld-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 214187
num_examples: 2499
- name: validation
num_bytes: 333002
num_examples: 3879
download_size: 298029
dataset_size: 547189
- config_name: nld-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 895653
num_examples: 9999
- name: validation
num_bytes: 453307
num_examples: 5064
download_size: 587337
dataset_size: 1348960
- config_name: nld-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 160442
num_examples: 1653
- name: validation
num_bytes: 4384
num_examples: 55
download_size: 96258
dataset_size: 164826
- config_name: nno-nob
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 37401
num_examples: 466
download_size: 23237
dataset_size: 37401
- config_name: nob-nno
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 37335
num_examples: 465
download_size: 23226
dataset_size: 37335
- config_name: nob-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 186077
num_examples: 1276
download_size: 95790
dataset_size: 186077
- config_name: nob-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 84792
num_examples: 884
download_size: 47864
dataset_size: 84792
- config_name: nob-swe
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 45252
num_examples: 562
download_size: 26905
dataset_size: 45252
- config_name: nor-nor
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 78814
num_examples: 981
- name: validation
num_bytes: 24311
num_examples: 271
download_size: 54419
dataset_size: 103125
- config_name: nor-pol
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 25261
num_examples: 280
download_size: 17457
dataset_size: 25261
- config_name: nor-por
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 49225
num_examples: 480
download_size: 30952
dataset_size: 49225
- config_name: nor-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 186318
num_examples: 1278
download_size: 95761
dataset_size: 186318
- config_name: nor-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 90519
num_examples: 959
download_size: 50523
dataset_size: 90519
- config_name: nor-swe
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 45468
num_examples: 565
download_size: 27042
dataset_size: 45468
- config_name: nor-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 67193
num_examples: 669
download_size: 35949
dataset_size: 67193
- config_name: nor-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 18545
num_examples: 200
download_size: 13226
dataset_size: 18545
- config_name: orv-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 107455
num_examples: 972
download_size: 46404
dataset_size: 107455
- config_name: ota-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 32973
num_examples: 306
- name: validation
num_bytes: 2466
num_examples: 23
download_size: 18923
dataset_size: 35439
- config_name: pol-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 34399
num_examples: 367
download_size: 22730
dataset_size: 34399
- config_name: pol-cmn_Hant
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 57432
num_examples: 595
download_size: 35868
dataset_size: 57432
- config_name: pol-por
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 62768
num_examples: 704
download_size: 39871
dataset_size: 62768
- config_name: pol-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 375905
num_examples: 3542
- name: validation
num_bytes: 108015
num_examples: 1014
download_size: 256024
dataset_size: 483920
- config_name: pol-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 225734
num_examples: 2543
- name: validation
num_bytes: 453838
num_examples: 4997
download_size: 387050
dataset_size: 679572
- config_name: pol-swe
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 116608
num_examples: 1391
download_size: 69004
dataset_size: 116608
- config_name: pol-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 76786
num_examples: 891
download_size: 42992
dataset_size: 76786
- config_name: pol-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 229734
num_examples: 2518
- name: validation
num_bytes: 632674
num_examples: 6897
download_size: 413476
dataset_size: 862408
- config_name: pol-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 94909
num_examples: 1003
download_size: 57640
dataset_size: 94909
- config_name: por-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 56469
num_examples: 637
download_size: 31818
dataset_size: 56469
- config_name: por-cmn_Hant
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 30628
num_examples: 385
download_size: 18114
dataset_size: 30628
- config_name: por-por
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 234434
num_examples: 2499
- name: validation
num_bytes: 432790
num_examples: 4667
download_size: 322867
dataset_size: 667224
- config_name: por-ron
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 57248
num_examples: 680
- name: validation
num_bytes: 2252
num_examples: 30
download_size: 37814
dataset_size: 59500
- config_name: por-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1131882
num_examples: 9999
- name: validation
num_bytes: 1169089
num_examples: 10049
download_size: 1139810
dataset_size: 2300971
- config_name: por-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1075488
num_examples: 10946
- name: validation
num_bytes: 5453497
num_examples: 56715
download_size: 3351027
dataset_size: 6528985
- config_name: por-swe
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 25296
num_examples: 319
download_size: 16774
dataset_size: 25296
- config_name: por-tgl
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 173148
num_examples: 1776
download_size: 91178
dataset_size: 173148
- config_name: por-toki
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 179820
num_examples: 1718
- name: validation
num_bytes: 64182
num_examples: 484
download_size: 113023
dataset_size: 244002
- config_name: por-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 146633
num_examples: 1793
download_size: 82852
dataset_size: 146633
- config_name: por-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 325345
num_examples: 3371
- name: validation
num_bytes: 100029
num_examples: 1024
download_size: 211428
dataset_size: 425374
- config_name: por-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 87856
num_examples: 1032
download_size: 48124
dataset_size: 87856
- config_name: ron-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 75763
num_examples: 781
download_size: 40782
dataset_size: 75763
- config_name: ron-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 157598
num_examples: 1710
download_size: 85888
dataset_size: 157598
- config_name: ron-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 233141
num_examples: 2459
- name: validation
num_bytes: 94986
num_examples: 1009
download_size: 186075
dataset_size: 328127
- config_name: run-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 101614
num_examples: 1250
download_size: 41298
dataset_size: 101614
- config_name: run-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 63752
num_examples: 962
download_size: 30106
dataset_size: 63752
- config_name: rus-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 129158
num_examples: 1085
- name: validation
num_bytes: 265973
num_examples: 2220
download_size: 202720
dataset_size: 395131
- config_name: rus-cmn_Hant
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 84281
num_examples: 798
- name: validation
num_bytes: 161779
num_examples: 1524
download_size: 125874
dataset_size: 246060
- config_name: rus-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 342636
num_examples: 2499
- name: validation
num_bytes: 205795
num_examples: 1405
download_size: 261052
dataset_size: 548431
- config_name: rus-sah
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 129791
num_examples: 993
download_size: 61526
dataset_size: 129791
- config_name: rus-slv
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 64708
num_examples: 656
- name: validation
num_bytes: 4506
num_examples: 44
download_size: 40906
dataset_size: 69214
- config_name: rus-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1176173
num_examples: 10505
- name: validation
num_bytes: 9791707
num_examples: 86868
download_size: 4824662
dataset_size: 10967880
- config_name: rus-swe
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 131736
num_examples: 1281
- name: validation
num_bytes: 5146
num_examples: 51
download_size: 70042
dataset_size: 136882
- config_name: rus-tat
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 474098
num_examples: 2137
- name: validation
num_bytes: 215338
num_examples: 1004
download_size: 369077
dataset_size: 689436
- config_name: rus-tlh
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 22094
num_examples: 253
download_size: 12421
dataset_size: 22094
- config_name: rus-toki
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 97330
num_examples: 899
- name: validation
num_bytes: 2837
num_examples: 19
download_size: 47548
dataset_size: 100167
- config_name: rus-toki_Latn
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 97279
num_examples: 898
download_size: 43424
dataset_size: 97279
- config_name: rus-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 576835
num_examples: 4983
- name: validation
num_bytes: 732490
num_examples: 6215
download_size: 645009
dataset_size: 1309325
- config_name: rus-uig
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 72472
num_examples: 534
download_size: 34288
dataset_size: 72472
- config_name: rus-uig_Arab
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 71656
num_examples: 528
download_size: 33915
dataset_size: 71656
- config_name: rus-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 1226931
num_examples: 9999
- name: validation
num_bytes: 920404
num_examples: 7426
download_size: 984731
dataset_size: 2147335
- config_name: rus-vie
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 38079
num_examples: 312
download_size: 22660
dataset_size: 38079
- config_name: rus-xal
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 25775
num_examples: 208
download_size: 15791
dataset_size: 25775
- config_name: rus-yue_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 26120
num_examples: 223
- name: validation
num_bytes: 35524
num_examples: 318
download_size: 35649
dataset_size: 61644
- config_name: rus-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 290001
num_examples: 2499
- name: validation
num_bytes: 562693
num_examples: 4843
download_size: 429211
dataset_size: 852694
- config_name: slv-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 62770
num_examples: 717
download_size: 37459
dataset_size: 62770
- config_name: slv-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 70850
num_examples: 914
- name: validation
num_bytes: 4354
num_examples: 48
download_size: 39942
dataset_size: 75204
- config_name: slv-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 70456
num_examples: 824
download_size: 41402
dataset_size: 70456
- config_name: spa-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 97463
num_examples: 1141
- name: validation
num_bytes: 247029
num_examples: 2870
download_size: 178075
dataset_size: 344492
- config_name: spa-cmn_Hant
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 123140
num_examples: 1470
- name: validation
num_bytes: 323955
num_examples: 3792
download_size: 232279
dataset_size: 447095
- config_name: spa-spa
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 260912
num_examples: 2499
- name: validation
num_bytes: 270460
num_examples: 2565
download_size: 305202
dataset_size: 531372
- config_name: spa-swe
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 106015
num_examples: 1350
- name: validation
num_bytes: 16628
num_examples: 203
download_size: 71178
dataset_size: 122643
- config_name: spa-tat
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 36208
num_examples: 440
download_size: 18955
dataset_size: 36208
- config_name: spa-tgl
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 59245
num_examples: 630
download_size: 34203
dataset_size: 59245
- config_name: spa-tlh
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 24131
num_examples: 324
download_size: 14722
dataset_size: 24131
- config_name: spa-toki
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 53576
num_examples: 639
- name: validation
num_bytes: 4230
num_examples: 30
download_size: 31345
dataset_size: 57806
- config_name: spa-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 870663
num_examples: 10614
- name: validation
num_bytes: 1481421
num_examples: 18212
download_size: 1143903
dataset_size: 2352084
- config_name: spa-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 938715
num_examples: 10114
- name: validation
num_bytes: 1207319
num_examples: 12968
download_size: 964837
dataset_size: 2146034
- config_name: spa-vie
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 53544
num_examples: 593
download_size: 30871
dataset_size: 53544
- config_name: spa-yid
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 42306
num_examples: 406
- name: validation
num_bytes: 5071
num_examples: 42
download_size: 25398
dataset_size: 47377
- config_name: spa-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 228258
num_examples: 2704
- name: validation
num_bytes: 592566
num_examples: 6921
download_size: 423632
dataset_size: 820824
- config_name: srp_Cyrl-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 109062
num_examples: 880
- name: validation
num_bytes: 183387
num_examples: 1409
download_size: 141468
dataset_size: 292449
- config_name: srp_Cyrl-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 21290
num_examples: 204
download_size: 12024
dataset_size: 21290
- config_name: srp_Latn-ita
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 16809
num_examples: 211
download_size: 11288
dataset_size: 16809
- config_name: srp_Latn-nob
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 517529
num_examples: 4812
- name: validation
num_bytes: 621219
num_examples: 5941
download_size: 628321
dataset_size: 1138748
- config_name: srp_Latn-rus
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 144734
num_examples: 1482
- name: validation
num_bytes: 239333
num_examples: 2501
download_size: 183176
dataset_size: 384067
- config_name: srp_Latn-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 29098
num_examples: 347
download_size: 16225
dataset_size: 29098
- config_name: swe-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 38998
num_examples: 477
download_size: 20845
dataset_size: 38998
- config_name: swe-cmn_Hant
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 41792
num_examples: 566
download_size: 20980
dataset_size: 41792
- config_name: swe-swe
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 84497
num_examples: 1021
download_size: 44545
dataset_size: 84497
- config_name: swe-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 16475
num_examples: 202
download_size: 12269
dataset_size: 16475
- config_name: swe-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 81227
num_examples: 1048
download_size: 39832
dataset_size: 81227
- config_name: tat-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 53031
num_examples: 520
download_size: 29374
dataset_size: 53031
- config_name: tat-vie
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 33750
num_examples: 283
download_size: 19328
dataset_size: 33750
- config_name: tlh-yid
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 26514
num_examples: 288
- name: validation
num_bytes: 4547
num_examples: 48
download_size: 15612
dataset_size: 31061
- config_name: tlh-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 36459
num_examples: 447
download_size: 20649
dataset_size: 36459
- config_name: tlh_Latn-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 20461
num_examples: 245
download_size: 13014
dataset_size: 20461
- config_name: tlh_Latn-cmn_Hant
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 15887
num_examples: 201
download_size: 10318
dataset_size: 15887
- config_name: tlh_Latn-yid
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 26452
num_examples: 287
download_size: 11465
dataset_size: 26452
- config_name: tur-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 47816
num_examples: 584
download_size: 26071
dataset_size: 47816
- config_name: tur-cmn_Hant
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 30886
num_examples: 395
download_size: 17961
dataset_size: 30886
- config_name: tur-tur
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 211897
num_examples: 2499
- name: validation
num_bytes: 101055
num_examples: 1186
download_size: 163239
dataset_size: 312952
- config_name: tur-uig
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 166790
num_examples: 1399
- name: validation
num_bytes: 121613
num_examples: 1010
download_size: 145651
dataset_size: 288403
- config_name: tur-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 222524
num_examples: 2519
- name: validation
num_bytes: 591068
num_examples: 6721
download_size: 372843
dataset_size: 813592
- config_name: tur-uzb
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 16788
num_examples: 207
download_size: 11413
dataset_size: 16788
- config_name: tur-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 79096
num_examples: 985
download_size: 40582
dataset_size: 79096
- config_name: uig-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 229949
num_examples: 1928
download_size: 103743
dataset_size: 229949
- config_name: uig_Arab-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 167895
num_examples: 1382
download_size: 76890
dataset_size: 167895
- config_name: uig_Arab-cmn_Hant
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 61655
num_examples: 541
download_size: 29982
dataset_size: 61655
- config_name: ukr-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 89627
num_examples: 852
download_size: 45641
dataset_size: 89627
- config_name: ukr-cmn_Hant
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 48358
num_examples: 529
download_size: 24950
dataset_size: 48358
- config_name: ukr-ukr
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 82966
num_examples: 823
download_size: 38791
dataset_size: 82966
- config_name: ukr-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 157543
num_examples: 1574
download_size: 78523
dataset_size: 157543
- config_name: vie-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 42746
num_examples: 344
download_size: 25397
dataset_size: 42746
- config_name: vie-vie
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 75096
num_examples: 541
download_size: 37416
dataset_size: 75096
- config_name: vie-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 50886
num_examples: 439
download_size: 29158
dataset_size: 50886
- config_name: wuu-cmn_Hans
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 87844
num_examples: 872
- name: validation
num_bytes: 179924
num_examples: 1775
download_size: 159352
dataset_size: 267768
- config_name: yid-yid
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 39189
num_examples: 291
download_size: 17085
dataset_size: 39189
- config_name: zho-zho
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 301592
num_examples: 3113
- name: validation
num_bytes: 572569
num_examples: 5809
download_size: 508839
dataset_size: 874161
- config_name: zsm_Latn-ind
features:
- name: sourceLang
dtype: string
- name: targetlang
dtype: string
- name: sourceString
dtype: string
- name: targetString
dtype: string
splits:
- name: test
num_bytes: 26133
num_examples: 224
download_size: 17764
dataset_size: 26133
configs:
- config_name: afr-deu
data_files:
- split: test
path: afr-deu/test-*
- config_name: afr-eng
data_files:
- split: test
path: afr-eng/test-*
- split: validation
path: afr-eng/validation-*
- config_name: afr-epo
data_files:
- split: test
path: afr-epo/test-*
- config_name: afr-nld
data_files:
- split: test
path: afr-nld/test-*
- config_name: afr-rus
data_files:
- split: test
path: afr-rus/test-*
- config_name: afr-spa
data_files:
- split: test
path: afr-spa/test-*
- config_name: ain-fin
data_files:
- split: test
path: ain-fin/test-*
- config_name: ara-ber
data_files:
- split: test
path: ara-ber/test-*
- split: validation
path: ara-ber/validation-*
- config_name: ara-ber_Latn
data_files:
- split: test
path: ara-ber_Latn/test-*
- split: validation
path: ara-ber_Latn/validation-*
- config_name: ara-deu
data_files:
- split: test
path: ara-deu/test-*
- split: validation
path: ara-deu/validation-*
- config_name: ara-ell
data_files:
- split: test
path: ara-ell/test-*
- config_name: ara-eng
data_files:
- split: test
path: ara-eng/test-*
- split: validation
path: ara-eng/validation-*
- config_name: ara-epo
data_files:
- split: test
path: ara-epo/test-*
- split: validation
path: ara-epo/validation-*
- config_name: ara-fra
data_files:
- split: test
path: ara-fra/test-*
- split: validation
path: ara-fra/validation-*
- config_name: ara-heb
data_files:
- split: test
path: ara-heb/test-*
- split: validation
path: ara-heb/validation-*
- config_name: ara-ita
data_files:
- split: test
path: ara-ita/test-*
- config_name: ara-jpn
data_files:
- split: test
path: ara-jpn/test-*
- config_name: ara-jpn_Hira
data_files:
- split: test
path: ara-jpn_Hira/test-*
- config_name: ara-pol
data_files:
- split: test
path: ara-pol/test-*
- config_name: ara-rus
data_files:
- split: test
path: ara-rus/test-*
- split: validation
path: ara-rus/validation-*
- config_name: ara-spa
data_files:
- split: test
path: ara-spa/test-*
- split: validation
path: ara-spa/validation-*
- config_name: ara-tur
data_files:
- split: test
path: ara-tur/test-*
- split: validation
path: ara-tur/validation-*
- config_name: arq-eng
data_files:
- split: test
path: arq-eng/test-*
- split: validation
path: arq-eng/validation-*
- config_name: avk-fra
data_files:
- split: test
path: avk-fra/test-*
- split: validation
path: avk-fra/validation-*
- config_name: avk-spa
data_files:
- split: test
path: avk-spa/test-*
- config_name: awa-eng
data_files:
- split: test
path: awa-eng/test-*
- config_name: aze-eng
data_files:
- split: test
path: aze-eng/test-*
- split: validation
path: aze-eng/validation-*
- config_name: aze-spa
data_files:
- split: test
path: aze-spa/test-*
- config_name: aze-tur
data_files:
- split: test
path: aze-tur/test-*
- config_name: aze_Latn-tur
data_files:
- split: test
path: aze_Latn-tur/test-*
- config_name: bel-deu
data_files:
- split: test
path: bel-deu/test-*
- config_name: bel-eng
data_files:
- split: test
path: bel-eng/test-*
- split: validation
path: bel-eng/validation-*
- config_name: bel-epo
data_files:
- split: test
path: bel-epo/test-*
- config_name: bel-fra
data_files:
- split: test
path: bel-fra/test-*
- config_name: bel-ita
data_files:
- split: test
path: bel-ita/test-*
- config_name: bel-lat
data_files:
- split: test
path: bel-lat/test-*
- config_name: bel-nld
data_files:
- split: test
path: bel-nld/test-*
- config_name: bel-pol
data_files:
- split: test
path: bel-pol/test-*
- config_name: bel-rus
data_files:
- split: test
path: bel-rus/test-*
- split: validation
path: bel-rus/validation-*
- config_name: bel-spa
data_files:
- split: test
path: bel-spa/test-*
- config_name: bel-ukr
data_files:
- split: test
path: bel-ukr/test-*
- split: validation
path: bel-ukr/validation-*
- config_name: bel-zho
data_files:
- split: test
path: bel-zho/test-*
- config_name: ben-eng
data_files:
- split: test
path: ben-eng/test-*
- split: validation
path: ben-eng/validation-*
- config_name: ber-deu
data_files:
- split: test
path: ber-deu/test-*
- config_name: ber-eng
data_files:
- split: test
path: ber-eng/test-*
- split: validation
path: ber-eng/validation-*
- config_name: ber-epo
data_files:
- split: test
path: ber-epo/test-*
- config_name: ber-fra
data_files:
- split: test
path: ber-fra/test-*
- split: validation
path: ber-fra/validation-*
- config_name: ber-spa
data_files:
- split: test
path: ber-spa/test-*
- split: validation
path: ber-spa/validation-*
- config_name: ber_Latn-deu
data_files:
- split: test
path: ber_Latn-deu/test-*
- config_name: ber_Latn-eng
data_files:
- split: test
path: ber_Latn-eng/test-*
- split: validation
path: ber_Latn-eng/validation-*
- config_name: ber_Latn-epo
data_files:
- split: test
path: ber_Latn-epo/test-*
- config_name: ber_Latn-fra
data_files:
- split: test
path: ber_Latn-fra/test-*
- split: validation
path: ber_Latn-fra/validation-*
- config_name: bre-eng
data_files:
- split: test
path: bre-eng/test-*
- config_name: bre-fra
data_files:
- split: test
path: bre-fra/test-*
- split: validation
path: bre-fra/validation-*
- config_name: bua-rus
data_files:
- split: test
path: bua-rus/test-*
- config_name: bua_Cyrl-rus
data_files:
- split: test
path: bua_Cyrl-rus/test-*
- config_name: bul-bul
data_files:
- split: test
path: bul-bul/test-*
- config_name: bul-cmn_Hans
data_files:
- split: test
path: bul-cmn_Hans/test-*
- config_name: bul-deu
data_files:
- split: test
path: bul-deu/test-*
- config_name: bul-eng
data_files:
- split: test
path: bul-eng/test-*
- split: validation
path: bul-eng/validation-*
- config_name: bul-epo
data_files:
- split: test
path: bul-epo/test-*
- config_name: bul-fra
data_files:
- split: test
path: bul-fra/test-*
- config_name: bul-ita
data_files:
- split: test
path: bul-ita/test-*
- split: validation
path: bul-ita/validation-*
- config_name: bul-jpn
data_files:
- split: test
path: bul-jpn/test-*
- config_name: bul-jpn_Hira
data_files:
- split: test
path: bul-jpn_Hira/test-*
- config_name: bul-rus
data_files:
- split: test
path: bul-rus/test-*
- split: validation
path: bul-rus/validation-*
- config_name: bul-spa
data_files:
- split: test
path: bul-spa/test-*
- config_name: bul-tur
data_files:
- split: test
path: bul-tur/test-*
- config_name: bul-ukr
data_files:
- split: test
path: bul-ukr/test-*
- config_name: bul-zho
data_files:
- split: test
path: bul-zho/test-*
- config_name: cat-deu
data_files:
- split: test
path: cat-deu/test-*
- config_name: cat-eng
data_files:
- split: test
path: cat-eng/test-*
- split: validation
path: cat-eng/validation-*
- config_name: cat-epo
data_files:
- split: test
path: cat-epo/test-*
- config_name: cat-fra
data_files:
- split: test
path: cat-fra/test-*
- config_name: cat-ita
data_files:
- split: test
path: cat-ita/test-*
- config_name: cat-nld
data_files:
- split: test
path: cat-nld/test-*
- config_name: cat-por
data_files:
- split: test
path: cat-por/test-*
- config_name: cat-spa
data_files:
- split: test
path: cat-spa/test-*
- split: validation
path: cat-spa/validation-*
- config_name: cat-ukr
data_files:
- split: test
path: cat-ukr/test-*
- config_name: cbk-eng
data_files:
- split: test
path: cbk-eng/test-*
- split: validation
path: cbk-eng/validation-*
- config_name: ceb-deu
data_files:
- split: test
path: ceb-deu/test-*
- config_name: ceb-eng
data_files:
- split: test
path: ceb-eng/test-*
- config_name: ces-deu
data_files:
- split: test
path: ces-deu/test-*
- split: validation
path: ces-deu/validation-*
- config_name: ces-eng
data_files:
- split: test
path: ces-eng/test-*
- split: validation
path: ces-eng/validation-*
- config_name: ces-epo
data_files:
- split: test
path: ces-epo/test-*
- split: validation
path: ces-epo/validation-*
- config_name: ces-fra
data_files:
- split: test
path: ces-fra/test-*
- config_name: ces-hun
data_files:
- split: test
path: ces-hun/test-*
- split: validation
path: ces-hun/validation-*
- config_name: ces-ita
data_files:
- split: test
path: ces-ita/test-*
- config_name: ces-lat
data_files:
- split: test
path: ces-lat/test-*
- config_name: ces-pol
data_files:
- split: test
path: ces-pol/test-*
- config_name: ces-rus
data_files:
- split: test
path: ces-rus/test-*
- split: validation
path: ces-rus/validation-*
- config_name: ces-slv
data_files:
- split: test
path: ces-slv/test-*
- config_name: ces-ukr
data_files:
- split: test
path: ces-ukr/test-*
- split: validation
path: ces-ukr/validation-*
- config_name: cha-eng
data_files:
- split: test
path: cha-eng/test-*
- config_name: chm-rus
data_files:
- split: test
path: chm-rus/test-*
- split: validation
path: chm-rus/validation-*
- config_name: chv-eng
data_files:
- split: test
path: chv-eng/test-*
- config_name: chv-rus
data_files:
- split: test
path: chv-rus/test-*
- config_name: chv-tur
data_files:
- split: test
path: chv-tur/test-*
- config_name: cmn_Hans-wuu
data_files:
- split: test
path: cmn_Hans-wuu/test-*
- split: validation
path: cmn_Hans-wuu/validation-*
- config_name: cor-deu
data_files:
- split: test
path: cor-deu/test-*
- config_name: cor-eng
data_files:
- split: test
path: cor-eng/test-*
- split: validation
path: cor-eng/validation-*
- config_name: cor-epo
data_files:
- split: test
path: cor-epo/test-*
- config_name: cor-fra
data_files:
- split: test
path: cor-fra/test-*
- config_name: cor-ita
data_files:
- split: test
path: cor-ita/test-*
- config_name: cor-rus
data_files:
- split: test
path: cor-rus/test-*
- config_name: cor-spa
data_files:
- split: test
path: cor-spa/test-*
- config_name: crh-tur
data_files:
- split: test
path: crh-tur/test-*
- config_name: cym-eng
data_files:
- split: test
path: cym-eng/test-*
- config_name: dan-dan
data_files:
- split: test
path: dan-dan/test-*
- config_name: dan-deu
data_files:
- split: test
path: dan-deu/test-*
- split: validation
path: dan-deu/validation-*
- config_name: dan-eng
data_files:
- split: test
path: dan-eng/test-*
- split: validation
path: dan-eng/validation-*
- config_name: dan-fin
data_files:
- split: test
path: dan-fin/test-*
- split: validation
path: dan-fin/validation-*
- config_name: dan-fra
data_files:
- split: test
path: dan-fra/test-*
- split: validation
path: dan-fra/validation-*
- config_name: dan-ita
data_files:
- split: test
path: dan-ita/test-*
- config_name: dan-jpn
data_files:
- split: test
path: dan-jpn/test-*
- config_name: dan-jpn_Hira
data_files:
- split: test
path: dan-jpn_Hira/test-*
- config_name: dan-nld
data_files:
- split: test
path: dan-nld/test-*
- split: validation
path: dan-nld/validation-*
- config_name: dan-nob
data_files:
- split: test
path: dan-nob/test-*
- split: validation
path: dan-nob/validation-*
- config_name: dan-nor
data_files:
- split: test
path: dan-nor/test-*
- split: validation
path: dan-nor/validation-*
- config_name: dan-por
data_files:
- split: test
path: dan-por/test-*
- config_name: dan-rus
data_files:
- split: test
path: dan-rus/test-*
- split: validation
path: dan-rus/validation-*
- config_name: dan-spa
data_files:
- split: test
path: dan-spa/test-*
- split: validation
path: dan-spa/validation-*
- config_name: dan-swe
data_files:
- split: test
path: dan-swe/test-*
- split: validation
path: dan-swe/validation-*
- config_name: dan-tur
data_files:
- split: test
path: dan-tur/test-*
- config_name: deu-cmn_Hans
data_files:
- split: test
path: deu-cmn_Hans/test-*
- split: validation
path: deu-cmn_Hans/validation-*
- config_name: deu-cmn_Hant
data_files:
- split: test
path: deu-cmn_Hant/test-*
- split: validation
path: deu-cmn_Hant/validation-*
- config_name: deu-deu
data_files:
- split: test
path: deu-deu/test-*
- split: validation
path: deu-deu/validation-*
- config_name: deu-dsb
data_files:
- split: test
path: deu-dsb/test-*
- config_name: ces-spa
data_files:
- split: test
path: ces-spa/test-*
- split: validation
path: ces-spa/validation-*
- config_name: dan-epo
data_files:
- split: test
path: dan-epo/test-*
- split: validation
path: dan-epo/validation-*
- config_name: deu-ell
data_files:
- split: test
path: deu-ell/test-*
- split: validation
path: deu-ell/validation-*
- config_name: deu-eng
data_files:
- split: test
path: deu-eng/test-*
- split: validation
path: deu-eng/validation-*
- config_name: deu-est
data_files:
- split: test
path: deu-est/test-*
- config_name: deu-eus
data_files:
- split: test
path: deu-eus/test-*
- config_name: deu-fas
data_files:
- split: test
path: deu-fas/test-*
- split: validation
path: deu-fas/validation-*
- config_name: deu-fin
data_files:
- split: test
path: deu-fin/test-*
- split: validation
path: deu-fin/validation-*
- config_name: deu-fra
data_files:
- split: test
path: deu-fra/test-*
- split: validation
path: deu-fra/validation-*
- config_name: deu-frr
data_files:
- split: test
path: deu-frr/test-*
- config_name: deu-gos
data_files:
- split: test
path: deu-gos/test-*
- config_name: deu-hbs
data_files:
- split: test
path: deu-hbs/test-*
- config_name: deu-heb
data_files:
- split: test
path: deu-heb/test-*
- split: validation
path: deu-heb/validation-*
- config_name: deu-hrv
data_files:
- split: test
path: deu-hrv/test-*
- config_name: deu-hrx
data_files:
- split: test
path: deu-hrx/test-*
- config_name: deu-hsb
data_files:
- split: test
path: deu-hsb/test-*
- config_name: deu-hun
data_files:
- split: test
path: deu-hun/test-*
- split: validation
path: deu-hun/validation-*
- config_name: deu-ido
data_files:
- split: test
path: deu-ido/test-*
- split: validation
path: deu-ido/validation-*
- config_name: deu-ile
data_files:
- split: test
path: deu-ile/test-*
- split: validation
path: deu-ile/validation-*
- config_name: deu-ina
data_files:
- split: test
path: deu-ina/test-*
- split: validation
path: deu-ina/validation-*
- config_name: deu-ind
data_files:
- split: test
path: deu-ind/test-*
- config_name: deu-isl
data_files:
- split: test
path: deu-isl/test-*
- config_name: deu-ita
data_files:
- split: test
path: deu-ita/test-*
- split: validation
path: deu-ita/validation-*
- config_name: deu-jbo
data_files:
- split: test
path: deu-jbo/test-*
- config_name: deu-jpn
data_files:
- split: test
path: deu-jpn/test-*
- split: validation
path: deu-jpn/validation-*
- config_name: deu-jpn_Hani
data_files:
- split: test
path: deu-jpn_Hani/test-*
- split: validation
path: deu-jpn_Hani/validation-*
- config_name: deu-jpn_Hira
data_files:
- split: test
path: deu-jpn_Hira/test-*
- split: validation
path: deu-jpn_Hira/validation-*
- config_name: deu-jpn_Kana
data_files:
- split: test
path: deu-jpn_Kana/test-*
- split: validation
path: deu-jpn_Kana/validation-*
- config_name: deu-kab
data_files:
- split: test
path: deu-kab/test-*
- config_name: deu-kor
data_files:
- split: test
path: deu-kor/test-*
- config_name: deu-kor_Hang
data_files:
- split: test
path: deu-kor_Hang/test-*
- config_name: deu-kur
data_files:
- split: test
path: deu-kur/test-*
- config_name: deu-kur_Latn
data_files:
- split: test
path: deu-kur_Latn/test-*
- config_name: deu-lad
data_files:
- split: test
path: deu-lad/test-*
- split: validation
path: deu-lad/validation-*
- config_name: deu-lat
data_files:
- split: test
path: deu-lat/test-*
- split: validation
path: deu-lat/validation-*
- config_name: deu-lfn
data_files:
- split: test
path: deu-lfn/test-*
- split: validation
path: deu-lfn/validation-*
- config_name: deu-lfn_Latn
data_files:
- split: test
path: deu-lfn_Latn/test-*
- split: validation
path: deu-lfn_Latn/validation-*
- config_name: deu-lit
data_files:
- split: test
path: deu-lit/test-*
- split: validation
path: deu-lit/validation-*
- config_name: deu-ltz
data_files:
- split: test
path: deu-ltz/test-*
- config_name: deu-msa
data_files:
- split: test
path: deu-msa/test-*
- config_name: deu-nds
data_files:
- split: test
path: deu-nds/test-*
- split: validation
path: deu-nds/validation-*
- config_name: deu-nld
data_files:
- split: test
path: deu-nld/test-*
- split: validation
path: deu-nld/validation-*
- config_name: deu-nob
data_files:
- split: test
path: deu-nob/test-*
- split: validation
path: deu-nob/validation-*
- config_name: deu-nor
data_files:
- split: test
path: deu-nor/test-*
- split: validation
path: deu-nor/validation-*
- config_name: deu-pol
data_files:
- split: test
path: deu-pol/test-*
- split: validation
path: deu-pol/validation-*
- config_name: deu-por
data_files:
- split: test
path: deu-por/test-*
- split: validation
path: deu-por/validation-*
- config_name: deu-ron
data_files:
- split: test
path: deu-ron/test-*
- split: validation
path: deu-ron/validation-*
- config_name: deu-run
data_files:
- split: test
path: deu-run/test-*
- split: validation
path: deu-run/validation-*
- config_name: deu-rus
data_files:
- split: test
path: deu-rus/test-*
- split: validation
path: deu-rus/validation-*
- config_name: deu-slv
data_files:
- split: test
path: deu-slv/test-*
- config_name: deu-spa
data_files:
- split: test
path: deu-spa/test-*
- split: validation
path: deu-spa/validation-*
- config_name: deu-srp_Latn
data_files:
- split: test
path: deu-srp_Latn/test-*
- config_name: deu-swe
data_files:
- split: test
path: deu-swe/test-*
- split: validation
path: deu-swe/validation-*
- config_name: deu-swg
data_files:
- split: test
path: deu-swg/test-*
- split: validation
path: deu-swg/validation-*
- config_name: deu-tat
data_files:
- split: test
path: deu-tat/test-*
- split: validation
path: deu-tat/validation-*
- config_name: deu-tgl
data_files:
- split: test
path: deu-tgl/test-*
- config_name: deu-tlh
data_files:
- split: test
path: deu-tlh/test-*
- split: validation
path: deu-tlh/validation-*
- config_name: deu-toki
data_files:
- split: test
path: deu-toki/test-*
- split: validation
path: deu-toki/validation-*
- config_name: deu-tur
data_files:
- split: test
path: deu-tur/test-*
- split: validation
path: deu-tur/validation-*
- config_name: deu-ukr
data_files:
- split: test
path: deu-ukr/test-*
- split: validation
path: deu-ukr/validation-*
- config_name: deu-vie
data_files:
- split: test
path: deu-vie/test-*
- config_name: deu-vol
data_files:
- split: test
path: deu-vol/test-*
- config_name: deu-yid
data_files:
- split: test
path: deu-yid/test-*
- split: validation
path: deu-yid/validation-*
- config_name: deu-zho
data_files:
- split: test
path: deu-zho/test-*
- split: validation
path: deu-zho/validation-*
- config_name: dsb-hsb
data_files:
- split: test
path: dsb-hsb/test-*
- config_name: dsb-slv
data_files:
- split: test
path: dsb-slv/test-*
- config_name: dtp-eng
data_files:
- split: test
path: dtp-eng/test-*
- split: validation
path: dtp-eng/validation-*
- config_name: dtp-jpn
data_files:
- split: test
path: dtp-jpn/test-*
- config_name: dtp-jpn_Hira
data_files:
- split: test
path: dtp-jpn_Hira/test-*
- config_name: dtp-msa
data_files:
- split: test
path: dtp-msa/test-*
- config_name: dtp-zsm_Latn
data_files:
- split: test
path: dtp-zsm_Latn/test-*
- config_name: egl-ita
data_files:
- split: test
path: egl-ita/test-*
- config_name: ell-ell
data_files:
- split: test
path: ell-ell/test-*
- config_name: ell-eng
data_files:
- split: test
path: ell-eng/test-*
- split: validation
path: ell-eng/validation-*
- config_name: ell-epo
data_files:
- split: test
path: ell-epo/test-*
- config_name: ell-fra
data_files:
- split: test
path: ell-fra/test-*
- config_name: ell-ita
data_files:
- split: test
path: ell-ita/test-*
- config_name: ell-nld
data_files:
- split: test
path: ell-nld/test-*
- split: validation
path: ell-nld/validation-*
- config_name: ell-por
data_files:
- split: test
path: ell-por/test-*
- split: validation
path: ell-por/validation-*
- config_name: ell-rus
data_files:
- split: test
path: ell-rus/test-*
- split: validation
path: ell-rus/validation-*
- config_name: ell-spa
data_files:
- split: test
path: ell-spa/test-*
- split: validation
path: ell-spa/validation-*
- config_name: ell-swe
data_files:
- split: test
path: ell-swe/test-*
- config_name: ell-tur
data_files:
- split: test
path: ell-tur/test-*
- split: validation
path: ell-tur/validation-*
- config_name: eng-bos_Latn
data_files:
- split: test
path: eng-bos_Latn/test-*
- split: validation
path: eng-bos_Latn/validation-*
- config_name: eng-cmn_Hans
data_files:
- split: test
path: eng-cmn_Hans/test-*
- split: validation
path: eng-cmn_Hans/validation-*
- config_name: eng-cmn_Hant
data_files:
- split: test
path: eng-cmn_Hant/test-*
- split: validation
path: eng-cmn_Hant/validation-*
- config_name: eng-eng
data_files:
- split: test
path: eng-eng/test-*
- split: validation
path: eng-eng/validation-*
- config_name: eng-est
data_files:
- split: test
path: eng-est/test-*
- split: validation
path: eng-est/validation-*
- config_name: eng-eus
data_files:
- split: test
path: eng-eus/test-*
- split: validation
path: eng-eus/validation-*
- config_name: eng-fao
data_files:
- split: test
path: eng-fao/test-*
- config_name: eng-fas
data_files:
- split: test
path: eng-fas/test-*
- split: validation
path: eng-fas/validation-*
- config_name: eng-fin
data_files:
- split: test
path: eng-fin/test-*
- split: validation
path: eng-fin/validation-*
- config_name: eng-fra
data_files:
- split: test
path: eng-fra/test-*
- split: validation
path: eng-fra/validation-*
- config_name: eng-fry
data_files:
- split: test
path: eng-fry/test-*
- config_name: eng-gla
data_files:
- split: test
path: eng-gla/test-*
- config_name: eng-gle
data_files:
- split: test
path: eng-gle/test-*
- split: validation
path: eng-gle/validation-*
- config_name: eng-glg
data_files:
- split: test
path: eng-glg/test-*
- config_name: eng-gos
data_files:
- split: test
path: eng-gos/test-*
- split: validation
path: eng-gos/validation-*
- config_name: eng-got
data_files:
- split: test
path: eng-got/test-*
- config_name: eng-grc
data_files:
- split: test
path: eng-grc/test-*
- config_name: eng-gsw
data_files:
- split: test
path: eng-gsw/test-*
- config_name: eng-hbs
data_files:
- split: test
path: eng-hbs/test-*
- split: validation
path: eng-hbs/validation-*
- config_name: eng-heb
data_files:
- split: test
path: eng-heb/test-*
- split: validation
path: eng-heb/validation-*
- config_name: eng-hin
data_files:
- split: test
path: eng-hin/test-*
- split: validation
path: eng-hin/validation-*
- config_name: eng-hoc
data_files:
- split: test
path: eng-hoc/test-*
- config_name: eng-hoc_Latn
data_files:
- split: test
path: eng-hoc_Latn/test-*
- config_name: eng-hrv
data_files:
- split: test
path: eng-hrv/test-*
- split: validation
path: eng-hrv/validation-*
- config_name: eng-hrx
data_files:
- split: test
path: eng-hrx/test-*
- config_name: eng-hun
data_files:
- split: test
path: eng-hun/test-*
- split: validation
path: eng-hun/validation-*
- config_name: eng-hye
data_files:
- split: test
path: eng-hye/test-*
- split: validation
path: eng-hye/validation-*
- config_name: eng-ido
data_files:
- split: test
path: eng-ido/test-*
- split: validation
path: eng-ido/validation-*
- config_name: eng-ido_Latn
data_files:
- split: test
path: eng-ido_Latn/test-*
- config_name: eng-ile
data_files:
- split: test
path: eng-ile/test-*
- split: validation
path: eng-ile/validation-*
- config_name: eng-ilo
data_files:
- split: test
path: eng-ilo/test-*
- split: validation
path: eng-ilo/validation-*
- config_name: eng-ina
data_files:
- split: test
path: eng-ina/test-*
- split: validation
path: eng-ina/validation-*
- config_name: eng-ind
data_files:
- split: test
path: eng-ind/test-*
- split: validation
path: eng-ind/validation-*
- config_name: eng-isl
data_files:
- split: test
path: eng-isl/test-*
- split: validation
path: eng-isl/validation-*
- config_name: eng-ita
data_files:
- split: test
path: eng-ita/test-*
- split: validation
path: eng-ita/validation-*
- config_name: eng-jav
data_files:
- split: test
path: eng-jav/test-*
- config_name: eng-jbo
data_files:
- split: test
path: eng-jbo/test-*
- split: validation
path: eng-jbo/validation-*
- config_name: eng-jbo_Latn
data_files:
- split: test
path: eng-jbo_Latn/test-*
- split: validation
path: eng-jbo_Latn/validation-*
- config_name: eng-jpn
data_files:
- split: test
path: eng-jpn/test-*
- split: validation
path: eng-jpn/validation-*
- config_name: eng-jpn_Hani
data_files:
- split: test
path: eng-jpn_Hani/test-*
- split: validation
path: eng-jpn_Hani/validation-*
- config_name: eng-jpn_Hira
data_files:
- split: test
path: eng-jpn_Hira/test-*
- split: validation
path: eng-jpn_Hira/validation-*
- config_name: eng-jpn_Kana
data_files:
- split: test
path: eng-jpn_Kana/test-*
- split: validation
path: eng-jpn_Kana/validation-*
- config_name: eng-kab
data_files:
- split: test
path: eng-kab/test-*
- split: validation
path: eng-kab/validation-*
- config_name: eng-kat
data_files:
- split: test
path: eng-kat/test-*
- config_name: eng-kaz
data_files:
- split: test
path: eng-kaz/test-*
- config_name: eng-kaz_Cyrl
data_files:
- split: test
path: eng-kaz_Cyrl/test-*
- config_name: eng-kha
data_files:
- split: test
path: eng-kha/test-*
- split: validation
path: eng-kha/validation-*
- config_name: eng-khm
data_files:
- split: test
path: eng-khm/test-*
- config_name: eng-kor
data_files:
- split: test
path: eng-kor/test-*
- split: validation
path: eng-kor/validation-*
- config_name: eng-kor_Hang
data_files:
- split: test
path: eng-kor_Hang/test-*
- split: validation
path: eng-kor_Hang/validation-*
- config_name: eng-kur
data_files:
- split: test
path: eng-kur/test-*
- split: validation
path: eng-kur/validation-*
- config_name: eng-kur_Latn
data_files:
- split: test
path: eng-kur_Latn/test-*
- config_name: eng-kzj
data_files:
- split: test
path: eng-kzj/test-*
- config_name: eng-lad
data_files:
- split: test
path: eng-lad/test-*
- split: validation
path: eng-lad/validation-*
- config_name: eng-lad_Latn
data_files:
- split: test
path: eng-lad_Latn/test-*
- split: validation
path: eng-lad_Latn/validation-*
- config_name: eng-lat
data_files:
- split: test
path: eng-lat/test-*
- split: validation
path: eng-lat/validation-*
- config_name: eng-lav
data_files:
- split: test
path: eng-lav/test-*
- config_name: eng-lfn
data_files:
- split: test
path: eng-lfn/test-*
- split: validation
path: eng-lfn/validation-*
- config_name: eng-lfn_Cyrl
data_files:
- split: test
path: eng-lfn_Cyrl/test-*
- split: validation
path: eng-lfn_Cyrl/validation-*
- config_name: eng-lfn_Latn
data_files:
- split: test
path: eng-lfn_Latn/test-*
- split: validation
path: eng-lfn_Latn/validation-*
- config_name: eng-lit
data_files:
- split: test
path: eng-lit/test-*
- split: validation
path: eng-lit/validation-*
- config_name: eng-ltz
data_files:
- split: test
path: eng-ltz/test-*
- config_name: eng-mal
data_files:
- split: test
path: eng-mal/test-*
- config_name: eng-mar
data_files:
- split: test
path: eng-mar/test-*
- split: validation
path: eng-mar/validation-*
- config_name: eng-mkd
data_files:
- split: test
path: eng-mkd/test-*
- split: validation
path: eng-mkd/validation-*
- config_name: eng-mlt
data_files:
- split: test
path: eng-mlt/test-*
- config_name: eng-mon
data_files:
- split: test
path: eng-mon/test-*
- split: validation
path: eng-mon/validation-*
- config_name: eng-mri
data_files:
- split: test
path: eng-mri/test-*
- config_name: eng-msa
data_files:
- split: test
path: eng-msa/test-*
- split: validation
path: eng-msa/validation-*
- config_name: eng-mya
data_files:
- split: test
path: eng-mya/test-*
- config_name: eng-nds
data_files:
- split: test
path: eng-nds/test-*
- split: validation
path: eng-nds/validation-*
- config_name: eng-nld
data_files:
- split: test
path: eng-nld/test-*
- split: validation
path: eng-nld/validation-*
- config_name: eng-nno
data_files:
- split: test
path: eng-nno/test-*
- split: validation
path: eng-nno/validation-*
- config_name: eng-nob
data_files:
- split: test
path: eng-nob/test-*
- split: validation
path: eng-nob/validation-*
- config_name: eng-nor
data_files:
- split: test
path: eng-nor/test-*
- split: validation
path: eng-nor/validation-*
- config_name: eng-nov
data_files:
- split: test
path: eng-nov/test-*
- config_name: eng-nst
data_files:
- split: test
path: eng-nst/test-*
- config_name: eng-oci
data_files:
- split: test
path: eng-oci/test-*
- config_name: eng-orv
data_files:
- split: test
path: eng-orv/test-*
- config_name: eng-ota
data_files:
- split: test
path: eng-ota/test-*
- config_name: eng-ota_Arab
data_files:
- split: test
path: eng-ota_Arab/test-*
- config_name: eng-ota_Latn
data_files:
- split: test
path: eng-ota_Latn/test-*
- config_name: eng-pam
data_files:
- split: test
path: eng-pam/test-*
- split: validation
path: eng-pam/validation-*
- config_name: eng-pes
data_files:
- split: test
path: eng-pes/test-*
- config_name: eng-pms
data_files:
- split: test
path: eng-pms/test-*
- config_name: eng-pol
data_files:
- split: test
path: eng-pol/test-*
- split: validation
path: eng-pol/validation-*
- config_name: eng-por
data_files:
- split: test
path: eng-por/test-*
- split: validation
path: eng-por/validation-*
- config_name: eng-prg
data_files:
- split: test
path: eng-prg/test-*
- config_name: eng-que
data_files:
- split: test
path: eng-que/test-*
- config_name: eng-rom
data_files:
- split: test
path: eng-rom/test-*
- config_name: eng-ron
data_files:
- split: test
path: eng-ron/test-*
- split: validation
path: eng-ron/validation-*
- config_name: eng-run
data_files:
- split: test
path: eng-run/test-*
- config_name: eng-rus
data_files:
- split: test
path: eng-rus/test-*
- split: validation
path: eng-rus/validation-*
- config_name: eng-slv
data_files:
- split: test
path: eng-slv/test-*
- split: validation
path: eng-slv/validation-*
- config_name: eng-spa
data_files:
- split: test
path: eng-spa/test-*
- split: validation
path: eng-spa/validation-*
- config_name: eng-sqi
data_files:
- split: test
path: eng-sqi/test-*
- config_name: eng-srp_Cyrl
data_files:
- split: test
path: eng-srp_Cyrl/test-*
- split: validation
path: eng-srp_Cyrl/validation-*
- config_name: eng-srp_Latn
data_files:
- split: test
path: eng-srp_Latn/test-*
- split: validation
path: eng-srp_Latn/validation-*
- config_name: eng-swa
data_files:
- split: test
path: eng-swa/test-*
- config_name: eng-swe
data_files:
- split: test
path: eng-swe/test-*
- split: validation
path: eng-swe/validation-*
- config_name: eng-tam
data_files:
- split: test
path: eng-tam/test-*
- config_name: eng-tat
data_files:
- split: test
path: eng-tat/test-*
- config_name: eng-tel
data_files:
- split: test
path: eng-tel/test-*
- config_name: eng-tgl
data_files:
- split: test
path: eng-tgl/test-*
- split: validation
path: eng-tgl/validation-*
- config_name: eng-tha
data_files:
- split: test
path: eng-tha/test-*
- config_name: eng-tlh
data_files:
- split: test
path: eng-tlh/test-*
- split: validation
path: eng-tlh/validation-*
- config_name: eng-toki
data_files:
- split: test
path: eng-toki/test-*
- split: validation
path: eng-toki/validation-*
- config_name: eng-tuk
data_files:
- split: test
path: eng-tuk/test-*
- split: validation
path: eng-tuk/validation-*
- config_name: eng-tuk_Latn
data_files:
- split: test
path: eng-tuk_Latn/test-*
- config_name: eng-tur
data_files:
- split: test
path: eng-tur/test-*
- split: validation
path: eng-tur/validation-*
- config_name: eng-tzl
data_files:
- split: test
path: eng-tzl/test-*
- split: validation
path: eng-tzl/validation-*
- config_name: eng-tzl_Latn
data_files:
- split: test
path: eng-tzl_Latn/test-*
- config_name: eng-uig
data_files:
- split: test
path: eng-uig/test-*
- split: validation
path: eng-uig/validation-*
- config_name: eng-uig_Arab
data_files:
- split: test
path: eng-uig_Arab/test-*
- split: validation
path: eng-uig_Arab/validation-*
- config_name: eng-ukr
data_files:
- split: test
path: eng-ukr/test-*
- split: validation
path: eng-ukr/validation-*
- config_name: eng-urd
data_files:
- split: test
path: eng-urd/test-*
- config_name: eng-uzb
data_files:
- split: test
path: eng-uzb/test-*
- config_name: eng-uzb_Latn
data_files:
- split: test
path: eng-uzb_Latn/test-*
- config_name: eng-vie
data_files:
- split: test
path: eng-vie/test-*
- split: validation
path: eng-vie/validation-*
- config_name: eng-vol
data_files:
- split: test
path: eng-vol/test-*
- split: validation
path: eng-vol/validation-*
- config_name: eng-war
data_files:
- split: test
path: eng-war/test-*
- config_name: eng-xal
data_files:
- split: test
path: eng-xal/test-*
- config_name: eng-yid
data_files:
- split: test
path: eng-yid/test-*
- split: validation
path: eng-yid/validation-*
- config_name: eng-yue_Hans
data_files:
- split: test
path: eng-yue_Hans/test-*
- split: validation
path: eng-yue_Hans/validation-*
- config_name: eng-yue_Hant
data_files:
- split: test
path: eng-yue_Hant/test-*
- split: validation
path: eng-yue_Hant/validation-*
- config_name: eng-zho
data_files:
- split: test
path: eng-zho/test-*
- split: validation
path: eng-zho/validation-*
- config_name: eng-zsm_Latn
data_files:
- split: test
path: eng-zsm_Latn/test-*
- split: validation
path: eng-zsm_Latn/validation-*
- config_name: eng-zza
data_files:
- split: test
path: eng-zza/test-*
- config_name: epo-cmn_Hans
data_files:
- split: test
path: epo-cmn_Hans/test-*
- split: validation
path: epo-cmn_Hans/validation-*
- config_name: epo-cmn_Hant
data_files:
- split: test
path: epo-cmn_Hant/test-*
- split: validation
path: epo-cmn_Hant/validation-*
- config_name: epo-epo
data_files:
- split: test
path: epo-epo/test-*
- split: validation
path: epo-epo/validation-*
- config_name: epo-fas
data_files:
- split: test
path: epo-fas/test-*
- split: validation
path: epo-fas/validation-*
- config_name: epo-fin
data_files:
- split: test
path: epo-fin/test-*
- split: validation
path: epo-fin/validation-*
- config_name: epo-fra
data_files:
- split: test
path: epo-fra/test-*
- split: validation
path: epo-fra/validation-*
- config_name: epo-glg
data_files:
- split: test
path: epo-glg/test-*
- config_name: epo-hbs
data_files:
- split: test
path: epo-hbs/test-*
- split: validation
path: epo-hbs/validation-*
- config_name: epo-heb
data_files:
- split: test
path: epo-heb/test-*
- split: validation
path: epo-heb/validation-*
- config_name: epo-hrv
data_files:
- split: test
path: epo-hrv/test-*
- split: validation
path: epo-hrv/validation-*
- config_name: epo-hun
data_files:
- split: test
path: epo-hun/test-*
- split: validation
path: epo-hun/validation-*
- config_name: epo-ido
data_files:
- split: test
path: epo-ido/test-*
- split: validation
path: epo-ido/validation-*
- config_name: epo-ile
data_files:
- split: test
path: epo-ile/test-*
- split: validation
path: epo-ile/validation-*
- config_name: epo-ile_Latn
data_files:
- split: test
path: epo-ile_Latn/test-*
- config_name: epo-ina
data_files:
- split: test
path: epo-ina/test-*
- split: validation
path: epo-ina/validation-*
- config_name: epo-isl
data_files:
- split: test
path: epo-isl/test-*
- config_name: epo-ita
data_files:
- split: test
path: epo-ita/test-*
- split: validation
path: epo-ita/validation-*
- config_name: epo-jbo
data_files:
- split: test
path: epo-jbo/test-*
- config_name: epo-jpn
data_files:
- split: test
path: epo-jpn/test-*
- split: validation
path: epo-jpn/validation-*
- config_name: epo-jpn_Hani
data_files:
- split: test
path: epo-jpn_Hani/test-*
- split: validation
path: epo-jpn_Hani/validation-*
- config_name: epo-jpn_Hira
data_files:
- split: test
path: epo-jpn_Hira/test-*
- split: validation
path: epo-jpn_Hira/validation-*
- config_name: epo-lad
data_files:
- split: test
path: epo-lad/test-*
- split: validation
path: epo-lad/validation-*
- config_name: epo-lad_Latn
data_files:
- split: test
path: epo-lad_Latn/test-*
- split: validation
path: epo-lad_Latn/validation-*
- config_name: epo-lat
data_files:
- split: test
path: epo-lat/test-*
- split: validation
path: epo-lat/validation-*
- config_name: epo-lfn
data_files:
- split: test
path: epo-lfn/test-*
- split: validation
path: epo-lfn/validation-*
- config_name: epo-lfn_Latn
data_files:
- split: test
path: epo-lfn_Latn/test-*
- split: validation
path: epo-lfn_Latn/validation-*
- config_name: epo-lit
data_files:
- split: test
path: epo-lit/test-*
- split: validation
path: epo-lit/validation-*
- config_name: epo-nds
data_files:
- split: test
path: epo-nds/test-*
- split: validation
path: epo-nds/validation-*
- config_name: epo-nld
data_files:
- split: test
path: epo-nld/test-*
- split: validation
path: epo-nld/validation-*
- config_name: epo-nob
data_files:
- split: test
path: epo-nob/test-*
- split: validation
path: epo-nob/validation-*
- config_name: epo-nor
data_files:
- split: test
path: epo-nor/test-*
- split: validation
path: epo-nor/validation-*
- config_name: epo-oci
data_files:
- split: test
path: epo-oci/test-*
- config_name: epo-pol
data_files:
- split: test
path: epo-pol/test-*
- split: validation
path: epo-pol/validation-*
- config_name: epo-ron
data_files:
- split: test
path: epo-ron/test-*
- split: validation
path: epo-ron/validation-*
- config_name: epo-rus
data_files:
- split: test
path: epo-rus/test-*
- split: validation
path: epo-rus/validation-*
- config_name: epo-slv
data_files:
- split: test
path: epo-slv/test-*
- config_name: epo-spa
data_files:
- split: test
path: epo-spa/test-*
- split: validation
path: epo-spa/validation-*
- config_name: epo-srp_Cyrl
data_files:
- split: test
path: epo-srp_Cyrl/test-*
- split: validation
path: epo-srp_Cyrl/validation-*
- config_name: epo-srp_Latn
data_files:
- split: test
path: epo-srp_Latn/test-*
- split: validation
path: epo-srp_Latn/validation-*
- config_name: epo-swe
data_files:
- split: test
path: epo-swe/test-*
- split: validation
path: epo-swe/validation-*
- config_name: epo-tgl
data_files:
- split: test
path: epo-tgl/test-*
- config_name: epo-tlh
data_files:
- split: test
path: epo-tlh/test-*
- split: validation
path: epo-tlh/validation-*
- config_name: epo-toki
data_files:
- split: test
path: epo-toki/test-*
- split: validation
path: epo-toki/validation-*
- config_name: epo-tur
data_files:
- split: test
path: epo-tur/test-*
- split: validation
path: epo-tur/validation-*
- config_name: epo-ukr
data_files:
- split: test
path: epo-ukr/test-*
- split: validation
path: epo-ukr/validation-*
- config_name: epo-vie
data_files:
- split: test
path: epo-vie/test-*
- config_name: epo-vol
data_files:
- split: test
path: epo-vol/test-*
- split: validation
path: epo-vol/validation-*
- config_name: epo-yid
data_files:
- split: test
path: epo-yid/test-*
- split: validation
path: epo-yid/validation-*
- config_name: epo-zho
data_files:
- split: test
path: epo-zho/test-*
- split: validation
path: epo-zho/validation-*
- config_name: est-rus
data_files:
- split: test
path: est-rus/test-*
- config_name: eus-jpn
data_files:
- split: test
path: eus-jpn/test-*
- config_name: eus-rus
data_files:
- split: test
path: eus-rus/test-*
- config_name: eus-spa
data_files:
- split: test
path: eus-spa/test-*
- split: validation
path: eus-spa/validation-*
- config_name: fas-fra
data_files:
- split: test
path: fas-fra/test-*
- config_name: fin-fin
data_files:
- split: test
path: fin-fin/test-*
- split: validation
path: fin-fin/validation-*
- config_name: fin-fkv
data_files:
- split: test
path: fin-fkv/test-*
- split: validation
path: fin-fkv/validation-*
- config_name: fin-fra
data_files:
- split: test
path: fin-fra/test-*
- split: validation
path: fin-fra/validation-*
- config_name: fin-heb
data_files:
- split: test
path: fin-heb/test-*
- config_name: fin-hun
data_files:
- split: test
path: fin-hun/test-*
- config_name: fin-ita
data_files:
- split: test
path: fin-ita/test-*
- split: validation
path: fin-ita/validation-*
- config_name: fin-jpn
data_files:
- split: test
path: fin-jpn/test-*
- split: validation
path: fin-jpn/validation-*
- config_name: fin-jpn_Hani
data_files:
- split: test
path: fin-jpn_Hani/test-*
- split: validation
path: fin-jpn_Hani/validation-*
- config_name: fin-jpn_Hira
data_files:
- split: test
path: fin-jpn_Hira/test-*
- split: validation
path: fin-jpn_Hira/validation-*
- config_name: fin-jpn_Kana
data_files:
- split: test
path: fin-jpn_Kana/test-*
- split: validation
path: fin-jpn_Kana/validation-*
- config_name: fin-kor
data_files:
- split: test
path: fin-kor/test-*
- config_name: fin-kor_Hang
data_files:
- split: test
path: fin-kor_Hang/test-*
- config_name: fin-kur
data_files:
- split: test
path: fin-kur/test-*
- config_name: fin-lat
data_files:
- split: test
path: fin-lat/test-*
- config_name: fin-nld
data_files:
- split: test
path: fin-nld/test-*
- config_name: fin-nno
data_files:
- split: test
path: fin-nno/test-*
- split: validation
path: fin-nno/validation-*
- config_name: fin-nob
data_files:
- split: test
path: fin-nob/test-*
- split: validation
path: fin-nob/validation-*
- config_name: fin-nor
data_files:
- split: test
path: fin-nor/test-*
- split: validation
path: fin-nor/validation-*
- config_name: fin-pol
data_files:
- split: test
path: fin-pol/test-*
- config_name: fin-por
data_files:
- split: test
path: fin-por/test-*
- config_name: fin-rus
data_files:
- split: test
path: fin-rus/test-*
- split: validation
path: fin-rus/validation-*
- config_name: fin-spa
data_files:
- split: test
path: fin-spa/test-*
- split: validation
path: fin-spa/validation-*
- config_name: fin-swe
data_files:
- split: test
path: fin-swe/test-*
- split: validation
path: fin-swe/validation-*
- config_name: fin-tur
data_files:
- split: test
path: fin-tur/test-*
- config_name: fin-zho
data_files:
- split: test
path: fin-zho/test-*
- config_name: fra-cmn_Hans
data_files:
- split: test
path: fra-cmn_Hans/test-*
- split: validation
path: fra-cmn_Hans/validation-*
- config_name: fra-cmn_Hant
data_files:
- split: test
path: fra-cmn_Hant/test-*
- split: validation
path: fra-cmn_Hant/validation-*
- config_name: fra-fra
data_files:
- split: test
path: fra-fra/test-*
- split: validation
path: fra-fra/validation-*
- config_name: fra-gcf
data_files:
- split: test
path: fra-gcf/test-*
- config_name: fra-hbs
data_files:
- split: test
path: fra-hbs/test-*
- config_name: fra-heb
data_files:
- split: test
path: fra-heb/test-*
- split: validation
path: fra-heb/validation-*
- config_name: fra-hrv
data_files:
- split: test
path: fra-hrv/test-*
- config_name: fra-hun
data_files:
- split: test
path: fra-hun/test-*
- split: validation
path: fra-hun/validation-*
- config_name: fra-ido
data_files:
- split: test
path: fra-ido/test-*
- split: validation
path: fra-ido/validation-*
- config_name: fra-ile
data_files:
- split: test
path: fra-ile/test-*
- config_name: fra-ina
data_files:
- split: test
path: fra-ina/test-*
- split: validation
path: fra-ina/validation-*
- config_name: fra-ind
data_files:
- split: test
path: fra-ind/test-*
- config_name: fra-ita
data_files:
- split: test
path: fra-ita/test-*
- split: validation
path: fra-ita/validation-*
- config_name: fra-jbo
data_files:
- split: test
path: fra-jbo/test-*
- config_name: fra-jpn
data_files:
- split: test
path: fra-jpn/test-*
- split: validation
path: fra-jpn/validation-*
- config_name: fra-jpn_Hani
data_files:
- split: test
path: fra-jpn_Hani/test-*
- split: validation
path: fra-jpn_Hani/validation-*
- config_name: fra-jpn_Hira
data_files:
- split: test
path: fra-jpn_Hira/test-*
- split: validation
path: fra-jpn_Hira/validation-*
- config_name: fra-kab
data_files:
- split: test
path: fra-kab/test-*
- split: validation
path: fra-kab/validation-*
- config_name: fra-kor
data_files:
- split: test
path: fra-kor/test-*
- config_name: fra-kor_Hang
data_files:
- split: test
path: fra-kor_Hang/test-*
- config_name: fra-lat
data_files:
- split: test
path: fra-lat/test-*
- split: validation
path: fra-lat/validation-*
- config_name: fra-lfn
data_files:
- split: test
path: fra-lfn/test-*
- split: validation
path: fra-lfn/validation-*
- config_name: fra-lfn_Latn
data_files:
- split: test
path: fra-lfn_Latn/test-*
- split: validation
path: fra-lfn_Latn/validation-*
- config_name: fra-msa
data_files:
- split: test
path: fra-msa/test-*
- config_name: fra-nds
data_files:
- split: test
path: fra-nds/test-*
- config_name: fra-nld
data_files:
- split: test
path: fra-nld/test-*
- split: validation
path: fra-nld/validation-*
- config_name: fra-nob
data_files:
- split: test
path: fra-nob/test-*
- config_name: fra-nor
data_files:
- split: test
path: fra-nor/test-*
- config_name: fra-oci
data_files:
- split: test
path: fra-oci/test-*
- config_name: fra-pcd
data_files:
- split: test
path: fra-pcd/test-*
- config_name: fra-pol
data_files:
- split: test
path: fra-pol/test-*
- split: validation
path: fra-pol/validation-*
- config_name: fra-por
data_files:
- split: test
path: fra-por/test-*
- split: validation
path: fra-por/validation-*
- config_name: fra-ron
data_files:
- split: test
path: fra-ron/test-*
- config_name: fra-run
data_files:
- split: test
path: fra-run/test-*
- config_name: fra-rus
data_files:
- split: test
path: fra-rus/test-*
- split: validation
path: fra-rus/validation-*
- config_name: fra-slv
data_files:
- split: test
path: fra-slv/test-*
- config_name: fra-spa
data_files:
- split: test
path: fra-spa/test-*
- split: validation
path: fra-spa/validation-*
- config_name: fra-swe
data_files:
- split: test
path: fra-swe/test-*
- split: validation
path: fra-swe/validation-*
- config_name: fra-tat
data_files:
- split: test
path: fra-tat/test-*
- config_name: fra-tgl
data_files:
- split: test
path: fra-tgl/test-*
- config_name: fra-tlh
data_files:
- split: test
path: fra-tlh/test-*
- config_name: fra-tlh_Latn
data_files:
- split: test
path: fra-tlh_Latn/test-*
- config_name: fra-toki
data_files:
- split: test
path: fra-toki/test-*
- split: validation
path: fra-toki/validation-*
- config_name: fra-toki_Latn
data_files:
- split: test
path: fra-toki_Latn/test-*
- config_name: fra-tur
data_files:
- split: test
path: fra-tur/test-*
- split: validation
path: fra-tur/validation-*
- config_name: fra-uig
data_files:
- split: test
path: fra-uig/test-*
- config_name: fra-uig_Arab
data_files:
- split: test
path: fra-uig_Arab/test-*
- config_name: fra-ukr
data_files:
- split: test
path: fra-ukr/test-*
- split: validation
path: fra-ukr/validation-*
- config_name: fra-vie
data_files:
- split: test
path: fra-vie/test-*
- config_name: fra-wuu
data_files:
- split: test
path: fra-wuu/test-*
- split: validation
path: fra-wuu/validation-*
- config_name: fra-yid
data_files:
- split: test
path: fra-yid/test-*
- split: validation
path: fra-yid/validation-*
- config_name: fra-zho
data_files:
- split: test
path: fra-zho/test-*
- split: validation
path: fra-zho/validation-*
- config_name: fry-nld
data_files:
- split: test
path: fry-nld/test-*
- config_name: gcf-gcf
data_files:
- split: test
path: gcf-gcf/test-*
- config_name: gla-spa
data_files:
- split: test
path: gla-spa/test-*
- config_name: glg-por
data_files:
- split: test
path: glg-por/test-*
- config_name: glg-spa
data_files:
- split: test
path: glg-spa/test-*
- split: validation
path: glg-spa/validation-*
- config_name: gos-nld
data_files:
- split: test
path: gos-nld/test-*
- split: validation
path: gos-nld/validation-*
- config_name: grn-por
data_files:
- split: test
path: grn-por/test-*
- split: validation
path: grn-por/validation-*
- config_name: grn-spa
data_files:
- split: test
path: grn-spa/test-*
- split: validation
path: grn-spa/validation-*
- config_name: hbs-ita
data_files:
- split: test
path: hbs-ita/test-*
- config_name: hbs-jpn
data_files:
- split: test
path: hbs-jpn/test-*
- config_name: hbs-nor
data_files:
- split: test
path: hbs-nor/test-*
- split: validation
path: hbs-nor/validation-*
- config_name: hbs-pol
data_files:
- split: test
path: hbs-pol/test-*
- config_name: hbs-rus
data_files:
- split: test
path: hbs-rus/test-*
- split: validation
path: hbs-rus/validation-*
- config_name: hbs-spa
data_files:
- split: test
path: hbs-spa/test-*
- config_name: hbs-ukr
data_files:
- split: test
path: hbs-ukr/test-*
- config_name: hbs-zho
data_files:
- split: test
path: hbs-zho/test-*
- config_name: heb-cmn_Hans
data_files:
- split: test
path: heb-cmn_Hans/test-*
- config_name: heb-cmn_Hant
data_files:
- split: test
path: heb-cmn_Hant/test-*
- config_name: heb-heb
data_files:
- split: test
path: heb-heb/test-*
- split: validation
path: heb-heb/validation-*
- config_name: heb-hun
data_files:
- split: test
path: heb-hun/test-*
- config_name: heb-ina
data_files:
- split: test
path: heb-ina/test-*
- split: validation
path: heb-ina/validation-*
- config_name: heb-ita
data_files:
- split: test
path: heb-ita/test-*
- split: validation
path: heb-ita/validation-*
- config_name: heb-jpn
data_files:
- split: test
path: heb-jpn/test-*
- config_name: heb-jpn_Hira
data_files:
- split: test
path: heb-jpn_Hira/test-*
- config_name: heb-lad
data_files:
- split: test
path: heb-lad/test-*
- split: validation
path: heb-lad/validation-*
- config_name: heb-lat
data_files:
- split: test
path: heb-lat/test-*
- split: validation
path: heb-lat/validation-*
- config_name: heb-lfn
data_files:
- split: test
path: heb-lfn/test-*
- split: validation
path: heb-lfn/validation-*
- config_name: heb-lfn_Latn
data_files:
- split: test
path: heb-lfn_Latn/test-*
- split: validation
path: heb-lfn_Latn/validation-*
- config_name: heb-nld
data_files:
- split: test
path: heb-nld/test-*
- split: validation
path: heb-nld/validation-*
- config_name: heb-pol
data_files:
- split: test
path: heb-pol/test-*
- split: validation
path: heb-pol/validation-*
- config_name: heb-por
data_files:
- split: test
path: heb-por/test-*
- split: validation
path: heb-por/validation-*
- config_name: heb-rus
data_files:
- split: test
path: heb-rus/test-*
- split: validation
path: heb-rus/validation-*
- config_name: heb-spa
data_files:
- split: test
path: heb-spa/test-*
- split: validation
path: heb-spa/validation-*
- config_name: heb-tur
data_files:
- split: test
path: heb-tur/test-*
- split: validation
path: heb-tur/validation-*
- config_name: heb-ukr
data_files:
- split: test
path: heb-ukr/test-*
- config_name: heb-yid
data_files:
- split: test
path: heb-yid/test-*
- split: validation
path: heb-yid/validation-*
- config_name: heb-zho
data_files:
- split: test
path: heb-zho/test-*
- config_name: hin-urd
data_files:
- split: test
path: hin-urd/test-*
- config_name: hin-zho
data_files:
- split: test
path: hin-zho/test-*
- config_name: hrv-jpn_Hira
data_files:
- split: test
path: hrv-jpn_Hira/test-*
- config_name: hrv-pol
data_files:
- split: test
path: hrv-pol/test-*
- config_name: hrv-spa
data_files:
- split: test
path: hrv-spa/test-*
- config_name: hrv-ukr
data_files:
- split: test
path: hrv-ukr/test-*
- config_name: hsb-slv
data_files:
- split: test
path: hsb-slv/test-*
- config_name: hun-cmn_Hans
data_files:
- split: test
path: hun-cmn_Hans/test-*
- config_name: hun-hun
data_files:
- split: test
path: hun-hun/test-*
- config_name: hun-ita
data_files:
- split: test
path: hun-ita/test-*
- split: validation
path: hun-ita/validation-*
- config_name: hun-jpn
data_files:
- split: test
path: hun-jpn/test-*
- split: validation
path: hun-jpn/validation-*
- config_name: hun-jpn_Hani
data_files:
- split: test
path: hun-jpn_Hani/test-*
- split: validation
path: hun-jpn_Hani/validation-*
- config_name: hun-jpn_Hira
data_files:
- split: test
path: hun-jpn_Hira/test-*
- split: validation
path: hun-jpn_Hira/validation-*
- config_name: hun-kor
data_files:
- split: test
path: hun-kor/test-*
- config_name: hun-kor_Hang
data_files:
- split: test
path: hun-kor_Hang/test-*
- config_name: hun-lat
data_files:
- split: test
path: hun-lat/test-*
- config_name: hun-nld
data_files:
- split: test
path: hun-nld/test-*
- split: validation
path: hun-nld/validation-*
- config_name: hun-pol
data_files:
- split: test
path: hun-pol/test-*
- config_name: hun-por
data_files:
- split: test
path: hun-por/test-*
- split: validation
path: hun-por/validation-*
- config_name: hun-rus
data_files:
- split: test
path: hun-rus/test-*
- split: validation
path: hun-rus/validation-*
- config_name: hun-spa
data_files:
- split: test
path: hun-spa/test-*
- split: validation
path: hun-spa/validation-*
- config_name: hun-swe
data_files:
- split: test
path: hun-swe/test-*
- split: validation
path: hun-swe/validation-*
- config_name: hun-tur
data_files:
- split: test
path: hun-tur/test-*
- config_name: hun-ukr
data_files:
- split: test
path: hun-ukr/test-*
- config_name: hun-zho
data_files:
- split: test
path: hun-zho/test-*
- config_name: hye-rus
data_files:
- split: test
path: hye-rus/test-*
- config_name: ido-ina
data_files:
- split: test
path: ido-ina/test-*
- split: validation
path: ido-ina/validation-*
- config_name: ido-ita
data_files:
- split: test
path: ido-ita/test-*
- split: validation
path: ido-ita/validation-*
- config_name: ido-lfn
data_files:
- split: test
path: ido-lfn/test-*
- split: validation
path: ido-lfn/validation-*
- config_name: ido-spa
data_files:
- split: test
path: ido-spa/test-*
- config_name: ido-yid
data_files:
- split: test
path: ido-yid/test-*
- split: validation
path: ido-yid/validation-*
- config_name: ido_Latn-lfn_Latn
data_files:
- split: test
path: ido_Latn-lfn_Latn/test-*
- split: validation
path: ido_Latn-lfn_Latn/validation-*
- config_name: ina-ita
data_files:
- split: test
path: ina-ita/test-*
- config_name: ina-lad
data_files:
- split: test
path: ina-lad/test-*
- split: validation
path: ina-lad/validation-*
- config_name: ina-lat
data_files:
- split: test
path: ina-lat/test-*
- split: validation
path: ina-lat/validation-*
- config_name: ina-lfn
data_files:
- split: test
path: ina-lfn/test-*
- split: validation
path: ina-lfn/validation-*
- config_name: ina-nld
data_files:
- split: test
path: ina-nld/test-*
- config_name: ina-por
data_files:
- split: test
path: ina-por/test-*
- split: validation
path: ina-por/validation-*
- config_name: ina-rus
data_files:
- split: test
path: ina-rus/test-*
- split: validation
path: ina-rus/validation-*
- config_name: ina-spa
data_files:
- split: test
path: ina-spa/test-*
- split: validation
path: ina-spa/validation-*
- config_name: ina-tlh
data_files:
- split: test
path: ina-tlh/test-*
- split: validation
path: ina-tlh/validation-*
- config_name: ina-tur
data_files:
- split: test
path: ina-tur/test-*
- config_name: ina-yid
data_files:
- split: test
path: ina-yid/test-*
- split: validation
path: ina-yid/validation-*
- config_name: ina_Latn-lad_Latn
data_files:
- split: test
path: ina_Latn-lad_Latn/test-*
- split: validation
path: ina_Latn-lad_Latn/validation-*
- config_name: ina_Latn-lfn_Latn
data_files:
- split: test
path: ina_Latn-lfn_Latn/test-*
- split: validation
path: ina_Latn-lfn_Latn/validation-*
- config_name: ina_Latn-tlh_Latn
data_files:
- split: test
path: ina_Latn-tlh_Latn/test-*
- config_name: ind-zsm_Latn
data_files:
- split: test
path: ind-zsm_Latn/test-*
- config_name: isl-ita
data_files:
- split: test
path: isl-ita/test-*
- config_name: isl-jpn
data_files:
- split: test
path: isl-jpn/test-*
- config_name: isl-jpn_Hira
data_files:
- split: test
path: isl-jpn_Hira/test-*
- config_name: isl-spa
data_files:
- split: test
path: isl-spa/test-*
- config_name: ita-cmn_Hans
data_files:
- split: test
path: ita-cmn_Hans/test-*
- split: validation
path: ita-cmn_Hans/validation-*
- config_name: ita-cmn_Hant
data_files:
- split: test
path: ita-cmn_Hant/test-*
- split: validation
path: ita-cmn_Hant/validation-*
- config_name: ita-ind
data_files:
- split: test
path: ita-ind/test-*
- config_name: ita-ita
data_files:
- split: test
path: ita-ita/test-*
- split: validation
path: ita-ita/validation-*
- config_name: ita-jpn
data_files:
- split: test
path: ita-jpn/test-*
- split: validation
path: ita-jpn/validation-*
- config_name: ita-jpn_Hani
data_files:
- split: test
path: ita-jpn_Hani/test-*
- split: validation
path: ita-jpn_Hani/validation-*
- config_name: ita-jpn_Hira
data_files:
- split: test
path: ita-jpn_Hira/test-*
- split: validation
path: ita-jpn_Hira/validation-*
- config_name: ita-lat
data_files:
- split: test
path: ita-lat/test-*
- split: validation
path: ita-lat/validation-*
- config_name: ita-lit
data_files:
- split: test
path: ita-lit/test-*
- config_name: ita-msa
data_files:
- split: test
path: ita-msa/test-*
- config_name: ita-nds
data_files:
- split: test
path: ita-nds/test-*
- config_name: ita-nld
data_files:
- split: test
path: ita-nld/test-*
- split: validation
path: ita-nld/validation-*
- config_name: ita-nor
data_files:
- split: test
path: ita-nor/test-*
- config_name: ita-pms
data_files:
- split: test
path: ita-pms/test-*
- config_name: ita-pol
data_files:
- split: test
path: ita-pol/test-*
- split: validation
path: ita-pol/validation-*
- config_name: ita-por
data_files:
- split: test
path: ita-por/test-*
- split: validation
path: ita-por/validation-*
- config_name: ita-ron
data_files:
- split: test
path: ita-ron/test-*
- config_name: ita-rus
data_files:
- split: test
path: ita-rus/test-*
- split: validation
path: ita-rus/validation-*
- config_name: ita-spa
data_files:
- split: test
path: ita-spa/test-*
- split: validation
path: ita-spa/validation-*
- config_name: ita-swe
data_files:
- split: test
path: ita-swe/test-*
- config_name: ita-toki
data_files:
- split: test
path: ita-toki/test-*
- config_name: ita-tur
data_files:
- split: test
path: ita-tur/test-*
- split: validation
path: ita-tur/validation-*
- config_name: ita-ukr
data_files:
- split: test
path: ita-ukr/test-*
- split: validation
path: ita-ukr/validation-*
- config_name: ita-vie
data_files:
- split: test
path: ita-vie/test-*
- config_name: ita-yid
data_files:
- split: test
path: ita-yid/test-*
- split: validation
path: ita-yid/validation-*
- config_name: ita-zho
data_files:
- split: test
path: ita-zho/test-*
- split: validation
path: ita-zho/validation-*
- config_name: jbo-jpn
data_files:
- split: test
path: jbo-jpn/test-*
- config_name: jbo-rus
data_files:
- split: test
path: jbo-rus/test-*
- config_name: jbo-spa
data_files:
- split: test
path: jbo-spa/test-*
- config_name: jbo-swe
data_files:
- split: test
path: jbo-swe/test-*
- config_name: jbo-zho
data_files:
- split: test
path: jbo-zho/test-*
- split: validation
path: jbo-zho/validation-*
- config_name: jbo_Latn-cmn_Hans
data_files:
- split: test
path: jbo_Latn-cmn_Hans/test-*
- config_name: jbo_Latn-cmn_Hant
data_files:
- split: test
path: jbo_Latn-cmn_Hant/test-*
- split: validation
path: jbo_Latn-cmn_Hant/validation-*
- config_name: jbo_Latn-jpn_Hira
data_files:
- split: test
path: jbo_Latn-jpn_Hira/test-*
- config_name: jpn-jpn
data_files:
- split: test
path: jpn-jpn/test-*
- config_name: jpn-kor
data_files:
- split: test
path: jpn-kor/test-*
- config_name: jpn-lit
data_files:
- split: test
path: jpn-lit/test-*
- config_name: jpn-mar
data_files:
- split: test
path: jpn-mar/test-*
- config_name: jpn-msa
data_files:
- split: test
path: jpn-msa/test-*
- split: validation
path: jpn-msa/validation-*
- config_name: jpn-nds
data_files:
- split: test
path: jpn-nds/test-*
- split: validation
path: jpn-nds/validation-*
- config_name: jpn-nld
data_files:
- split: test
path: jpn-nld/test-*
- split: validation
path: jpn-nld/validation-*
- config_name: jpn-nor
data_files:
- split: test
path: jpn-nor/test-*
- config_name: jpn-pol
data_files:
- split: test
path: jpn-pol/test-*
- split: validation
path: jpn-pol/validation-*
- config_name: jpn-por
data_files:
- split: test
path: jpn-por/test-*
- split: validation
path: jpn-por/validation-*
- config_name: jpn-rus
data_files:
- split: test
path: jpn-rus/test-*
- split: validation
path: jpn-rus/validation-*
- config_name: jpn-spa
data_files:
- split: test
path: jpn-spa/test-*
- split: validation
path: jpn-spa/validation-*
- config_name: jpn-swe
data_files:
- split: test
path: jpn-swe/test-*
- config_name: jpn-tlh
data_files:
- split: test
path: jpn-tlh/test-*
- config_name: jpn-toki
data_files:
- split: test
path: jpn-toki/test-*
- config_name: jpn-tur
data_files:
- split: test
path: jpn-tur/test-*
- config_name: jpn-ukr
data_files:
- split: test
path: jpn-ukr/test-*
- split: validation
path: jpn-ukr/validation-*
- config_name: jpn-vie
data_files:
- split: test
path: jpn-vie/test-*
- split: validation
path: jpn-vie/validation-*
- config_name: jpn-zho
data_files:
- split: test
path: jpn-zho/test-*
- split: validation
path: jpn-zho/validation-*
- config_name: jpn_Hani-cmn_Hans
data_files:
- split: test
path: jpn_Hani-cmn_Hans/test-*
- split: validation
path: jpn_Hani-cmn_Hans/validation-*
- config_name: jpn_Hani-nld
data_files:
- split: test
path: jpn_Hani-nld/test-*
- split: validation
path: jpn_Hani-nld/validation-*
- config_name: jpn_Hani-pol
data_files:
- split: test
path: jpn_Hani-pol/test-*
- split: validation
path: jpn_Hani-pol/validation-*
- config_name: jpn_Hani-por
data_files:
- split: test
path: jpn_Hani-por/test-*
- split: validation
path: jpn_Hani-por/validation-*
- config_name: jpn_Hani-rus
data_files:
- split: test
path: jpn_Hani-rus/test-*
- split: validation
path: jpn_Hani-rus/validation-*
- config_name: jpn_Hani-spa
data_files:
- split: test
path: jpn_Hani-spa/test-*
- split: validation
path: jpn_Hani-spa/validation-*
- config_name: jpn_Hira-cmn_Hans
data_files:
- split: test
path: jpn_Hira-cmn_Hans/test-*
- split: validation
path: jpn_Hira-cmn_Hans/validation-*
- config_name: jpn_Hira-cmn_Hant
data_files:
- split: test
path: jpn_Hira-cmn_Hant/test-*
- split: validation
path: jpn_Hira-cmn_Hant/validation-*
- config_name: jpn_Hira-ind
data_files:
- split: test
path: jpn_Hira-ind/test-*
- split: validation
path: jpn_Hira-ind/validation-*
- config_name: jpn_Hira-jpn_Hira
data_files:
- split: test
path: jpn_Hira-jpn_Hira/test-*
- config_name: jpn_Hira-kor_Hang
data_files:
- split: test
path: jpn_Hira-kor_Hang/test-*
- config_name: jpn_Hira-lit
data_files:
- split: test
path: jpn_Hira-lit/test-*
- config_name: jpn_Hira-mar
data_files:
- split: test
path: jpn_Hira-mar/test-*
- config_name: jpn_Hira-nds
data_files:
- split: test
path: jpn_Hira-nds/test-*
- split: validation
path: jpn_Hira-nds/validation-*
- config_name: jpn_Hira-nld
data_files:
- split: test
path: jpn_Hira-nld/test-*
- split: validation
path: jpn_Hira-nld/validation-*
- config_name: jpn_Hira-nob
data_files:
- split: test
path: jpn_Hira-nob/test-*
- config_name: jpn_Hira-pol
data_files:
- split: test
path: jpn_Hira-pol/test-*
- split: validation
path: jpn_Hira-pol/validation-*
- config_name: jpn_Hira-por
data_files:
- split: test
path: jpn_Hira-por/test-*
- split: validation
path: jpn_Hira-por/validation-*
- config_name: jpn_Hira-rus
data_files:
- split: test
path: jpn_Hira-rus/test-*
- split: validation
path: jpn_Hira-rus/validation-*
- config_name: jpn_Hira-spa
data_files:
- split: test
path: jpn_Hira-spa/test-*
- split: validation
path: jpn_Hira-spa/validation-*
- config_name: jpn_Hira-swe
data_files:
- split: test
path: jpn_Hira-swe/test-*
- config_name: jpn_Hira-tlh_Latn
data_files:
- split: test
path: jpn_Hira-tlh_Latn/test-*
- config_name: jpn_Hira-tur
data_files:
- split: test
path: jpn_Hira-tur/test-*
- config_name: jpn_Hira-ukr
data_files:
- split: test
path: jpn_Hira-ukr/test-*
- config_name: jpn_Hira-vie
data_files:
- split: test
path: jpn_Hira-vie/test-*
- split: validation
path: jpn_Hira-vie/validation-*
- config_name: jpn_Kana-rus
data_files:
- split: test
path: jpn_Kana-rus/test-*
- split: validation
path: jpn_Kana-rus/validation-*
- config_name: jpn_Kana-spa
data_files:
- split: test
path: jpn_Kana-spa/test-*
- split: validation
path: jpn_Kana-spa/validation-*
- config_name: kab-kab
data_files:
- split: test
path: kab-kab/test-*
- split: validation
path: kab-kab/validation-*
- config_name: kab-rus
data_files:
- split: test
path: kab-rus/test-*
- config_name: kab-spa
data_files:
- split: test
path: kab-spa/test-*
- config_name: kat-rus
data_files:
- split: test
path: kat-rus/test-*
- config_name: kaz-rus
data_files:
- split: test
path: kaz-rus/test-*
- split: validation
path: kaz-rus/validation-*
- config_name: kaz_Cyrl-rus
data_files:
- split: test
path: kaz_Cyrl-rus/test-*
- config_name: khm-spa
data_files:
- split: test
path: khm-spa/test-*
- config_name: kor-rus
data_files:
- split: test
path: kor-rus/test-*
- config_name: kor-spa
data_files:
- split: test
path: kor-spa/test-*
- config_name: kor-zho
data_files:
- split: test
path: kor-zho/test-*
- config_name: kor_Hang-cmn_Hans
data_files:
- split: test
path: kor_Hang-cmn_Hans/test-*
- config_name: kor_Hang-rus
data_files:
- split: test
path: kor_Hang-rus/test-*
- config_name: kor_Hang-spa
data_files:
- split: test
path: kor_Hang-spa/test-*
- config_name: kzj-msa
data_files:
- split: test
path: kzj-msa/test-*
- config_name: kzj_Latn-zsm_Latn
data_files:
- split: test
path: kzj_Latn-zsm_Latn/test-*
- config_name: lad-lat
data_files:
- split: test
path: lad-lat/test-*
- split: validation
path: lad-lat/validation-*
- config_name: lad-lfn
data_files:
- split: test
path: lad-lfn/test-*
- split: validation
path: lad-lfn/validation-*
- config_name: lad-spa
data_files:
- split: test
path: lad-spa/test-*
- split: validation
path: lad-spa/validation-*
- config_name: lad-yid
data_files:
- split: test
path: lad-yid/test-*
- split: validation
path: lad-yid/validation-*
- config_name: lad_Latn-lfn_Latn
data_files:
- split: test
path: lad_Latn-lfn_Latn/test-*
- split: validation
path: lad_Latn-lfn_Latn/validation-*
- config_name: lad_Latn-spa
data_files:
- split: test
path: lad_Latn-spa/test-*
- config_name: lad_Latn-yid
data_files:
- split: test
path: lad_Latn-yid/test-*
- split: validation
path: lad_Latn-yid/validation-*
- config_name: lat-lat
data_files:
- split: test
path: lat-lat/test-*
- config_name: lat-lfn
data_files:
- split: test
path: lat-lfn/test-*
- split: validation
path: lat-lfn/validation-*
- config_name: lat-nld
data_files:
- split: test
path: lat-nld/test-*
- config_name: lat-nor
data_files:
- split: test
path: lat-nor/test-*
- config_name: lat-pol
data_files:
- split: test
path: lat-pol/test-*
- config_name: lat-rus
data_files:
- split: test
path: lat-rus/test-*
- split: validation
path: lat-rus/validation-*
- config_name: lat-tlh
data_files:
- split: test
path: lat-tlh/test-*
- split: validation
path: lat-tlh/validation-*
- config_name: lat-ukr
data_files:
- split: test
path: lat-ukr/test-*
- config_name: lat-yid
data_files:
- split: test
path: lat-yid/test-*
- split: validation
path: lat-yid/validation-*
- config_name: lat_Latn-lfn_Latn
data_files:
- split: test
path: lat_Latn-lfn_Latn/test-*
- split: validation
path: lat_Latn-lfn_Latn/validation-*
- config_name: lav-rus
data_files:
- split: test
path: lav-rus/test-*
- config_name: lfn-rus
data_files:
- split: test
path: lfn-rus/test-*
- split: validation
path: lfn-rus/validation-*
- config_name: lfn-spa
data_files:
- split: test
path: lfn-spa/test-*
- split: validation
path: lfn-spa/validation-*
- config_name: lfn-yid
data_files:
- split: test
path: lfn-yid/test-*
- split: validation
path: lfn-yid/validation-*
- config_name: lfn_Cyrl-por
data_files:
- split: test
path: lfn_Cyrl-por/test-*
- split: validation
path: lfn_Cyrl-por/validation-*
- config_name: lfn_Latn-yid
data_files:
- split: test
path: lfn_Latn-yid/test-*
- split: validation
path: lfn_Latn-yid/validation-*
- config_name: lit-pol
data_files:
- split: test
path: lit-pol/test-*
- config_name: lit-rus
data_files:
- split: test
path: lit-rus/test-*
- split: validation
path: lit-rus/validation-*
- config_name: lit-spa
data_files:
- split: test
path: lit-spa/test-*
- config_name: lit-tur
data_files:
- split: test
path: lit-tur/test-*
- config_name: ltz-nld
data_files:
- split: test
path: ltz-nld/test-*
- config_name: mkd-spa
data_files:
- split: test
path: mkd-spa/test-*
- config_name: msa-msa
data_files:
- split: test
path: msa-msa/test-*
- config_name: msa-spa
data_files:
- split: test
path: msa-spa/test-*
- config_name: msa-zho
data_files:
- split: test
path: msa-zho/test-*
- config_name: nds-nld
data_files:
- split: test
path: nds-nld/test-*
- split: validation
path: nds-nld/validation-*
- config_name: nds-por
data_files:
- split: test
path: nds-por/test-*
- config_name: nds-rus
data_files:
- split: test
path: nds-rus/test-*
- split: validation
path: nds-rus/validation-*
- config_name: nds-spa
data_files:
- split: test
path: nds-spa/test-*
- config_name: nld-cmn_Hant
data_files:
- split: test
path: nld-cmn_Hant/test-*
- config_name: nld-nld
data_files:
- split: test
path: nld-nld/test-*
- split: validation
path: nld-nld/validation-*
- config_name: nld-nor
data_files:
- split: test
path: nld-nor/test-*
- config_name: nld-pol
data_files:
- split: test
path: nld-pol/test-*
- split: validation
path: nld-pol/validation-*
- config_name: nld-por
data_files:
- split: test
path: nld-por/test-*
- split: validation
path: nld-por/validation-*
- config_name: nld-ron
data_files:
- split: test
path: nld-ron/test-*
- split: validation
path: nld-ron/validation-*
- config_name: nld-rus
data_files:
- split: test
path: nld-rus/test-*
- split: validation
path: nld-rus/validation-*
- config_name: nld-spa
data_files:
- split: test
path: nld-spa/test-*
- split: validation
path: nld-spa/validation-*
- config_name: nld-toki
data_files:
- split: test
path: nld-toki/test-*
- config_name: nld-tur
data_files:
- split: test
path: nld-tur/test-*
- split: validation
path: nld-tur/validation-*
- config_name: nld-ukr
data_files:
- split: test
path: nld-ukr/test-*
- split: validation
path: nld-ukr/validation-*
- config_name: nld-zho
data_files:
- split: test
path: nld-zho/test-*
- split: validation
path: nld-zho/validation-*
- config_name: nno-nob
data_files:
- split: test
path: nno-nob/test-*
- config_name: nob-nno
data_files:
- split: test
path: nob-nno/test-*
- config_name: nob-rus
data_files:
- split: test
path: nob-rus/test-*
- config_name: nob-spa
data_files:
- split: test
path: nob-spa/test-*
- config_name: nob-swe
data_files:
- split: test
path: nob-swe/test-*
- config_name: nor-nor
data_files:
- split: test
path: nor-nor/test-*
- split: validation
path: nor-nor/validation-*
- config_name: nor-pol
data_files:
- split: test
path: nor-pol/test-*
- config_name: nor-por
data_files:
- split: test
path: nor-por/test-*
- config_name: nor-rus
data_files:
- split: test
path: nor-rus/test-*
- config_name: nor-spa
data_files:
- split: test
path: nor-spa/test-*
- config_name: nor-swe
data_files:
- split: test
path: nor-swe/test-*
- config_name: nor-ukr
data_files:
- split: test
path: nor-ukr/test-*
- config_name: nor-zho
data_files:
- split: test
path: nor-zho/test-*
- config_name: orv-ukr
data_files:
- split: test
path: orv-ukr/test-*
- config_name: ota-tur
data_files:
- split: test
path: ota-tur/test-*
- split: validation
path: ota-tur/validation-*
- config_name: pol-cmn_Hans
data_files:
- split: test
path: pol-cmn_Hans/test-*
- config_name: pol-cmn_Hant
data_files:
- split: test
path: pol-cmn_Hant/test-*
- config_name: pol-por
data_files:
- split: test
path: pol-por/test-*
- config_name: pol-rus
data_files:
- split: test
path: pol-rus/test-*
- split: validation
path: pol-rus/validation-*
- config_name: pol-spa
data_files:
- split: test
path: pol-spa/test-*
- split: validation
path: pol-spa/validation-*
- config_name: pol-swe
data_files:
- split: test
path: pol-swe/test-*
- config_name: pol-tur
data_files:
- split: test
path: pol-tur/test-*
- config_name: pol-ukr
data_files:
- split: test
path: pol-ukr/test-*
- split: validation
path: pol-ukr/validation-*
- config_name: pol-zho
data_files:
- split: test
path: pol-zho/test-*
- config_name: por-cmn_Hans
data_files:
- split: test
path: por-cmn_Hans/test-*
- config_name: por-cmn_Hant
data_files:
- split: test
path: por-cmn_Hant/test-*
- config_name: por-por
data_files:
- split: test
path: por-por/test-*
- split: validation
path: por-por/validation-*
- config_name: por-ron
data_files:
- split: test
path: por-ron/test-*
- split: validation
path: por-ron/validation-*
- config_name: por-rus
data_files:
- split: test
path: por-rus/test-*
- split: validation
path: por-rus/validation-*
- config_name: por-spa
data_files:
- split: test
path: por-spa/test-*
- split: validation
path: por-spa/validation-*
- config_name: por-swe
data_files:
- split: test
path: por-swe/test-*
- config_name: por-tgl
data_files:
- split: test
path: por-tgl/test-*
- config_name: por-toki
data_files:
- split: test
path: por-toki/test-*
- split: validation
path: por-toki/validation-*
- config_name: por-tur
data_files:
- split: test
path: por-tur/test-*
- config_name: por-ukr
data_files:
- split: test
path: por-ukr/test-*
- split: validation
path: por-ukr/validation-*
- config_name: por-zho
data_files:
- split: test
path: por-zho/test-*
- config_name: ron-rus
data_files:
- split: test
path: ron-rus/test-*
- config_name: ron-spa
data_files:
- split: test
path: ron-spa/test-*
- config_name: ron-tur
data_files:
- split: test
path: ron-tur/test-*
- split: validation
path: ron-tur/validation-*
- config_name: run-rus
data_files:
- split: test
path: run-rus/test-*
- config_name: run-spa
data_files:
- split: test
path: run-spa/test-*
- config_name: rus-cmn_Hans
data_files:
- split: test
path: rus-cmn_Hans/test-*
- split: validation
path: rus-cmn_Hans/validation-*
- config_name: rus-cmn_Hant
data_files:
- split: test
path: rus-cmn_Hant/test-*
- split: validation
path: rus-cmn_Hant/validation-*
- config_name: rus-rus
data_files:
- split: test
path: rus-rus/test-*
- split: validation
path: rus-rus/validation-*
- config_name: rus-sah
data_files:
- split: test
path: rus-sah/test-*
- config_name: rus-slv
data_files:
- split: test
path: rus-slv/test-*
- split: validation
path: rus-slv/validation-*
- config_name: rus-spa
data_files:
- split: test
path: rus-spa/test-*
- split: validation
path: rus-spa/validation-*
- config_name: rus-swe
data_files:
- split: test
path: rus-swe/test-*
- split: validation
path: rus-swe/validation-*
- config_name: rus-tat
data_files:
- split: test
path: rus-tat/test-*
- split: validation
path: rus-tat/validation-*
- config_name: rus-tlh
data_files:
- split: test
path: rus-tlh/test-*
- config_name: rus-toki
data_files:
- split: test
path: rus-toki/test-*
- split: validation
path: rus-toki/validation-*
- config_name: rus-toki_Latn
data_files:
- split: test
path: rus-toki_Latn/test-*
- config_name: rus-tur
data_files:
- split: test
path: rus-tur/test-*
- split: validation
path: rus-tur/validation-*
- config_name: rus-uig
data_files:
- split: test
path: rus-uig/test-*
- config_name: rus-uig_Arab
data_files:
- split: test
path: rus-uig_Arab/test-*
- config_name: rus-ukr
data_files:
- split: test
path: rus-ukr/test-*
- split: validation
path: rus-ukr/validation-*
- config_name: rus-vie
data_files:
- split: test
path: rus-vie/test-*
- config_name: rus-xal
data_files:
- split: test
path: rus-xal/test-*
- config_name: rus-yue_Hans
data_files:
- split: test
path: rus-yue_Hans/test-*
- split: validation
path: rus-yue_Hans/validation-*
- config_name: rus-zho
data_files:
- split: test
path: rus-zho/test-*
- split: validation
path: rus-zho/validation-*
- config_name: slv-cmn_Hans
data_files:
- split: test
path: slv-cmn_Hans/test-*
- config_name: slv-ukr
data_files:
- split: test
path: slv-ukr/test-*
- split: validation
path: slv-ukr/validation-*
- config_name: slv-zho
data_files:
- split: test
path: slv-zho/test-*
- config_name: spa-cmn_Hans
data_files:
- split: test
path: spa-cmn_Hans/test-*
- split: validation
path: spa-cmn_Hans/validation-*
- config_name: spa-cmn_Hant
data_files:
- split: test
path: spa-cmn_Hant/test-*
- split: validation
path: spa-cmn_Hant/validation-*
- config_name: spa-spa
data_files:
- split: test
path: spa-spa/test-*
- split: validation
path: spa-spa/validation-*
- config_name: spa-swe
data_files:
- split: test
path: spa-swe/test-*
- split: validation
path: spa-swe/validation-*
- config_name: spa-tat
data_files:
- split: test
path: spa-tat/test-*
- config_name: spa-tgl
data_files:
- split: test
path: spa-tgl/test-*
- config_name: spa-tlh
data_files:
- split: test
path: spa-tlh/test-*
- config_name: spa-toki
data_files:
- split: test
path: spa-toki/test-*
- split: validation
path: spa-toki/validation-*
- config_name: spa-tur
data_files:
- split: test
path: spa-tur/test-*
- split: validation
path: spa-tur/validation-*
- config_name: spa-ukr
data_files:
- split: test
path: spa-ukr/test-*
- split: validation
path: spa-ukr/validation-*
- config_name: spa-vie
data_files:
- split: test
path: spa-vie/test-*
- config_name: spa-yid
data_files:
- split: test
path: spa-yid/test-*
- split: validation
path: spa-yid/validation-*
- config_name: spa-zho
data_files:
- split: test
path: spa-zho/test-*
- split: validation
path: spa-zho/validation-*
- config_name: srp_Cyrl-rus
data_files:
- split: test
path: srp_Cyrl-rus/test-*
- split: validation
path: srp_Cyrl-rus/validation-*
- config_name: srp_Cyrl-ukr
data_files:
- split: test
path: srp_Cyrl-ukr/test-*
- config_name: srp_Latn-ita
data_files:
- split: test
path: srp_Latn-ita/test-*
- config_name: srp_Latn-nob
data_files:
- split: test
path: srp_Latn-nob/test-*
- split: validation
path: srp_Latn-nob/validation-*
- config_name: srp_Latn-rus
data_files:
- split: test
path: srp_Latn-rus/test-*
- split: validation
path: srp_Latn-rus/validation-*
- config_name: srp_Latn-ukr
data_files:
- split: test
path: srp_Latn-ukr/test-*
- config_name: swe-cmn_Hans
data_files:
- split: test
path: swe-cmn_Hans/test-*
- config_name: swe-cmn_Hant
data_files:
- split: test
path: swe-cmn_Hant/test-*
- config_name: swe-swe
data_files:
- split: test
path: swe-swe/test-*
- config_name: swe-tur
data_files:
- split: test
path: swe-tur/test-*
- config_name: swe-zho
data_files:
- split: test
path: swe-zho/test-*
- config_name: tat-tur
data_files:
- split: test
path: tat-tur/test-*
- config_name: tat-vie
data_files:
- split: test
path: tat-vie/test-*
- config_name: tlh-yid
data_files:
- split: test
path: tlh-yid/test-*
- split: validation
path: tlh-yid/validation-*
- config_name: tlh-zho
data_files:
- split: test
path: tlh-zho/test-*
- config_name: tlh_Latn-cmn_Hans
data_files:
- split: test
path: tlh_Latn-cmn_Hans/test-*
- config_name: tlh_Latn-cmn_Hant
data_files:
- split: test
path: tlh_Latn-cmn_Hant/test-*
- config_name: tlh_Latn-yid
data_files:
- split: test
path: tlh_Latn-yid/test-*
- config_name: tur-cmn_Hans
data_files:
- split: test
path: tur-cmn_Hans/test-*
- config_name: tur-cmn_Hant
data_files:
- split: test
path: tur-cmn_Hant/test-*
- config_name: tur-tur
data_files:
- split: test
path: tur-tur/test-*
- split: validation
path: tur-tur/validation-*
- config_name: tur-uig
data_files:
- split: test
path: tur-uig/test-*
- split: validation
path: tur-uig/validation-*
- config_name: tur-ukr
data_files:
- split: test
path: tur-ukr/test-*
- split: validation
path: tur-ukr/validation-*
- config_name: tur-uzb
data_files:
- split: test
path: tur-uzb/test-*
- config_name: tur-zho
data_files:
- split: test
path: tur-zho/test-*
- config_name: uig-zho
data_files:
- split: test
path: uig-zho/test-*
- config_name: uig_Arab-cmn_Hans
data_files:
- split: test
path: uig_Arab-cmn_Hans/test-*
- config_name: uig_Arab-cmn_Hant
data_files:
- split: test
path: uig_Arab-cmn_Hant/test-*
- config_name: ukr-cmn_Hans
data_files:
- split: test
path: ukr-cmn_Hans/test-*
- config_name: ukr-cmn_Hant
data_files:
- split: test
path: ukr-cmn_Hant/test-*
- config_name: ukr-ukr
data_files:
- split: test
path: ukr-ukr/test-*
- config_name: ukr-zho
data_files:
- split: test
path: ukr-zho/test-*
- config_name: vie-cmn_Hans
data_files:
- split: test
path: vie-cmn_Hans/test-*
- config_name: vie-vie
data_files:
- split: test
path: vie-vie/test-*
- config_name: vie-zho
data_files:
- split: test
path: vie-zho/test-*
- config_name: wuu-cmn_Hans
data_files:
- split: test
path: wuu-cmn_Hans/test-*
- split: validation
path: wuu-cmn_Hans/validation-*
- config_name: yid-yid
data_files:
- split: test
path: yid-yid/test-*
- config_name: zho-zho
data_files:
- split: test
path: zho-zho/test-*
- split: validation
path: zho-zho/validation-*
- config_name: zsm_Latn-ind
data_files:
- split: test
path: zsm_Latn-ind/test-*
---
# Dataset Card for DigitalLearningGmbH/tatoeba_mt_parquet
This is a mirror of [Helsinki-NLP/tatoeba_mt](https://huggingface.co/datasets/Helsinki-NLP/tatoeba_mt), converted to Parquet for compatibility with newer Hugging Face requirements.
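As a minimal sketch of how one language pair from this mirror might be loaded, the snippet below assumes the `datasets` package is installed and that network access is available. The helper names `data_file_pattern` and `load_pair` are illustrative and not part of the dataset; the glob format simply mirrors the `path:` entries in the YAML metadata above.

```python
REPO_ID = "DigitalLearningGmbH/tatoeba_mt_parquet"


def data_file_pattern(config_name: str, split: str) -> str:
    """Return the Parquet glob that the card's YAML lists for a config and split."""
    return f"{config_name}/{split}-*"


def load_pair(config_name: str, split: str = "test"):
    """Load one language pair (hypothetical helper; needs `datasets` and network access)."""
    # Imported lazily so that data_file_pattern() works without extra dependencies.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=split)
```

For example, `load_pair("slv-ukr", "validation")` would request the files matched by `data_file_pattern("slv-ukr", "validation")`. Not every config declares a validation split (`slv-zho`, for instance, lists only `test`), so requesting one that is not declared is expected to fail.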
Original dataset card follows.
## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** https://github.com/Helsinki-NLP/Tatoeba-Challenge/
- **Repository:** https://github.com/Helsinki-NLP/Tatoeba-Challenge/
- **Paper:** [The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT](https://aclanthology.org/2020.wmt-1.139/)
- **Leaderboard:**
- **Point of Contact:** [Jörg Tiedemann](mailto:jorg.tiedemann@helsinki.fi)
### Dataset Summary
The Tatoeba Translation Challenge is a multilingual data set of machine translation benchmarks derived from user-contributed translations collected by [Tatoeba.org](https://tatoeba.org/) and provided as a parallel corpus from [OPUS](https://opus.nlpl.eu/). This dataset includes test and development data sorted by language pair. It includes test sets for hundreds of language pairs and is continuously updated. Please check the version number tag to refer to the release that you are using.
### Supported Tasks and Leaderboards
The translation task is described in detail in the [Tatoeba-Challenge repository](https://github.com/Helsinki-NLP/Tatoeba-Challenge) and covers various sub-tasks with different data coverage and resources. [Training data](https://github.com/Helsinki-NLP/Tatoeba-Challenge/blob/master/data/README.md) is available from the same repository, and [results](https://github.com/Helsinki-NLP/Tatoeba-Challenge/blob/master/results/tatoeba-results-all.md) are collected and published there as well. [Models](https://github.com/Helsinki-NLP/Tatoeba-Challenge/blob/master/results/tatoeba-models-all.md) are also released for public use and are partially available from the [Hugging Face model hub](https://huggingface.co/Helsinki-NLP).
### Languages
The data set covers hundreds of languages and language pairs and is organized by ISO-639-3 language codes. The current release covers the following languages: Afrikaans, Arabic, Azerbaijani, Belarusian, Bulgarian, Bengali, Breton, Bosnian, Catalan, Chamorro, Czech, Chuvash, Welsh, Danish, German, Modern Greek, English, Esperanto, Spanish, Estonian, Basque, Persian, Finnish, Faroese, French, Western Frisian, Irish, Scottish Gaelic, Galician, Guarani, Hebrew, Hindi, Croatian, Hungarian, Armenian, Interlingua, Indonesian, Interlingue, Ido, Icelandic, Italian, Japanese, Javanese, Georgian, Kazakh, Khmer, Korean, Kurdish, Cornish, Latin, Luxembourgish, Lithuanian, Latvian, Maori, Macedonian, Malayalam, Mongolian, Marathi, Malay, Maltese, Burmese, Norwegian Bokmål, Dutch, Norwegian Nynorsk, Norwegian, Occitan, Polish, Portuguese, Quechua, Rundi, Romanian, Russian, Serbo-Croatian, Slovenian, Albanian, Serbian, Swedish, Swahili, Tamil, Telugu, Thai, Turkmen, Tagalog, Turkish, Tatar, Uighur, Ukrainian, Urdu, Uzbek, Vietnamese, Volapük, Yiddish, Chinese
## Dataset Structure
### Data Instances
Data instances are given as translation units in TAB-separated files with four columns: source and target language ISO-639-3 codes, source language text and target language text. Note that we do not imply a translation direction and consider the data set to be symmetric and to be used as a test set in both directions. Language-pair-specific subsets are only provided under the label of one direction using sorted ISO-639-3 language IDs.
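The four-column layout and the symmetric use of the data can be sketched as follows. The field names follow the Parquet features listed in the metadata above (`sourceLang`, `targetlang`, `sourceString`, `targetString`); the helpers `parse_unit`, `reverse_unit` and `sorted_pair_label` are hypothetical names introduced only for illustration.

```python
FIELDS = ("sourceLang", "targetlang", "sourceString", "targetString")


def parse_unit(line: str) -> dict:
    """Split one TAB-separated line into the four named fields."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(parts)}")
    return dict(zip(FIELDS, parts))


def reverse_unit(unit: dict) -> dict:
    """Swap source and target so the same unit can test the opposite direction."""
    return {
        "sourceLang": unit["targetlang"],
        "targetlang": unit["sourceLang"],
        "sourceString": unit["targetString"],
        "targetString": unit["sourceString"],
    }


def sorted_pair_label(lang_a: str, lang_b: str) -> str:
    """Build a pair label from two plain ISO-639-3 IDs in sorted order."""
    return "-".join(sorted((lang_a, lang_b)))
```

Note that some config names in this mirror's metadata, such as `spa-cmn_Hans`, do not follow plain alphabetical order, so it is safer to look a pair up in the config list than to rely on `sorted_pair_label` alone.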
Some subsets contain several sub-languages or language variants. They may refer to macro-languages such as Serbo-Croatian languages that are covered by the ISO code `hbs`. Language variants may also include different writing systems, and in that case the ISO 15924 script codes are attached to the language codes. Here are a few examples from the English to Serbo-Croatian test set, including examples for Bosnian, Croatian and Serbian in Cyrillic and in Latin scripts:
```
eng bos_Latn Children are the flowers of our lives. Djeca su cvijeće našeg života.
eng hrv A bird was flying high up in the sky. Ptica je visoko letjela nebom.
eng srp_Cyrl A bird in the hand is worth two in the bush. Боље врабац у руци, него голуб на грани.
eng srp_Latn Canada is the motherland of ice hockey. Kanada je zemlja-majka hokeja na ledu.
```
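A small sketch of how such labels could be separated into a language code and an optional script code. It assumes the script code, when present, follows the first underscore, as in the examples above; `split_language_tag` is a hypothetical helper name.

```python
from typing import Optional, Tuple


def split_language_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Split a label such as 'srp_Cyrl' into ('srp', 'Cyrl'); script is None if absent."""
    lang, sep, script = tag.partition("_")
    return lang, (script if sep else None)
```

With this, `srp_Cyrl` and `srp_Latn` both map to the language code `srp`, which makes it possible to group script variants of the same language.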
There are also data sets with sentence pairs in the same language. In most cases, those are variants with minor spelling differences but they also include rephrased sentences. Here are a few examples from the English test set:
```
eng eng All of us got into the car. We all got in the car.
eng eng All of us hope that doesn't happen. All of us hope that that doesn't happen.
eng eng All the seats are booked. The seats are all sold out.
```
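One possible way to tell near-identical variants from rephrasings is a character-level similarity score. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; this heuristic is an assumption made for illustration and is not part of how the benchmark was built.

```python
from difflib import SequenceMatcher


def similarity(source: str, target: str) -> float:
    """Character-level similarity in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, source, target).ratio()


# Two of the same-language pairs quoted above.
pairs = [
    ("All of us hope that doesn't happen.", "All of us hope that that doesn't happen."),
    ("All the seats are booked.", "The seats are all sold out."),
]
scores = [similarity(src, tgt) for src, tgt in pairs]
```

The first pair differs by a single repeated word and scores noticeably higher than the second, which is a genuine rephrasing. Any threshold for separating the two cases would have to be chosen empirically.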
### Data Splits
Test and development data sets are disjoint with respect to sentence pairs but may include overlaps in individual source or target language sentences. Development data should not be used in training directly. The goal of the data splits is to create test sets of reasonable size with a large language coverage. Test sets include at most 10,000 instances. Development data do not exist for all language pairs.
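The split properties described above can be checked with a few set operations. The sketch below uses made-up toy sentence pairs, not data from the benchmark, and the helper name `split_overlap` is hypothetical.

```python
MAX_TEST_SIZE = 10_000  # upper bound on test set size stated in this card


def split_overlap(test_pairs, dev_pairs):
    """Report sentence pairs and individual sentences shared by two splits."""
    test_set, dev_set = set(test_pairs), set(dev_pairs)
    return {
        # Expected to be empty: the splits are disjoint with respect to pairs.
        "shared_pairs": test_set & dev_set,
        # May be non-empty: individual sentences are allowed to overlap.
        "shared_sources": {s for s, _ in test_set} & {s for s, _ in dev_set},
        "shared_targets": {t for _, t in test_set} & {t for _, t in dev_set},
    }


# Toy data for illustration only.
test_pairs = [("Hello.", "Hallo."), ("Thanks.", "Danke.")]
dev_pairs = [("Hello.", "Guten Tag."), ("Bye.", "Tschüss.")]
report = split_overlap(test_pairs, dev_pairs)
```

Here the source sentence "Hello." appears in both splits, but always with a different translation, so the splits still share no sentence pair.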
To be comparable with other results, models should use the training data distributed from the [Tatoeba MT Challenge Repository](https://github.com/Helsinki-NLP/Tatoeba-Challenge/) including monolingual data sets also listed there.
## Dataset Creation
### Curation Rationale
The Tatoeba MT data set will be updated continuously and the data preparation procedures are also public and released on [github](https://github.com/Helsinki-NLP/Tatoeba-Challenge/). High language coverage is the main goal of the project and data sets are prepared to be consistent and systematic with standardized language labels and distribution formats.
### Source Data
#### Initial Data Collection and Normalization
The Tatoeba data sets are collected from user-contributed translations submitted to [Tatoeba.org](https://tatoeba.org/) and compiled into a multi-parallel corpus in [OPUS](https://opus.nlpl.eu/Tatoeba.php). The test and development sets are incrementally updated with new releases of the Tatoeba data collection at OPUS. New releases extend the existing data sets. Test sets should not overlap with any of the released development data sets.
#### Who are the source language producers?
The data sets come from [Tatoeba.org](https://tatoeba.org/), which provides a large database of sentences and their translations into a wide variety of languages. Its content is constantly growing as a result of voluntary contributions of thousands of users.
The original project was founded by Trang Ho in 2006, hosted on Sourceforge under the codename of multilangdict.
### Annotations
#### Annotation process
Sentences are translated by volunteers and the Tatoeba database also provides additional metadata about each record including user ratings etc. However, the metadata is currently not used in any way for the compilation of the MT benchmark. Language skills of contributors naturally vary quite a bit and not all translations are done by native speakers of the target language. More information about the contributions can be found at [Tatoeba.org](https://tatoeba.org/).
#### Who are the annotators?
### Personal and Sensitive Information
For information about handling personal and sensitive information, we refer to the [original provider](https://tatoeba.org/) of the data. This data set has not been processed in any way to detect or remove potentially sensitive or personal information.
## Considerations for Using the Data
### Social Impact of Dataset
The language coverage is high, which makes the data set a highly valuable resource for machine translation development, especially for lesser-resourced languages and language pairs. The constantly growing database also represents a dynamic resource, and its value will grow further.
### Discussion of Biases
The original source depends on its contributors, and their interests and backgrounds will lead to certain subjective and cultural biases. Language coverage and translation quality are also biased by the skills of the contributors.
### Other Known Limitations
The sentences are typically quite short and, therefore, rather easy to translate. For high-resource languages, this leads to results that will be less useful than those from more challenging benchmarks. For lesser-resourced language pairs, the limited complexity of the examples is actually helpful for measuring progress even in very challenging setups.
## Additional Information
### Dataset Curators
The data set is curated by the University of Helsinki and its [language technology research group](https://blogs.helsinki.fi/language-technology/). Data and tools used for creating and using the resource are [open source](https://github.com/Helsinki-NLP/Tatoeba-Challenge/) and will be maintained as part of the [OPUS ecosystem](https://opus.nlpl.eu/) for parallel data and machine translation research.
### Licensing Information
The data sets are distributed under the same licence agreement as the original Tatoeba database using a [CC-BY 2.0 license](https://creativecommons.org/licenses/by/2.0/fr/). More information about the terms of use of the original data sets is listed [here](https://tatoeba.org/eng/terms_of_use).
### Citation Information
If you use the data sets, please cite the following paper: [The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT](https://aclanthology.org/2020.wmt-1.139/)
```
@inproceedings{tiedemann-2020-tatoeba,
title = "The Tatoeba Translation Challenge {--} Realistic Data Sets for Low Resource and Multilingual {MT}",
author = {Tiedemann, J{\"o}rg},
booktitle = "Proceedings of the Fifth Conference on Machine Translation",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.wmt-1.139",
pages = "1174--1182",
}
```
### Contributions
Thanks to [@jorgtied](https://github.com/jorgtied) and [@Helsinki-NLP](https://github.com/Helsinki-NLP) for adding this dataset.
Thanks also to [CSC Finland](https://www.csc.fi/en/solutions-for-research) for providing computational resources and storage space for the work on OPUS and other MT projects.
Provided by: DigitalLearningGmbH



