
DigitalLearningGmbH/tatoeba_mt_parquet

Hugging Face · Updated 2026-04-21 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/DigitalLearningGmbH/tatoeba_mt_parquet
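A minimal sketch of loading one language pair from this repository, assuming the Hugging Face `datasets` library is installed and the Hub (or a mirror) is reachable. The helper names `split_config_name` and `load_pair` are hypothetical and exist only for illustration; the repository ID and the config names such as `afr-deu` come from this page.

```python
"""Sketch: load one language pair from the Parquet copy of Tatoeba MT."""

# Repository ID as shown on this page.
REPO_ID = "DigitalLearningGmbH/tatoeba_mt_parquet"


def split_config_name(config_name: str) -> tuple[str, str]:
    """Split a config name such as 'ara-ber_Latn' into (source, target)."""
    # Script suffixes use '_' (e.g. 'ber_Latn'), so the first '-' separates the pair.
    source, target = config_name.split("-", 1)
    return source, target


def load_pair(config_name: str, split: str = "test"):
    """Download one split of one language pair (requires network access)."""
    # Imported lazily so the helper above works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=split)


# Example usage (not executed here because it downloads data):
#   pairs = load_pair("afr-deu")            # 'afr-deu' lists only a test split
#   print(pairs[0]["sourceString"], "->", pairs[0]["targetString"])
```

Not every config has a `validation` split, so it is safer to check the split list in the metadata before requesting one. To download through the mirror linked above, pointing the `HF_ENDPOINT` environment variable at `https://hf-mirror.com` before importing `datasets` is a common approach, though that setup is an assumption and is not described on this page.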
Resource description:
---
language_creators:
- crowdsourced
language:
- af
- ar
- az
- be
- bg
- bn
- br
- bs
- ca
- ch
- cs
- cv
- cy
- da
- de
- el
- en
- eo
- es
- et
- eu
- fa
- fi
- fo
- fr
- fy
- ga
- gd
- gl
- gn
- he
- hi
- hr
- hu
- hy
- ia
- id
- ie
- io
- is
- it
- ja
- jv
- ka
- kk
- km
- ko
- ku
- kw
- la
- lb
- lt
- lv
- mi
- mk
- ml
- mn
- mr
- ms
- mt
- my
- nb
- nl
- nn
- 'no'
- oc
- pl
- pt
- qu
- rn
- ro
- ru
- sh
- sl
- sq
- sr
- sv
- sw
- ta
- te
- th
- tk
- tl
- tr
- tt
- ug
- uk
- ur
- uz
- vi
- vo
- yi
- zh
license:
- cc-by-2.0
multilinguality:
- translation
pretty_name: The Tatoeba Translation Challenge
source_datasets:
- Helsinki-NLP/tatoeba_mt
task_categories:
- text-generation
- translation
dataset_info:
- config_name: afr-deu
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 118869
    num_examples: 1582
  download_size: 65914
  dataset_size: 118869
- config_name: afr-eng
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 112767
    num_examples: 1373
  - name: validation
    num_bytes: 81872
    num_examples: 1006
  download_size: 105739
  dataset_size: 194639
- config_name: afr-epo
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 94719
    num_examples: 1115
  download_size: 50877
  dataset_size: 94719
- config_name: afr-nld
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 80656
    num_examples: 1055
  download_size: 44599
  dataset_size: 80656
- config_name: afr-rus
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 19782
    num_examples: 227
  download_size: 12448
  dataset_size: 19782
- config_name: afr-spa
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 34352
    num_examples: 447
  download_size: 20392
  dataset_size: 34352
- config_name: ain-fin
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 16599
    num_examples: 205
  download_size: 8611
  dataset_size: 16599
- config_name: ara-ber
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 237570
    num_examples: 2316
  - name: validation
    num_bytes: 107328
    num_examples: 1031
  download_size: 172171
  dataset_size: 344898
- config_name: ara-ber_Latn
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 235939
    num_examples: 2298
  - name: validation
    num_bytes: 106592
    num_examples: 1024
  download_size: 170778
  dataset_size: 342531
- config_name: ara-deu
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 117082
    num_examples: 1208
  - name: validation
    num_bytes: 99845
    num_examples: 1028
  download_size: 117851
  dataset_size: 216927
- config_name: ara-ell
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 37867
    num_examples: 425
  download_size: 18275
  dataset_size: 37867
- config_name: ara-eng
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 1061565
    num_examples: 10304
  - name: validation
    num_bytes: 2011932
    num_examples: 19528
  download_size: 1504481
  dataset_size: 3073497
- config_name: ara-epo
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 68034
    num_examples: 752
  - name: validation
    num_bytes: 8881
    num_examples: 93
  download_size: 41462
  dataset_size: 76915
- config_name: ara-fra
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 152919
    num_examples: 1568
  - name: validation
    num_bytes: 3876
    num_examples: 35
  download_size: 84050
  dataset_size: 156795
- config_name: ara-heb
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 125166
    num_examples: 1207
  - name: validation
    num_bytes: 9942
    num_examples: 90
  download_size: 61848
  dataset_size: 135108
- config_name: ara-ita
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 21525
    num_examples: 234
  download_size: 14866
  dataset_size: 21525
- config_name: ara-jpn
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 152326
    num_examples: 1335
  download_size: 69625
  dataset_size: 152326
- config_name: ara-jpn_Hira
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 124101
    num_examples: 1091
  download_size: 57183
  dataset_size: 124101
- config_name: ara-pol
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 18625
    num_examples: 206
  download_size: 12101
  dataset_size: 18625
- config_name: ara-rus
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 411321
    num_examples: 3714
  - name: validation
    num_bytes: 137107
    num_examples: 1208
  download_size: 258067
  dataset_size: 548428
- config_name: ara-spa
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 136907
    num_examples: 1510
  - name: validation
    num_bytes: 94209
    num_examples: 1030
  download_size: 123987
  dataset_size: 231116
- config_name: ara-tur
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 124338
    num_examples: 1262
  - name: validation
    num_bytes: 5192
    num_examples: 51
  download_size: 73967
  dataset_size: 129530
- config_name: arq-eng
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 40862
    num_examples: 404
  - name: validation
    num_bytes: 77063
    num_examples: 735
  download_size: 66370
  dataset_size: 117925
- config_name: avk-fra
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 122566
    num_examples: 1243
  - name: validation
    num_bytes: 3632
    num_examples: 42
  download_size: 75212
  dataset_size: 126198
- config_name: avk-spa
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 21703
    num_examples: 274
  download_size: 14456
  dataset_size: 21703
- config_name: awa-eng
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 23967
    num_examples: 278
  download_size: 10739
  dataset_size: 23967
- config_name: aze-eng
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 217279
    num_examples: 2658
  - name: validation
    num_bytes: 83428
    num_examples: 1011
  download_size: 152993
  dataset_size: 300707
- config_name: aze-spa
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 16435
    num_examples: 209
  download_size: 10124
  dataset_size: 16435
- config_name: aze-tur
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 95278
    num_examples: 1149
  download_size: 51761
  dataset_size: 95278
- config_name: aze_Latn-tur
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 94499
    num_examples: 1140
  download_size: 51214
  dataset_size: 94499
- config_name: bel-deu
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 63441
    num_examples: 550
  download_size: 36220
  dataset_size: 63441
- config_name: bel-eng
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 273466
    num_examples: 2499
  - name: validation
    num_bytes: 471243
    num_examples: 4264
  download_size: 378527
  dataset_size: 744709
- config_name: bel-epo
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 84415
    num_examples: 734
  download_size: 45529
  dataset_size: 84415
- config_name: bel-fra
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 30143
    num_examples: 282
  download_size: 18702
  dataset_size: 30143
- config_name: bel-ita
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 25476
    num_examples: 263
  download_size: 14548
  dataset_size: 25476
- config_name: bel-lat
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 21928
    num_examples: 221
  download_size: 12906
  dataset_size: 21928
- config_name: bel-nld
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 69844
    num_examples: 605
  download_size: 39407
  dataset_size: 69844
- config_name: bel-pol
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 29581
    num_examples: 286
  download_size: 18853
  dataset_size: 29581
- config_name: bel-rus
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 394748
    num_examples: 2499
  - name: validation
    num_bytes: 437140
    num_examples: 2753
  download_size: 424908
  dataset_size: 831888
- config_name: bel-spa
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 21243
    num_examples: 204
  download_size: 13942
  dataset_size: 21243
- config_name: bel-ukr
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 312720
    num_examples: 2354
  - name: validation
    num_bytes: 129835
    num_examples: 1020
  download_size: 218883
  dataset_size: 442555
- config_name: bel-zho
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 33111
    num_examples: 324
  download_size: 18724
  dataset_size: 33111
- config_name: ben-eng
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 264408
    num_examples: 2499
  - name: validation
    num_bytes: 280510
    num_examples: 2647
  download_size: 202248
  dataset_size: 544918
- config_name: ber-deu
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 63776
    num_examples: 660
  download_size: 32279
  dataset_size: 63776
- config_name: ber-eng
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 1028063
    num_examples: 10963
  - name: validation
    num_bytes: 10194009
    num_examples: 108421
  download_size: 4199708
  dataset_size: 11222072
- config_name: ber-epo
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 151437
    num_examples: 1575
  download_size: 67850
  dataset_size: 151437
- config_name: ber-fra
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 879737
    num_examples: 10143
  - name: validation
    num_bytes: 3004766
    num_examples: 34704
  download_size: 1478032
  dataset_size: 3884503
- config_name: ber-spa
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 927625
    num_examples: 10015
  - name: validation
    num_bytes: 1299082
    num_examples: 14043
  download_size: 1003062
  dataset_size: 2226707
- config_name: ber_Latn-deu
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 63499
    num_examples: 658
  download_size: 32045
  dataset_size: 63499
- config_name: ber_Latn-eng
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 1027874
    num_examples: 10961
  - name: validation
    num_bytes: 10191650
    num_examples: 108399
  download_size: 4199153
  dataset_size: 11219524
- config_name: ber_Latn-epo
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 151277
    num_examples: 1574
  download_size: 67604
  dataset_size: 151277
- config_name: ber_Latn-fra
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 879352
    num_examples: 10139
  - name: validation
    num_bytes: 3004095
    num_examples: 34697
  download_size: 1477107
  dataset_size: 3883447
- config_name: bre-eng
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 25950
    num_examples: 382
  download_size: 14563
  dataset_size: 25950
- config_name: bre-fra
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 175074
    num_examples: 2493
  - name: validation
    num_bytes: 219576
    num_examples: 3059
  download_size: 184284
  dataset_size: 394650
- config_name: bua-rus
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 84023
    num_examples: 805
  download_size: 37254
  dataset_size: 84023
- config_name: bua_Cyrl-rus
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 83989
    num_examples: 804
  download_size: 37241
  dataset_size: 83989
- config_name: bul-bul
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 26317
    num_examples: 218
  download_size: 13765
  dataset_size: 26317
- config_name: bul-cmn_Hans
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 34830
    num_examples: 277
  download_size: 20329
  dataset_size: 34830
- config_name: bul-deu
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 33936
    num_examples: 313
  download_size: 20291
  dataset_size: 33936
- config_name: bul-eng
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 1084142
    num_examples: 9999
  - name: validation
    num_bytes: 839170
    num_examples: 7799
  download_size: 846278
  dataset_size: 1923312
- config_name: bul-epo
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 63586
    num_examples: 569
  download_size: 34152
  dataset_size: 63586
- config_name: bul-fra
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 53950
    num_examples: 445
  download_size: 29941
  dataset_size: 53950
- config_name: bul-ita
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 271156
    num_examples: 2499
  - name: validation
    num_bytes: 437184
    num_examples: 4005
  download_size: 325323
  dataset_size: 708340
- config_name: bul-jpn
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 43551
    num_examples: 320
  download_size: 24199
  dataset_size: 43551
- config_name: bul-jpn_Hira
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 36791
    num_examples: 263
  download_size: 20835
  dataset_size: 36791
- config_name: bul-rus
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 157433
    num_examples: 1246
  - name: validation
    num_bytes: 123953
    num_examples: 999
  download_size: 132261
  dataset_size: 281386
- config_name: bul-spa
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 27878
    num_examples: 285
  download_size: 16599
  dataset_size: 27878
- config_name: bul-tur
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 98572
    num_examples: 833
  download_size: 49498
  dataset_size: 98572
- config_name: bul-ukr
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 100400
    num_examples: 1019
  download_size: 44328
  dataset_size: 100400
- config_name: bul-zho
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 50020
    num_examples: 415
  download_size: 28061
  dataset_size: 50020
- config_name: cat-deu
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 68540
    num_examples: 722
  download_size: 41879
  dataset_size: 68540
- config_name: cat-eng
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 146100
    num_examples: 1630
  - name: validation
    num_bytes: 26029
    num_examples: 227
  download_size: 101766
  dataset_size: 172129
- config_name: cat-epo
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 70561
    num_examples: 771
  download_size: 41296
  dataset_size: 70561
- config_name: cat-fra
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 67131
    num_examples: 699
  download_size: 41209
  dataset_size: 67131
- config_name: cat-ita
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 25945
    num_examples: 297
  download_size: 17792
  dataset_size: 25945
- config_name: cat-nld
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 48725
    num_examples: 577
  download_size: 30139
  dataset_size: 48725
- config_name: cat-por
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 74541
    num_examples: 746
  download_size: 44473
  dataset_size: 74541
- config_name: cat-spa
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 149550
    num_examples: 1533
  - name: validation
    num_bytes: 130044
    num_examples: 1293
  download_size: 168736
  dataset_size: 279594
- config_name: cat-ukr
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 40603
    num_examples: 455
  download_size: 22471
  dataset_size: 40603
- config_name: cbk-eng
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 128209
    num_examples: 1497
  - name: validation
    num_bytes: 85439
    num_examples: 1000
  download_size: 101501
  dataset_size: 213648
- config_name: ceb-deu
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 86979
    num_examples: 902
  download_size: 49829
  dataset_size: 86979
- config_name: ceb-eng
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 27661
    num_examples: 377
  download_size: 17150
  dataset_size: 27661
- config_name: ces-deu
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 324334
    num_examples: 3489
  - name: validation
    num_bytes: 107095
    num_examples: 1126
  download_size: 249963
  dataset_size: 431429
- config_name: ces-eng
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 1208777
    num_examples: 13823
  - name: validation
    num_bytes: 1406567
    num_examples: 16188
  download_size: 1371882
  dataset_size: 2615344
- config_name: ces-epo
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 304288
    num_examples: 3470
  - name: validation
    num_bytes: 90934
    num_examples: 1057
  download_size: 232623
  dataset_size: 395222
- config_name: ces-fra
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 39317
    num_examples: 437
  download_size: 26267
  dataset_size: 39317
- config_name: ces-hun
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 144509
    num_examples: 1910
  - name: validation
    num_bytes: 7995
    num_examples: 113
  download_size: 86499
  dataset_size: 152504
- config_name: ces-ita
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 98215
    num_examples: 1098
  download_size: 62249
  dataset_size: 98215
- config_name: ces-lat
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 26405
    num_examples: 322
  download_size: 16842
  dataset_size: 26405
- config_name: ces-pol
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 47119
    num_examples: 567
  download_size: 32463
  dataset_size: 47119
- config_name: ces-rus
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 289216
    num_examples: 2933
  - name: validation
    num_bytes: 540064
    num_examples: 5465
  download_size: 412625
  dataset_size: 829280
- config_name: ces-slv
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 34502
    num_examples: 488
  download_size: 22656
  dataset_size: 34502
- config_name: ces-ukr
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 148657
    num_examples: 1786
  - name: validation
    num_bytes: 94319
    num_examples: 1113
  download_size: 122379
  dataset_size: 242976
- config_name: cha-eng
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 14322
    num_examples: 226
  download_size: 8820
  dataset_size: 14322
- config_name: chm-rus
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 414650
    num_examples: 2749
  - name: validation
    num_bytes: 145873
    num_examples: 999
  download_size: 280544
  dataset_size: 560523
- config_name: chv-eng
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 32634
    num_examples: 335
  download_size: 18901
  dataset_size: 32634
- config_name: chv-rus
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 45397
    num_examples: 381
  download_size: 24053
  dataset_size: 45397
- config_name: chv-tur
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 50662
    num_examples: 507
  download_size: 27156
  dataset_size: 50662
- config_name: cmn_Hans-wuu
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 83509
    num_examples: 814
  - name: validation
    num_bytes: 184233
    num_examples: 1833
  download_size: 160147
  dataset_size: 267742
- config_name: cor-deu
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 57500
    num_examples: 820
  download_size: 25536
  dataset_size: 57500
- config_name: cor-eng
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 202063
    num_examples: 3197
  - name: validation
    num_bytes: 63311
    num_examples: 999
  download_size: 104772
  dataset_size: 265374
- config_name: cor-epo
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 44825
    num_examples: 662
  download_size: 20025
  dataset_size: 44825
- config_name: cor-fra
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 38316
    num_examples: 554
  download_size: 18077
  dataset_size: 38316
- config_name: cor-ita
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 17926
    num_examples: 286
  download_size: 9237
  dataset_size: 17926
- config_name: cor-rus
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 17034
    num_examples: 217
  download_size: 9056
  dataset_size: 17034
- config_name: cor-spa
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 13929
    num_examples: 205
  download_size: 8528
  dataset_size: 13929
- config_name: crh-tur
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 16909
    num_examples: 207
  download_size: 10899
  dataset_size: 16909
- config_name: cym-eng
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 67303
    num_examples: 817
  download_size: 36462
  dataset_size: 67303
- config_name: dan-dan
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 20508
    num_examples: 211
  download_size: 13056
  dataset_size: 20508
- config_name: dan-deu
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 939346
    num_examples: 9997
  - name: validation
    num_bytes: 653108
    num_examples: 6920
  download_size: 822653
  dataset_size: 1592454
- config_name: dan-eng
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 937157
    num_examples: 10794
  - name: validation
    num_bytes: 1731687
    num_examples: 20088
  download_size: 1309642
  dataset_size: 2668844
- config_name: dan-fin
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 210769
    num_examples: 2664
  - name: validation
    num_bytes: 146073
    num_examples: 1742
  download_size: 139564
  dataset_size: 356842
- config_name: dan-fra
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 141621
    num_examples: 1730
  - name: validation
    num_bytes: 5094
    num_examples: 44
  download_size: 78641
  dataset_size: 146715
- config_name: dan-ita
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 27065
    num_examples: 283
  download_size: 17488
  dataset_size: 27065
- config_name: dan-jpn
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 85475
    num_examples: 903
  download_size: 40686
  dataset_size: 85475
- config_name: dan-jpn_Hira
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 68510
    num_examples: 715
  download_size: 32860
  dataset_size: 68510
- config_name: dan-nld
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 128850
    num_examples: 1642
  - name: validation
    num_bytes: 2811
    num_examples: 35
  download_size: 73101
  dataset_size: 131661
- config_name: dan-nob
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 117135
    num_examples: 1298
  - name: validation
    num_bytes: 91147
    num_examples: 1011
  download_size: 120282
  dataset_size: 208282
- config_name: dan-nor
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 117924
    num_examples: 1310
  - name: validation
    num_bytes: 91819
    num_examples: 1016
  download_size: 121240
  dataset_size: 209743
- config_name: dan-por
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 65095
    num_examples: 872
  download_size: 34791
  dataset_size: 65095
- config_name: dan-rus
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 176873
    num_examples: 1712
  - name: validation
    num_bytes: 104636
    num_examples: 1019
  download_size: 143203
  dataset_size: 281509
- config_name: dan-spa
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 437841 num_examples: 4999 - name: validation num_bytes: 451220 num_examples: 5135 download_size: 462053 dataset_size: 889061 - config_name: dan-swe features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 125774 num_examples: 1548 - name: validation num_bytes: 115669 num_examples: 1439 download_size: 129907 dataset_size: 241443 - config_name: dan-tur features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 53781 num_examples: 757 download_size: 30139 dataset_size: 53781 - config_name: deu-cmn_Hans features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 258851 num_examples: 2608 - name: validation num_bytes: 437881 num_examples: 4365 download_size: 369151 dataset_size: 696732 - config_name: deu-cmn_Hant features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 107987 num_examples: 1241 - name: validation num_bytes: 199067 num_examples: 2250 download_size: 159365 dataset_size: 307054 - config_name: deu-deu features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 267826 num_examples: 2499 - name: validation num_bytes: 223119 num_examples: 2114 download_size: 259164 dataset_size: 490945 - config_name: deu-dsb features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 55716 
num_examples: 639 download_size: 32913 dataset_size: 55716 - config_name: ces-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 153794 num_examples: 1462 - name: validation num_bytes: 10429 num_examples: 133 download_size: 94349 dataset_size: 164223 - config_name: dan-epo features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 975819 num_examples: 11068 - name: validation num_bytes: 1706235 num_examples: 19221 download_size: 1303739 dataset_size: 2682054 - config_name: deu-ell features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 268771 num_examples: 2499 - name: validation num_bytes: 306772 num_examples: 2800 download_size: 278036 dataset_size: 575543 - config_name: deu-eng features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1805627 num_examples: 17564 - name: validation num_bytes: 28610541 num_examples: 289748 download_size: 13423417 dataset_size: 30416168 - config_name: deu-est features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 19903 num_examples: 243 download_size: 13037 dataset_size: 19903 - config_name: deu-eus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 41039 num_examples: 455 download_size: 25696 dataset_size: 41039 - config_name: deu-fas features: - name: sourceLang dtype: string - name: targetlang dtype: 
string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 400446 num_examples: 3184 - name: validation num_bytes: 129569 num_examples: 1024 download_size: 276027 dataset_size: 530015 - config_name: deu-fin features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 239646 num_examples: 2646 - name: validation num_bytes: 642456 num_examples: 7141 download_size: 462553 dataset_size: 882102 - config_name: deu-fra features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1281879 num_examples: 12417 - name: validation num_bytes: 10120004 num_examples: 98157 download_size: 5431932 dataset_size: 11401883 - config_name: deu-frr features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 22655 num_examples: 277 download_size: 15560 dataset_size: 22655 - config_name: deu-gos features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 14640 num_examples: 206 download_size: 10515 dataset_size: 14640 - config_name: deu-hbs features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 193417 num_examples: 1958 download_size: 109664 dataset_size: 193417 - config_name: deu-heb features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 342033 num_examples: 3089 - name: validation num_bytes: 125730 num_examples: 1124 download_size: 
225334 dataset_size: 467763 - config_name: deu-hrv features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 67567 num_examples: 781 download_size: 40162 dataset_size: 67567 - config_name: deu-hrx features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 37662 num_examples: 470 download_size: 20237 dataset_size: 37662 - config_name: deu-hsb features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 58625 num_examples: 665 download_size: 34827 dataset_size: 58625 - config_name: deu-hun features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1534998 num_examples: 15341 - name: validation num_bytes: 5346859 num_examples: 54082 download_size: 3523593 dataset_size: 6881857 - config_name: deu-ido features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 69192 num_examples: 870 - name: validation num_bytes: 7031 num_examples: 95 download_size: 40976 dataset_size: 76223 - config_name: deu-ile features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 160935 num_examples: 2002 - name: validation num_bytes: 107825 num_examples: 1371 download_size: 132108 dataset_size: 268760 - config_name: deu-ina features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test 
num_bytes: 152094 num_examples: 1256 - name: validation num_bytes: 9464 num_examples: 114 download_size: 90647 dataset_size: 161558 - config_name: deu-ind features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 47760 num_examples: 496 download_size: 28126 dataset_size: 47760 - config_name: deu-isl features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 79958 num_examples: 968 download_size: 42419 dataset_size: 79958 - config_name: deu-ita features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 992180 num_examples: 10093 - name: validation num_bytes: 1196094 num_examples: 12197 download_size: 1126832 dataset_size: 2188274 - config_name: deu-jbo features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 131489 num_examples: 1448 download_size: 65705 dataset_size: 131489 - config_name: deu-jpn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1370543 num_examples: 11427 - name: validation num_bytes: 3591001 num_examples: 29867 download_size: 2331243 dataset_size: 4961544 - config_name: deu-jpn_Hani features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 144613 num_examples: 1289 - name: validation num_bytes: 399245 num_examples: 3631 download_size: 279320 dataset_size: 543858 - config_name: deu-jpn_Hira features: - name: sourceLang dtype: string - 
name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1187608 num_examples: 9809 - name: validation num_bytes: 3092140 num_examples: 25378 download_size: 1993699 dataset_size: 4279748 - config_name: deu-jpn_Kana features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 37111 num_examples: 320 - name: validation num_bytes: 96454 num_examples: 829 download_size: 68455 dataset_size: 133565 - config_name: deu-kab features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 30016 num_examples: 372 download_size: 17845 dataset_size: 30016 - config_name: deu-kor features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 123242 num_examples: 1103 download_size: 68033 dataset_size: 123242 - config_name: deu-kor_Hang features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 120181 num_examples: 1069 download_size: 66892 dataset_size: 120181 - config_name: deu-kur features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 18833 num_examples: 236 download_size: 11349 dataset_size: 18833 - config_name: deu-kur_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 17483 num_examples: 222 download_size: 10468 dataset_size: 17483 - config_name: deu-lad features: - name: sourceLang dtype: 
string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 17057 num_examples: 219 - name: validation num_bytes: 2253 num_examples: 27 download_size: 12097 dataset_size: 19310 - config_name: deu-lat features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 166757 num_examples: 2015 - name: validation num_bytes: 94276 num_examples: 1101 download_size: 135876 dataset_size: 261033 - config_name: deu-lfn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 48086 num_examples: 417 - name: validation num_bytes: 5806 num_examples: 53 download_size: 30628 dataset_size: 53892 - config_name: deu-lfn_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 29221 num_examples: 290 - name: validation num_bytes: 2821 num_examples: 31 download_size: 22388 dataset_size: 32042 - config_name: deu-lit features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 105831 num_examples: 1114 - name: validation num_bytes: 29448 num_examples: 354 download_size: 78897 dataset_size: 135279 - config_name: deu-ltz features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 27871 num_examples: 346 download_size: 14535 dataset_size: 27871 - config_name: deu-msa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: 
test num_bytes: 53011 num_examples: 543 download_size: 31039 dataset_size: 53011 - config_name: deu-nds features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 916195 num_examples: 9998 - name: validation num_bytes: 724703 num_examples: 7842 download_size: 846482 dataset_size: 1640898 - config_name: deu-nld features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 934152 num_examples: 10217 - name: validation num_bytes: 2407739 num_examples: 26386 download_size: 1674495 dataset_size: 3341891 - config_name: deu-nob features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 405671 num_examples: 3524 - name: validation num_bytes: 109248 num_examples: 965 download_size: 283830 dataset_size: 514919 - config_name: deu-nor features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 417281 num_examples: 3650 - name: validation num_bytes: 112350 num_examples: 1000 download_size: 292704 dataset_size: 529631 - config_name: deu-pol features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 465291 num_examples: 4999 - name: validation num_bytes: 529225 num_examples: 5700 download_size: 553141 dataset_size: 994516 - config_name: deu-por features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1013946 num_examples: 9999 - name: validation num_bytes: 736203 num_examples: 7044 
download_size: 924126 dataset_size: 1750149 - config_name: deu-ron features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 98189 num_examples: 1140 - name: validation num_bytes: 86238 num_examples: 1012 download_size: 105500 dataset_size: 184427 - config_name: deu-run features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 121394 num_examples: 1751 - name: validation num_bytes: 88582 num_examples: 1190 download_size: 102436 dataset_size: 209976 - config_name: deu-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1499216 num_examples: 12799 - name: validation num_bytes: 11573090 num_examples: 100273 download_size: 5403337 dataset_size: 13072306 - config_name: deu-slv features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 35983 num_examples: 491 download_size: 21747 dataset_size: 35983 - config_name: deu-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1056787 num_examples: 10520 - name: validation num_bytes: 7477370 num_examples: 74985 download_size: 4207104 dataset_size: 8534157 - config_name: deu-srp_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 105768 num_examples: 985 download_size: 61261 dataset_size: 105768 - config_name: deu-swe features: - name: sourceLang dtype: string - name: targetlang dtype: string - 
name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 295272 num_examples: 3409 - name: validation num_bytes: 97264 num_examples: 1125 download_size: 217263 dataset_size: 392536 - config_name: deu-swg features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 184947 num_examples: 1522 - name: validation num_bytes: 17972 num_examples: 161 download_size: 127693 dataset_size: 202919 - config_name: deu-tat features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 401633 num_examples: 2500 - name: validation num_bytes: 552680 num_examples: 3464 download_size: 517051 dataset_size: 954313 - config_name: deu-tgl features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 32289 num_examples: 325 download_size: 19727 dataset_size: 32289 - config_name: deu-tlh features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 91283 num_examples: 1098 - name: validation num_bytes: 84607 num_examples: 1034 download_size: 93834 dataset_size: 175890 - config_name: deu-toki features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1066355 num_examples: 10113 - name: validation num_bytes: 1443886 num_examples: 13816 download_size: 1089456 dataset_size: 2510241 - config_name: deu-tur features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 
469878 num_examples: 4999 - name: validation num_bytes: 681055 num_examples: 7276 download_size: 610454 dataset_size: 1150933 - config_name: deu-ukr features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 988007 num_examples: 10318 - name: validation num_bytes: 1112017 num_examples: 11720 download_size: 922708 dataset_size: 2100024 - config_name: deu-vie features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 42346 num_examples: 313 download_size: 26799 dataset_size: 42346 - config_name: deu-vol features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 16426 num_examples: 205 download_size: 10419 dataset_size: 16426 - config_name: deu-yid features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 89530 num_examples: 852 - name: validation num_bytes: 14302 num_examples: 120 download_size: 48347 dataset_size: 103832 - config_name: deu-zho features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 374528 num_examples: 3943 - name: validation num_bytes: 656083 num_examples: 6837 download_size: 536791 dataset_size: 1030611 - config_name: dsb-hsb features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 19291 num_examples: 330 download_size: 10518 dataset_size: 19291 - config_name: dsb-slv features: - name: sourceLang dtype: string - name: targetlang dtype: 
string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 20142 num_examples: 329 download_size: 10459 dataset_size: 20142 - config_name: dtp-eng features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 164362 num_examples: 1926 - name: validation num_bytes: 89064 num_examples: 1011 download_size: 143844 dataset_size: 253426 - config_name: dtp-jpn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 24238 num_examples: 250 download_size: 15028 dataset_size: 24238 - config_name: dtp-jpn_Hira features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 22811 num_examples: 235 download_size: 14286 dataset_size: 22811 - config_name: dtp-msa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 59791 num_examples: 516 download_size: 34102 dataset_size: 59791 - config_name: dtp-zsm_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 58996 num_examples: 508 download_size: 33799 dataset_size: 58996 - config_name: egl-ita features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 13042 num_examples: 201 download_size: 7794 dataset_size: 13042 - config_name: ell-ell features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: 
string splits: - name: test num_bytes: 64052 num_examples: 532 download_size: 27002 dataset_size: 64052 - config_name: ell-eng features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1093958 num_examples: 10898 - name: validation num_bytes: 1252537 num_examples: 12919 download_size: 921557 dataset_size: 2346495 - config_name: ell-epo features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 65871 num_examples: 603 download_size: 35659 dataset_size: 65871 - config_name: ell-fra features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 153819 num_examples: 1505 download_size: 73106 dataset_size: 153819 - config_name: ell-ita features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 41030 num_examples: 423 download_size: 20769 dataset_size: 41030 - config_name: ell-nld features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 132368 num_examples: 921 - name: validation num_bytes: 9869 num_examples: 66 download_size: 80141 dataset_size: 142237 - config_name: ell-por features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 82594 num_examples: 884 - name: validation num_bytes: 2043 num_examples: 22 download_size: 41322 dataset_size: 84637 - config_name: ell-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString 
dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 312405 num_examples: 2499 - name: validation num_bytes: 315095 num_examples: 2531 download_size: 287204 dataset_size: 627500 - config_name: ell-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 177528 num_examples: 1828 - name: validation num_bytes: 97334 num_examples: 1004 download_size: 136484 dataset_size: 274862 - config_name: ell-swe features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 24701 num_examples: 252 download_size: 13735 dataset_size: 24701 - config_name: ell-tur features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 150047 num_examples: 1468 - name: validation num_bytes: 10316 num_examples: 108 download_size: 79339 dataset_size: 160363 - config_name: eng-bos_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 22596 num_examples: 300 - name: validation num_bytes: 14959 num_examples: 199 download_size: 24366 dataset_size: 37555 - config_name: eng-cmn_Hans features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 438854 num_examples: 4449 - name: validation num_bytes: 1777020 num_examples: 17966 download_size: 1162711 dataset_size: 2215874 - config_name: eng-cmn_Hant features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 414294 
num_examples: 4475 - name: validation num_bytes: 1812557 num_examples: 19463 download_size: 1128183 dataset_size: 2226851 - config_name: eng-eng features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1269164 num_examples: 12061 - name: validation num_bytes: 10195524 num_examples: 96607 download_size: 3516603 dataset_size: 11464688 - config_name: eng-est features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 104016 num_examples: 1358 - name: validation num_bytes: 85232 num_examples: 1095 download_size: 110161 dataset_size: 189248 - config_name: eng-eus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 96502 num_examples: 1059 - name: validation num_bytes: 91908 num_examples: 1000 download_size: 107230 dataset_size: 188410 - config_name: eng-fao features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 23587 num_examples: 293 download_size: 15976 dataset_size: 23587 - config_name: eng-fas features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 455271 num_examples: 3761 - name: validation num_bytes: 124374 num_examples: 1030 download_size: 300492 dataset_size: 579645 - config_name: eng-fin features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 971336 num_examples: 10689 - name: validation num_bytes: 6265583 num_examples: 69895 download_size: 
3296109 dataset_size: 7236919 - config_name: eng-fra features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1257949 num_examples: 12680 - name: validation num_bytes: 24063290 num_examples: 251748 download_size: 10576625 dataset_size: 25321239 - config_name: eng-fry features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 18261 num_examples: 219 download_size: 12827 dataset_size: 18261 - config_name: eng-gla features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 80003 num_examples: 954 download_size: 39209 dataset_size: 80003 - config_name: eng-gle features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 141399 num_examples: 1912 - name: validation num_bytes: 2295 num_examples: 26 download_size: 73403 dataset_size: 143694 - config_name: eng-glg features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 97097 num_examples: 1014 download_size: 55991 dataset_size: 97097 - config_name: eng-gos features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 70177 num_examples: 1153 - name: validation num_bytes: 7364 num_examples: 95 download_size: 41819 dataset_size: 77541 - config_name: eng-got features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test 
num_bytes: 22693 num_examples: 201 download_size: 11634 dataset_size: 22693 - config_name: eng-grc features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 77943 num_examples: 613 download_size: 41733 dataset_size: 77943 - config_name: eng-gsw features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 12398 num_examples: 204 download_size: 8014 dataset_size: 12398 - config_name: eng-hbs features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 877077 num_examples: 10016 - name: validation num_bytes: 1437995 num_examples: 14205 download_size: 970160 dataset_size: 2315072 - config_name: eng-heb features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1029187 num_examples: 10518 - name: validation num_bytes: 15016495 num_examples: 153502 download_size: 6534814 dataset_size: 16045682 - config_name: eng-hin features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 632165 num_examples: 4999 - name: validation num_bytes: 750572 num_examples: 5943 download_size: 549131 dataset_size: 1382737 - config_name: eng-hoc features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 44534 num_examples: 659 download_size: 16568 dataset_size: 44534 - config_name: eng-hoc_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: 
string - name: targetString dtype: string splits: - name: test num_bytes: 44505 num_examples: 658 download_size: 16594 dataset_size: 44505 - config_name: eng-hrv features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 122085 num_examples: 1479 - name: validation num_bytes: 79066 num_examples: 948 download_size: 119777 dataset_size: 201151 - config_name: eng-hrx features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 16515 num_examples: 220 download_size: 10338 dataset_size: 16515 - config_name: eng-hun features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1115409 num_examples: 13036 - name: validation num_bytes: 8114616 num_examples: 97143 download_size: 4410369 dataset_size: 9230025 - config_name: eng-hye features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 86763 num_examples: 1120 - name: validation num_bytes: 78516 num_examples: 999 download_size: 75105 dataset_size: 165279 - config_name: eng-ido features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 172535 num_examples: 1959 - name: validation num_bytes: 126935 num_examples: 1482 download_size: 157198 dataset_size: 299470 - config_name: eng-ido_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 172449 num_examples: 1958 download_size: 90120 dataset_size: 172449 - config_name: 
eng-ile features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 132999 num_examples: 1710 - name: validation num_bytes: 86077 num_examples: 1107 download_size: 112481 dataset_size: 219076 - config_name: eng-ilo features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 91354 num_examples: 1092 - name: validation num_bytes: 83565 num_examples: 999 download_size: 97325 dataset_size: 174919 - config_name: eng-ina features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 533554 num_examples: 4987 - name: validation num_bytes: 649383 num_examples: 6233 download_size: 603107 dataset_size: 1182937 - config_name: eng-ind features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 380922 num_examples: 4288 - name: validation num_bytes: 520374 num_examples: 5808 download_size: 461005 dataset_size: 901296 - config_name: eng-isl features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 240876 num_examples: 2502 - name: validation num_bytes: 662112 num_examples: 6937 download_size: 467385 dataset_size: 902988 - config_name: eng-ita features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1441207 num_examples: 17319 - name: validation num_bytes: 38409070 num_examples: 472973 download_size: 11676126 dataset_size: 39850277 - config_name: eng-jav features: - name: sourceLang 
dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 22298 num_examples: 261 download_size: 14199 dataset_size: 22298 - config_name: eng-jbo features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 423796 num_examples: 4995 - name: validation num_bytes: 593343 num_examples: 6939 download_size: 498266 dataset_size: 1017139 - config_name: eng-jbo_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 423519 num_examples: 4991 - name: validation num_bytes: 593200 num_examples: 6936 download_size: 497925 dataset_size: 1016719 - config_name: eng-jpn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1657514 num_examples: 13861 - name: validation num_bytes: 23343978 num_examples: 194063 download_size: 10922866 dataset_size: 25001492 - config_name: eng-jpn_Hani features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 165303 num_examples: 1499 - name: validation num_bytes: 2221340 num_examples: 19904 download_size: 1156474 dataset_size: 2386643 - config_name: eng-jpn_Hira features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1431624 num_examples: 11830 - name: validation num_bytes: 20729481 num_examples: 170845 download_size: 9553590 dataset_size: 22161105 - config_name: eng-jpn_Kana features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: 
sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 57468 num_examples: 502 - name: validation num_bytes: 383407 num_examples: 3228 download_size: 214818 dataset_size: 440875 - config_name: eng-kab features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 958325 num_examples: 12141 - name: validation num_bytes: 1099879 num_examples: 14438 download_size: 898677 dataset_size: 2058204 - config_name: eng-kat features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 106255 num_examples: 983 download_size: 42929 dataset_size: 106255 - config_name: eng-kaz features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 42142 num_examples: 402 download_size: 22786 dataset_size: 42142 - config_name: eng-kaz_Cyrl features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 41491 num_examples: 395 download_size: 22306 dataset_size: 41491 - config_name: eng-kha features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 99741 num_examples: 1313 - name: validation num_bytes: 1779 num_examples: 19 download_size: 50322 dataset_size: 101520 - config_name: eng-khm features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 79812 num_examples: 725 download_size: 33939 dataset_size: 79812 - config_name: eng-kor features: - name: sourceLang dtype: 
string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 236727 num_examples: 2399 - name: validation num_bytes: 104433 num_examples: 1040 download_size: 179764 dataset_size: 341160 - config_name: eng-kor_Hang features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 234873 num_examples: 2375 - name: validation num_bytes: 103713 num_examples: 1031 download_size: 178345 dataset_size: 338586 - config_name: eng-kur features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 36063 num_examples: 441 - name: validation num_bytes: 6689 num_examples: 79 download_size: 26817 dataset_size: 42752 - config_name: eng-kur_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 22071 num_examples: 289 download_size: 13276 dataset_size: 22071 - config_name: eng-kzj features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 121039 num_examples: 1168 download_size: 67990 dataset_size: 121039 - config_name: eng-lad features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 57498 num_examples: 767 - name: validation num_bytes: 4378 num_examples: 57 download_size: 26724 dataset_size: 61876 - config_name: eng-lad_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 48604 num_examples: 
671 - name: validation num_bytes: 2554 num_examples: 36 download_size: 23543 dataset_size: 51158 - config_name: eng-lat features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1089275 num_examples: 10193 - name: validation num_bytes: 1492502 num_examples: 14001 download_size: 1336363 dataset_size: 2581777 - config_name: eng-lav features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 133637 num_examples: 1630 download_size: 75998 dataset_size: 133637 - config_name: eng-lfn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 302601 num_examples: 3296 - name: validation num_bytes: 378841 num_examples: 3705 download_size: 326423 dataset_size: 681442 - config_name: eng-lfn_Cyrl features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 89601 num_examples: 846 - name: validation num_bytes: 150969 num_examples: 1219 download_size: 111777 dataset_size: 240570 - config_name: eng-lfn_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 212876 num_examples: 2449 - name: validation num_bytes: 227779 num_examples: 2485 download_size: 219777 dataset_size: 440655 - config_name: eng-lit features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 210416 num_examples: 2527 - name: validation num_bytes: 468022 num_examples: 5642 download_size: 358493 
dataset_size: 678438 - config_name: eng-ltz features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 22109 num_examples: 292 download_size: 12793 dataset_size: 22109 - config_name: eng-mal features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 122845 num_examples: 801 download_size: 52038 dataset_size: 122845 - config_name: eng-mar features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1263570 num_examples: 10395 - name: validation num_bytes: 5203641 num_examples: 43057 download_size: 2177593 dataset_size: 6467211 - config_name: eng-mkd features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 983537 num_examples: 10009 - name: validation num_bytes: 6916962 num_examples: 70318 download_size: 3252466 dataset_size: 7900499 - config_name: eng-mlt features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 14036 num_examples: 202 download_size: 9489 dataset_size: 14036 - config_name: eng-mon features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 51132 num_examples: 414 - name: validation num_bytes: 2954 num_examples: 22 download_size: 32987 dataset_size: 54086 - config_name: eng-mri features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test 
num_bytes: 35222 num_examples: 365 download_size: 21751 dataset_size: 35222 - config_name: eng-msa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 452195 num_examples: 4999 - name: validation num_bytes: 633153 num_examples: 6892 download_size: 557473 dataset_size: 1085348 - config_name: eng-mya features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 36681 num_examples: 215 download_size: 15537 dataset_size: 36681 - config_name: eng-nds features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 208894 num_examples: 2499 - name: validation num_bytes: 270713 num_examples: 3220 download_size: 259707 dataset_size: 479607 - config_name: eng-nld features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1088704 num_examples: 12695 - name: validation num_bytes: 5293182 num_examples: 62386 download_size: 2971500 dataset_size: 6381886 - config_name: eng-nno features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 40301 num_examples: 459 - name: validation num_bytes: 44105 num_examples: 504 download_size: 52863 dataset_size: 84406 - config_name: eng-nob features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 424638 num_examples: 4538 - name: validation num_bytes: 482450 num_examples: 5201 download_size: 494770 dataset_size: 907088 - config_name: eng-nor 
features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 465177 num_examples: 4999 - name: validation num_bytes: 526696 num_examples: 5706 download_size: 541002 dataset_size: 991873 - config_name: eng-nov features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 17413 num_examples: 216 download_size: 11444 dataset_size: 17413 - config_name: eng-nst features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 70980 num_examples: 804 download_size: 29710 dataset_size: 70980 - config_name: eng-oci features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 64323 num_examples: 840 download_size: 35159 dataset_size: 64323 - config_name: eng-orv features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 30357 num_examples: 321 download_size: 15683 dataset_size: 30357 - config_name: eng-ota features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 66860 num_examples: 687 download_size: 27433 dataset_size: 66860 - config_name: eng-ota_Arab features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 38076 num_examples: 370 download_size: 19282 dataset_size: 38076 - config_name: eng-ota_Latn features: - name: sourceLang dtype: string - name: targetlang 
dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 28693 num_examples: 316 download_size: 16607 dataset_size: 28693 - config_name: eng-pam features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 86922 num_examples: 999 - name: validation num_bytes: 43649 num_examples: 493 download_size: 67979 dataset_size: 130571 - config_name: eng-pes features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 454026 num_examples: 3756 download_size: 232007 dataset_size: 454026 - config_name: eng-pms features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 24232 num_examples: 268 download_size: 16311 dataset_size: 24232 - config_name: eng-pol features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 901788 num_examples: 10098 - name: validation num_bytes: 3941704 num_examples: 44188 download_size: 2447148 dataset_size: 4843492 - config_name: eng-por features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1243469 num_examples: 13221 - name: validation num_bytes: 18217294 num_examples: 204461 download_size: 8127961 dataset_size: 19460763 - config_name: eng-prg features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 21520 num_examples: 212 download_size: 14491 dataset_size: 21520 - config_name: eng-que 
features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 18477 num_examples: 256 download_size: 11724 dataset_size: 18477 - config_name: eng-rom features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 56689 num_examples: 705 download_size: 23862 dataset_size: 56689 - config_name: eng-ron features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 495218 num_examples: 5507 - name: validation num_bytes: 893923 num_examples: 9660 download_size: 752574 dataset_size: 1389141 - config_name: eng-run features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 110686 num_examples: 1702 download_size: 47908 dataset_size: 110686 - config_name: eng-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 2180204 num_examples: 19424 - name: validation num_bytes: 54715027 num_examples: 504173 download_size: 19305124 dataset_size: 56895231 - config_name: eng-slv features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 196477 num_examples: 2494 - name: validation num_bytes: 129115 num_examples: 1609 download_size: 187799 dataset_size: 325592 - config_name: eng-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1619357 num_examples: 16582 - 
name: validation num_bytes: 18576364 num_examples: 197298 download_size: 9383030 dataset_size: 20195721 - config_name: eng-sqi features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 98881 num_examples: 1108 download_size: 57489 dataset_size: 98881 - config_name: eng-srp_Cyrl features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 155339 num_examples: 1579 - name: validation num_bytes: 981407 num_examples: 8816 download_size: 399356 dataset_size: 1136746 - config_name: eng-srp_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 576558 num_examples: 6655 - name: validation num_bytes: 362157 num_examples: 4239 download_size: 438125 dataset_size: 938715 - config_name: eng-swa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 27342 num_examples: 386 download_size: 15781 dataset_size: 27342 - config_name: eng-swe features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 821251 num_examples: 10361 - name: validation num_bytes: 1230707 num_examples: 15557 download_size: 994551 dataset_size: 2051958 - config_name: eng-tam features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 45497 num_examples: 310 download_size: 20885 dataset_size: 45497 - config_name: eng-tat features: - name: sourceLang dtype: string - name: targetlang dtype: string - 
name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 161072 num_examples: 1450 download_size: 80076 dataset_size: 161072 - config_name: eng-tel features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 31925 num_examples: 260 download_size: 16355 dataset_size: 31925 - config_name: eng-tgl features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 231824 num_examples: 2499 - name: validation num_bytes: 445138 num_examples: 4795 download_size: 345932 dataset_size: 676962 - config_name: eng-tha features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 134201 num_examples: 1153 download_size: 60955 dataset_size: 134201 - config_name: eng-tlh features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 381052 num_examples: 4999 - name: validation num_bytes: 641737 num_examples: 8419 download_size: 491865 dataset_size: 1022789 - config_name: eng-toki features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 461097 num_examples: 4989 - name: validation num_bytes: 813506 num_examples: 8701 download_size: 529925 dataset_size: 1274603 - config_name: eng-tuk features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 243317 num_examples: 2499 - name: validation num_bytes: 379489 num_examples: 3865 download_size: 307706 
dataset_size: 622806 - config_name: eng-tuk_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 243289 num_examples: 2498 download_size: 123509 dataset_size: 243289 - config_name: eng-tur features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1298131 num_examples: 13741 - name: validation num_bytes: 60806843 num_examples: 658174 download_size: 25564718 dataset_size: 62104974 - config_name: eng-tzl features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 12738 num_examples: 200 - name: validation num_bytes: 1495 num_examples: 19 download_size: 10240 dataset_size: 14233 - config_name: eng-tzl_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 12706 num_examples: 199 download_size: 7751 dataset_size: 12706 - config_name: eng-uig features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 350875 num_examples: 3023 - name: validation num_bytes: 117229 num_examples: 1005 download_size: 207212 dataset_size: 468104 - config_name: eng-uig_Arab features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 350480 num_examples: 3020 - name: validation num_bytes: 117153 num_examples: 1004 download_size: 206993 dataset_size: 467633 - config_name: eng-ukr features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString 
dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1323157 num_examples: 13126 - name: validation num_bytes: 15731337 num_examples: 159486 download_size: 5968292 dataset_size: 17054494 - config_name: eng-urd features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 180640 num_examples: 1662 download_size: 81930 dataset_size: 180640 - config_name: eng-uzb features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 36607 num_examples: 457 download_size: 18767 dataset_size: 36607 - config_name: eng-uzb_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 21475 num_examples: 300 download_size: 12341 dataset_size: 21475 - config_name: eng-vie features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 264072 num_examples: 2499 - name: validation num_bytes: 330959 num_examples: 2742 download_size: 307514 dataset_size: 595031 - config_name: eng-vol features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 118924 num_examples: 1540 - name: validation num_bytes: 95967 num_examples: 1257 download_size: 105856 dataset_size: 214891 - config_name: eng-war features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 135969 num_examples: 1511 download_size: 71531 dataset_size: 135969 - config_name: eng-xal features: - name: sourceLang dtype: 
string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 28327 num_examples: 280 download_size: 17604 dataset_size: 28327 - config_name: eng-yid features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 261728 num_examples: 2482 - name: validation num_bytes: 202614 num_examples: 1891 download_size: 194772 dataset_size: 464342 - config_name: eng-yue_Hans features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 71707 num_examples: 676 - name: validation num_bytes: 291333 num_examples: 2719 download_size: 201895 dataset_size: 363040 - config_name: eng-yue_Hant features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 40508 num_examples: 453 - name: validation num_bytes: 131436 num_examples: 1521 download_size: 93730 dataset_size: 171944 - config_name: eng-zho features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 996210 num_examples: 10389 - name: validation num_bytes: 4138758 num_examples: 43074 download_size: 2665689 dataset_size: 5134968 - config_name: eng-zsm_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 55659 num_examples: 535 - name: validation num_bytes: 91235 num_examples: 844 download_size: 83025 dataset_size: 146894 - config_name: eng-zza features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: 
targetString dtype: string splits: - name: test num_bytes: 35981 num_examples: 528 download_size: 21157 dataset_size: 35981 - config_name: epo-cmn_Hans features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 128088 num_examples: 1403 - name: validation num_bytes: 55397 num_examples: 607 download_size: 103066 dataset_size: 183485 - config_name: epo-cmn_Hant features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 70643 num_examples: 830 - name: validation num_bytes: 34256 num_examples: 402 download_size: 59759 dataset_size: 104899 - config_name: epo-epo features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 917005 num_examples: 9999 - name: validation num_bytes: 896691 num_examples: 9675 download_size: 927651 dataset_size: 1813696 - config_name: epo-fas features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 298965 num_examples: 2513 - name: validation num_bytes: 874743 num_examples: 7429 download_size: 559082 dataset_size: 1173708 - config_name: epo-fin features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 240896 num_examples: 2861 - name: validation num_bytes: 87997 num_examples: 1046 download_size: 166693 dataset_size: 328893 - config_name: epo-fra features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1230182 num_examples: 12167 - name: 
validation num_bytes: 28854303 num_examples: 287076 download_size: 13338711 dataset_size: 30084485 - config_name: epo-glg features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 28349 num_examples: 328 download_size: 19455 dataset_size: 28349 - config_name: epo-hbs features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 288245 num_examples: 2499 - name: validation num_bytes: 321762 num_examples: 2771 download_size: 319944 dataset_size: 610007 - config_name: epo-heb features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1112244 num_examples: 10367 - name: validation num_bytes: 2115699 num_examples: 19606 download_size: 1476715 dataset_size: 3227943 - config_name: epo-hrv features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 38256 num_examples: 422 - name: validation num_bytes: 41597 num_examples: 474 download_size: 53172 dataset_size: 79853 - config_name: epo-hun features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 964243 num_examples: 10093 - name: validation num_bytes: 2750708 num_examples: 28754 download_size: 1977781 dataset_size: 3714951 - config_name: epo-ido features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 101408 num_examples: 1182 - name: validation num_bytes: 31558 num_examples: 416 download_size: 69083 dataset_size: 132966 - 
config_name: epo-ile features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 23197 num_examples: 315 - name: validation num_bytes: 1556 num_examples: 22 download_size: 16768 dataset_size: 24753 - config_name: epo-ile_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 23116 num_examples: 314 download_size: 13913 dataset_size: 23116 - config_name: epo-ina features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 282400 num_examples: 2643 - name: validation num_bytes: 143460 num_examples: 1447 download_size: 222582 dataset_size: 425860 - config_name: epo-isl features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 19877 num_examples: 232 download_size: 12758 dataset_size: 19877 - config_name: epo-ita features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 888636 num_examples: 10302 - name: validation num_bytes: 3453735 num_examples: 40618 download_size: 1983283 dataset_size: 4342371 - config_name: epo-jbo features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 97045 num_examples: 1166 download_size: 48311 dataset_size: 97045 - config_name: epo-jpn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 376590 num_examples: 
3476 - name: validation num_bytes: 690918 num_examples: 6411 download_size: 505968 dataset_size: 1067508 - config_name: epo-jpn_Hani features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 35103 num_examples: 327 - name: validation num_bytes: 65995 num_examples: 657 download_size: 56387 dataset_size: 101098 - config_name: epo-jpn_Hira features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 325335 num_examples: 2985 - name: validation num_bytes: 605714 num_examples: 5574 download_size: 437358 dataset_size: 931049 - config_name: epo-lad features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 29891 num_examples: 365 - name: validation num_bytes: 4076 num_examples: 51 download_size: 17361 dataset_size: 33967 - config_name: epo-lad_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 20217 num_examples: 263 - name: validation num_bytes: 2146 num_examples: 30 download_size: 13699 dataset_size: 22363 - config_name: epo-lat features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 428425 num_examples: 2799 - name: validation num_bytes: 954895 num_examples: 5952 download_size: 797601 dataset_size: 1383320 - config_name: epo-lfn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 85462 num_examples: 996 - name: validation num_bytes: 25319 
num_examples: 262 download_size: 51940 dataset_size: 110781 - config_name: epo-lfn_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 70595 num_examples: 868 - name: validation num_bytes: 17383 num_examples: 197 download_size: 44060 dataset_size: 87978 - config_name: epo-lit features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1017798 num_examples: 11510 - name: validation num_bytes: 897443 num_examples: 10428 download_size: 957411 dataset_size: 1915241 - config_name: epo-nds features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 214103 num_examples: 2535 - name: validation num_bytes: 100624 num_examples: 1180 download_size: 170026 dataset_size: 314727 - config_name: epo-nld features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1178268 num_examples: 12807 - name: validation num_bytes: 7452669 num_examples: 80836 download_size: 4054685 dataset_size: 8630937 - config_name: epo-nob features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 345244 num_examples: 3091 - name: validation num_bytes: 110134 num_examples: 986 download_size: 253811 dataset_size: 455378 - config_name: epo-nor features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 350833 num_examples: 3157 - name: validation num_bytes: 112233 num_examples: 1010 download_size: 
260333 dataset_size: 463066 - config_name: epo-oci features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 61803 num_examples: 587 download_size: 39875 dataset_size: 61803 - config_name: epo-pol features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 220838 num_examples: 2512 - name: validation num_bytes: 508444 num_examples: 5785 download_size: 404321 dataset_size: 729282 - config_name: epo-ron features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 325498 num_examples: 3819 - name: validation num_bytes: 111685 num_examples: 1318 download_size: 227905 dataset_size: 437183 - config_name: epo-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1388231 num_examples: 11673 - name: validation num_bytes: 5467312 num_examples: 45959 download_size: 3085306 dataset_size: 6855543 - config_name: epo-slv features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 26709 num_examples: 301 download_size: 20072 dataset_size: 26709 - config_name: epo-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1022772 num_examples: 10734 - name: validation num_bytes: 6222407 num_examples: 65018 download_size: 3579864 dataset_size: 7245179 - config_name: epo-srp_Cyrl features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: 
sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 224873 num_examples: 1794 - name: validation num_bytes: 253691 num_examples: 1996 download_size: 243276 dataset_size: 478564 - config_name: epo-srp_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 24078 num_examples: 269 - name: validation num_bytes: 24818 num_examples: 277 download_size: 29793 dataset_size: 48896 - config_name: epo-swe features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 140701 num_examples: 1756 - name: validation num_bytes: 6011 num_examples: 67 download_size: 78411 dataset_size: 146712 - config_name: epo-tgl features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 115164 num_examples: 1108 download_size: 64569 dataset_size: 115164 - config_name: epo-tlh features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 150783 num_examples: 1929 - name: validation num_bytes: 11493 num_examples: 163 download_size: 81951 dataset_size: 162276 - config_name: epo-toki features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 267658 num_examples: 2732 - name: validation num_bytes: 126955 num_examples: 1294 download_size: 175555 dataset_size: 394613 - config_name: epo-tur features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 419480 
num_examples: 4999 - name: validation num_bytes: 643726 num_examples: 7649 download_size: 537663 dataset_size: 1063206 - config_name: epo-ukr features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 231908 num_examples: 2499 - name: validation num_bytes: 423910 num_examples: 4558 download_size: 308499 dataset_size: 655818 - config_name: epo-vie features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 194820 num_examples: 1789 download_size: 106504 dataset_size: 194820 - config_name: epo-vol features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 64771 num_examples: 818 - name: validation num_bytes: 2690 num_examples: 36 download_size: 35300 dataset_size: 67461 - config_name: epo-yid features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 106126 num_examples: 992 - name: validation num_bytes: 221020 num_examples: 2023 download_size: 120758 dataset_size: 327146 - config_name: epo-zho features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 199388 num_examples: 2240 - name: validation num_bytes: 90049 num_examples: 1013 download_size: 160108 dataset_size: 289437 - config_name: est-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 60148 num_examples: 612 download_size: 32065 dataset_size: 60148 - config_name: eus-jpn features: - name: 
sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 17497 num_examples: 236 download_size: 10476 dataset_size: 17497 - config_name: eus-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 41158 num_examples: 484 download_size: 23219 dataset_size: 41158 - config_name: eus-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 156461 num_examples: 1849 - name: validation num_bytes: 85575 num_examples: 1002 download_size: 142363 dataset_size: 242036 - config_name: fas-fra features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 49856 num_examples: 375 download_size: 29339 dataset_size: 49856 - config_name: fin-fin features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 78494 num_examples: 999 - name: validation num_bytes: 28304 num_examples: 348 download_size: 58170 dataset_size: 106798 - config_name: fin-fkv features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 36942 num_examples: 388 - name: validation num_bytes: 6025 num_examples: 67 download_size: 26460 dataset_size: 42967 - config_name: fin-fra features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 154714 num_examples: 1919 - name: validation num_bytes: 84473 num_examples: 
1055 download_size: 118989 dataset_size: 239187 - config_name: fin-heb features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 22299 num_examples: 211 download_size: 14503 dataset_size: 22299 - config_name: fin-hun features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 96479 num_examples: 1296 download_size: 50162 dataset_size: 96479 - config_name: fin-ita features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 85183 num_examples: 1038 - name: validation num_bytes: 81712 num_examples: 1002 download_size: 91368 dataset_size: 166895 - config_name: fin-jpn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1057712 num_examples: 9933 - name: validation num_bytes: 1169077 num_examples: 10916 download_size: 981892 dataset_size: 2226789 - config_name: fin-jpn_Hani features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 86086 num_examples: 975 - name: validation num_bytes: 103970 num_examples: 1107 download_size: 94922 dataset_size: 190056 - config_name: fin-jpn_Hira features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 946165 num_examples: 8714 - name: validation num_bytes: 1026133 num_examples: 9437 download_size: 861481 dataset_size: 1972298 - config_name: fin-jpn_Kana features: - name: sourceLang dtype: string - name: targetlang dtype: string 
- name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 23503 num_examples: 225 - name: validation num_bytes: 35948 num_examples: 341 download_size: 33085 dataset_size: 59451 - config_name: fin-kor features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 38867 num_examples: 421 download_size: 19418 dataset_size: 38867 - config_name: fin-kor_Hang features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 37733 num_examples: 410 download_size: 18965 dataset_size: 37733 - config_name: fin-kur features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 23726 num_examples: 258 download_size: 16435 dataset_size: 23726 - config_name: fin-lat features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 22133 num_examples: 293 download_size: 13259 dataset_size: 22133 - config_name: fin-nld features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 25055 num_examples: 316 download_size: 16414 dataset_size: 25055 - config_name: fin-nno features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 60606 num_examples: 815 - name: validation num_bytes: 30940 num_examples: 400 download_size: 41255 dataset_size: 91546 - config_name: fin-nob features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString 
dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 129727 num_examples: 1671 - name: validation num_bytes: 58412 num_examples: 746 download_size: 79422 dataset_size: 188139 - config_name: fin-nor features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 190395 num_examples: 2487 - name: validation num_bytes: 89440 num_examples: 1147 download_size: 118809 dataset_size: 279835 - config_name: fin-pol features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 50430 num_examples: 608 download_size: 30459 dataset_size: 50430 - config_name: fin-por features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 37567 num_examples: 476 download_size: 23384 dataset_size: 37567 - config_name: fin-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 367235 num_examples: 3642 - name: validation num_bytes: 102901 num_examples: 1008 download_size: 226112 dataset_size: 470136 - config_name: fin-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 216404 num_examples: 2512 - name: validation num_bytes: 639283 num_examples: 7398 download_size: 451519 dataset_size: 855687 - config_name: fin-swe features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 242995 num_examples: 2840 - name: validation num_bytes: 610373 num_examples: 7283 
download_size: 434244 dataset_size: 853368 - config_name: fin-tur features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 161803 num_examples: 1747 download_size: 100428 dataset_size: 161803 - config_name: fin-zho features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 29525 num_examples: 381 download_size: 16206 dataset_size: 29525 - config_name: fra-cmn_Hans features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 714693 num_examples: 6824 - name: validation num_bytes: 539737 num_examples: 5146 download_size: 669458 dataset_size: 1254430 - config_name: fra-cmn_Hant features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 200372 num_examples: 2131 - name: validation num_bytes: 159759 num_examples: 1716 download_size: 195175 dataset_size: 360131 - config_name: fra-fra features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 95853 num_examples: 999 - name: validation num_bytes: 200167 num_examples: 2117 download_size: 162861 dataset_size: 296020 - config_name: fra-gcf features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 83594 num_examples: 1163 download_size: 36863 dataset_size: 83594 - config_name: fra-hbs features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: 
string splits: - name: test num_bytes: 43454 num_examples: 473 download_size: 25778 dataset_size: 43454 - config_name: fra-heb features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 351140 num_examples: 3280 - name: validation num_bytes: 112878 num_examples: 1063 download_size: 224259 dataset_size: 464018 - config_name: fra-hrv features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 22475 num_examples: 257 download_size: 16028 dataset_size: 22475 - config_name: fra-hun features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 206436 num_examples: 2493 - name: validation num_bytes: 403973 num_examples: 4847 download_size: 325750 dataset_size: 610409 - config_name: fra-ido features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 24221 num_examples: 295 - name: validation num_bytes: 1966 num_examples: 25 download_size: 18277 dataset_size: 26187 - config_name: fra-ile features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 31510 num_examples: 393 download_size: 18670 dataset_size: 31510 - config_name: fra-ina features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 119331 num_examples: 1176 - name: validation num_bytes: 6770 num_examples: 84 download_size: 70202 dataset_size: 126101 - config_name: fra-ind features: - name: sourceLang dtype: string - 
    name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 85936
    num_examples: 904
  download_size: 42819
  dataset_size: 85936
- config_name: fra-ita
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 839485
    num_examples: 10090
  - name: validation
    num_bytes: 5720450
    num_examples: 68867
  download_size: 2551887
  dataset_size: 6559935
- config_name: fra-jbo
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 107711
    num_examples: 1113
  download_size: 54579
  dataset_size: 107711
- config_name: fra-jpn
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 1224719
    num_examples: 10168
  - name: validation
    num_bytes: 3438636
    num_examples: 28593
  download_size: 2222126
  dataset_size: 4663355
- config_name: fra-jpn_Hani
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 130113
    num_examples: 1178
  - name: validation
    num_bytes: 364029
    num_examples: 3263
  download_size: 258314
  dataset_size: 494142
- config_name: fra-jpn_Hira
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 1070788
    num_examples: 8790
  - name: validation
    num_bytes: 3017108
    num_examples: 24841
  download_size: 1926735
  dataset_size: 4087896
- config_name: fra-kab
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 1199219
    num_examples: 12345
  - name: validation
    num_bytes: 1731934
    num_examples: 18362
  download_size: 1573483
  dataset_size: 2931153
- config_name: fra-kor
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 35629
    num_examples: 320
  download_size: 21764
  dataset_size: 35629
- config_name: fra-kor_Hang
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 35253
    num_examples: 315
  download_size: 21576
  dataset_size: 35253
- config_name: fra-lat
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 398183
    num_examples: 2914
  - name: validation
    num_bytes: 150151
    num_examples: 1132
  download_size: 325398
  dataset_size: 548334
- config_name: fra-lfn
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 61775
    num_examples: 522
  - name: validation
    num_bytes: 17732
    num_examples: 120
  download_size: 42790
  dataset_size: 79507
- config_name: fra-lfn_Latn
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 36842
    num_examples: 360
  - name: validation
    num_bytes: 7190
    num_examples: 59
  download_size: 29938
  dataset_size: 44032
- config_name: fra-msa
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 97273
    num_examples: 1002
  download_size: 49651
  dataset_size: 97273
- config_name: fra-nds
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 69733
    num_examples: 856
  download_size: 37545
  dataset_size: 69733
- config_name: fra-nld
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 1032379
    num_examples: 11547
  - name: validation
    num_bytes: 1434058
    num_examples: 16734
  download_size: 1225080
  dataset_size: 2466437
- config_name: fra-nob
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 28098
    num_examples: 322
  download_size: 18839
  dataset_size: 28098
- config_name: fra-nor
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 39314
    num_examples: 476
  download_size: 23735
  dataset_size: 39314
- config_name: fra-oci
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 77697
    num_examples: 805
  download_size: 47306
  dataset_size: 77697
- config_name: fra-pcd
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 21120
    num_examples: 265
  download_size: 14284
  dataset_size: 21120
- config_name: fra-pol
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 292987
    num_examples: 3086
  - name: validation
    num_bytes: 97700
    num_examples: 1004
  download_size: 227030
  dataset_size: 390687
- config_name: fra-por
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 984313
    num_examples: 10517
  - name: validation
    num_bytes: 1557910
    num_examples: 17061
  download_size: 1297547
  dataset_size: 2542223
- config_name: fra-ron
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 163669
    num_examples: 1924
  download_size: 86874
  dataset_size: 163669
- config_name: fra-run
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 87929
    num_examples: 1273
  download_size: 39989
  dataset_size: 87929
- config_name: fra-rus
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 1211147
    num_examples: 11489
  - name: validation
    num_bytes: 19406105
    num_examples: 183049
  download_size: 7524789
  dataset_size: 20617252
- config_name: fra-slv
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 43487
    num_examples: 447
  download_size: 30108
  dataset_size: 43487
- config_name: fra-spa
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 1006273
    num_examples: 10282
  - name: validation
    num_bytes: 3969459
    num_examples: 40594
  download_size: 2523610
  dataset_size: 4975732
- config_name: fra-swe
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 118006
    num_examples: 1406
  - name: validation
    num_bytes: 93025
    num_examples: 1126
  download_size: 116631
  dataset_size: 211031
- config_name: fra-tat
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 29358
    num_examples: 304
  download_size: 17031
  dataset_size: 29358
- config_name: fra-tgl
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 79571
    num_examples: 835
  download_size: 42990
  dataset_size: 79571
- config_name: fra-tlh
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 48938
    num_examples: 619
  download_size: 26025
  dataset_size: 48938
- config_name: fra-tlh_Latn
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 48870
    num_examples: 618
  download_size: 26013
  dataset_size: 48870
- config_name: fra-toki
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 54839
    num_examples: 561
  - name: validation
    num_bytes: 3517
    num_examples: 25
  download_size: 31300
  dataset_size: 58356
- config_name: fra-toki_Latn
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 54741
    num_examples: 560
  download_size: 26716
  dataset_size: 54741
- config_name: fra-tur
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 239792
    num_examples: 2581
  - name: validation
    num_bytes: 623793
    num_examples: 6717
  download_size: 470964
  dataset_size: 863585
- config_name: fra-uig
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 75067
    num_examples: 693
  download_size: 35562
  dataset_size: 75067
- config_name: fra-uig_Arab
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 74877
    num_examples: 692
  download_size: 35414
  dataset_size: 74877
- config_name: fra-ukr
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 956739
    num_examples: 10034
  - name: validation
    num_bytes: 1740948
    num_examples: 18251
  download_size: 1168887
  dataset_size: 2697687
- config_name: fra-vie
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 148176
    num_examples: 1026
  download_size: 84386
  dataset_size: 148176
- config_name: fra-wuu
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 74665
    num_examples: 744
  - name: validation
    num_bytes: 47382
    num_examples: 493
  download_size: 74331
  dataset_size: 122047
- config_name: fra-yid
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 40020
    num_examples: 383
  - name: validation
    num_bytes: 6722
    num_examples: 60
  download_size: 24894
  dataset_size: 46742
- config_name: fra-zho
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 1016230
    num_examples: 9993
  - name: validation
    num_bytes: 765427
    num_examples: 7557
  download_size: 954496
  dataset_size: 1781657
- config_name: fry-nld
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 21584
    num_examples: 259
  download_size: 15175
  dataset_size: 21584
- config_name: gcf-gcf
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 17072
    num_examples: 232
  download_size: 8382
  dataset_size: 17072
- config_name: gla-spa
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 22545
    num_examples: 288
  download_size: 13378
  dataset_size: 22545
- config_name: glg-por
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 38555
    num_examples: 432
  download_size: 25288
  dataset_size: 38555
- config_name: glg-spa
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 209239
    num_examples: 2120
  - name: validation
    num_bytes: 98399
    num_examples: 1011
  download_size: 188047
  dataset_size: 307638
- config_name: gos-nld
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 122569
    num_examples: 1851
  - name: validation
    num_bytes: 31135
    num_examples: 426
  download_size: 80741
  dataset_size: 153704
- config_name: grn-por
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 20197
    num_examples: 223
  - name: validation
    num_bytes: 6613
    num_examples: 73
  download_size: 20489
  dataset_size: 26810
- config_name: grn-spa
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 53426
    num_examples: 658
  - name: validation
    num_bytes: 12618
    num_examples: 129
  download_size: 41100
  dataset_size: 66044
- config_name: hbs-ita
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 43433
    num_examples: 533
  download_size: 24080
  dataset_size: 43433
- config_name: hbs-jpn
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 40363
    num_examples: 399
  download_size: 23550
  dataset_size: 40363
- config_name: hbs-nor
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 541133
    num_examples: 4999
  - name: validation
    num_bytes: 648554
    num_examples: 6148
  download_size: 656217
  dataset_size: 1189687
- config_name: hbs-pol
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 38040
    num_examples: 416
  download_size: 24618
  dataset_size: 38040
- config_name: hbs-rus
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 266963
    num_examples: 2499
  - name: validation
    num_bytes: 447457
    num_examples: 4176
  download_size: 340887
  dataset_size: 714420
- config_name: hbs-spa
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 50110
    num_examples: 606
  download_size: 27900
  dataset_size: 50110
- config_name: hbs-ukr
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 87516
    num_examples: 941
  download_size: 43358
  dataset_size: 87516
- config_name: hbs-zho
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 20146
    num_examples: 235
  download_size: 11983
  dataset_size: 20146
- config_name: heb-cmn_Hans
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 26063
    num_examples: 284
  download_size: 14508
  dataset_size: 26063
- config_name: heb-cmn_Hant
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 27168
    num_examples: 328
  download_size: 13810
  dataset_size: 27168
- config_name: heb-heb
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 108260
    num_examples: 999
  - name: validation
    num_bytes: 80048
    num_examples: 731
  download_size: 92997
  dataset_size: 188308
- config_name: heb-hun
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 37276
    num_examples: 416
  download_size: 21085
  dataset_size: 37276
- config_name: heb-ina
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 21868
    num_examples: 216
  - name: validation
    num_bytes: 3739
    num_examples: 40
  download_size: 14968
  dataset_size: 25607
- config_name: heb-ita
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 163344
    num_examples: 1705
  - name: validation
    num_bytes: 2500
    num_examples: 29
  download_size: 79221
  dataset_size: 165844
- config_name: heb-jpn
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 30285
    num_examples: 240
  download_size: 17375
  dataset_size: 30285
- config_name: heb-jpn_Hira
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 25879
    num_examples: 200
  download_size: 15647
  dataset_size: 25879
- config_name: heb-lad
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 21415
    num_examples: 217
  - name: validation
    num_bytes: 3554
    num_examples: 40
  download_size: 12598
  dataset_size: 24969
- config_name: heb-lat
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 26822
    num_examples: 270
  - name: validation
    num_bytes: 5540
    num_examples: 57
  download_size: 19894
  dataset_size: 32362
- config_name: heb-lfn
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 27303
    num_examples: 279
  - name: validation
    num_bytes: 7178
    num_examples: 71
  download_size: 18710
  dataset_size: 34481
- config_name: heb-lfn_Latn
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 19775
    num_examples: 211
  - name: validation
    num_bytes: 3816
    num_examples: 40
  download_size: 15274
  dataset_size: 23591
- config_name: heb-nld
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 266639
    num_examples: 2499
  - name: validation
    num_bytes: 468326
    num_examples: 4356
  download_size: 386648
  dataset_size: 734965
- config_name: heb-pol
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 497202
    num_examples: 4999
  - name: validation
    num_bytes: 774983
    num_examples: 7756
  download_size: 643636
  dataset_size: 1272185
- config_name: heb-por
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 72677
    num_examples: 718
  - name: validation
    num_bytes: 1747
    num_examples: 23
  download_size: 37307
  dataset_size: 74424
- config_name: heb-rus
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 302724
    num_examples: 2499
  - name: validation
    num_bytes: 428629
    num_examples: 3567
  download_size: 338004
  dataset_size: 731353
- config_name: heb-spa
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 196688
    num_examples: 1848
  - name: validation
    num_bytes: 111724
    num_examples: 1076
  download_size: 156609
  dataset_size: 308412
- config_name: heb-tur
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 121159
    num_examples: 1376
  - name: validation
    num_bytes: 2254
    num_examples: 29
  download_size: 53530
  dataset_size: 123413
- config_name: heb-ukr
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 94997
    num_examples: 965
  download_size: 42585
  dataset_size: 94997
- config_name: heb-yid
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 109288
    num_examples: 900
  - name: validation
    num_bytes: 23281
    num_examples: 195
  download_size: 55057
  dataset_size: 132569
- config_name: heb-zho
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 60713
    num_examples: 708
  download_size: 28884
  dataset_size: 60713
- config_name: hin-urd
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 36470
    num_examples: 239
  download_size: 18339
  dataset_size: 36470
- config_name: hin-zho
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 48351
    num_examples: 323
  download_size: 23182
  dataset_size: 48351
- config_name: hrv-jpn_Hira
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 28149
    num_examples: 275
  download_size: 17536
  dataset_size: 28149
- config_name: hrv-pol
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 22635
    num_examples: 270
  download_size: 16347
  dataset_size: 22635
- config_name: hrv-spa
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 19896
    num_examples: 253
  download_size: 14172
  dataset_size: 19896
- config_name: hrv-ukr
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 36838
    num_examples: 388
  download_size: 22903
  dataset_size: 36838
- config_name: hsb-slv
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 44372
    num_examples: 742
  download_size: 20741
  dataset_size: 44372
- config_name: hun-cmn_Hans
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 22749
    num_examples: 246
  download_size: 15718
  dataset_size: 22749
- config_name: hun-hun
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 97219
    num_examples: 1063
  download_size: 60723
  dataset_size: 97219
- config_name: hun-ita
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 389121
    num_examples: 4999
  - name: validation
    num_bytes: 554593
    num_examples: 7158
  download_size: 455662
  dataset_size: 943714
- config_name: hun-jpn
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 311959
    num_examples: 2498
  - name: validation
    num_bytes: 453496
    num_examples: 3659
  download_size: 429779
  dataset_size: 765455
- config_name: hun-jpn_Hani
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 41402
    num_examples: 350
  - name: validation
    num_bytes: 68853
    num_examples: 578
  download_size: 69906
  dataset_size: 110255
- config_name: hun-jpn_Hira
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 267931
    num_examples: 2126
  - name: validation
    num_bytes: 379193
    num_examples: 3035
  download_size: 361985
  dataset_size: 647124
- config_name: hun-kor
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 25888
    num_examples: 270
  download_size: 15849
  dataset_size: 25888
- config_name: hun-kor_Hang
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 25608
    num_examples: 267
  download_size: 15747
  dataset_size: 25608
- config_name: hun-lat
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 50223
    num_examples: 606
  download_size: 28512
  dataset_size: 50223
- config_name: hun-nld
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 133308
    num_examples: 1628
  - name: validation
    num_bytes: 2404
    num_examples: 34
  download_size: 76656
  dataset_size: 135712
- config_name: hun-pol
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 163206
    num_examples: 1933
  download_size: 93818
  dataset_size: 163206
- config_name: hun-por
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 205901
    num_examples: 2499
  - name: validation
    num_bytes: 290673
    num_examples: 3516
  download_size: 280795
  dataset_size: 496574
- config_name: hun-rus
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 284982
    num_examples: 2686
  - name: validation
    num_bytes: 638282
    num_examples: 6097
  download_size: 459837
  dataset_size: 923264
- config_name: hun-spa
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 209578
    num_examples: 2499
  - name: validation
    num_bytes: 353678
    num_examples: 4193
  download_size: 318984
  dataset_size: 563256
- config_name: hun-swe
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 137226
    num_examples: 1613
  - name: validation
    num_bytes: 42072
    num_examples: 524
  download_size: 100142
  dataset_size: 179298
- config_name: hun-tur
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 75382
    num_examples: 1001
  download_size: 43401
  dataset_size: 75382
- config_name: hun-ukr
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 43349
    num_examples: 472
  download_size: 24776
  dataset_size: 43349
- config_name: hun-zho
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 38349
    num_examples: 433
  download_size: 24587
  dataset_size: 38349
- config_name: hye-rus
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 32430
    num_examples: 226
  download_size: 19500
  dataset_size: 32430
- config_name: ido-ina
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 24488
    num_examples: 315
  - name: validation
    num_bytes: 4132
    num_examples: 51
  download_size: 15761
  dataset_size: 28620
- config_name: ido-ita
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 131629
    num_examples: 1459
  - name: validation
    num_bytes: 2824
    num_examples: 38
  download_size: 75968
  dataset_size: 134453
- config_name: ido-lfn
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 27869
    num_examples: 347
  - name: validation
    num_bytes: 4414
    num_examples: 54
  download_size: 17583
  dataset_size: 32283
- config_name: ido-spa
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 42932
    num_examples: 498
  download_size: 24884
  dataset_size: 42932
- config_name: ido-yid
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 42711
    num_examples: 425
  - name: validation
    num_bytes: 6060
    num_examples: 57
  download_size: 22532
  dataset_size: 48771
- config_name: ido_Latn-lfn_Latn
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 23808
    num_examples: 308
  - name: validation
    num_bytes: 2844
    num_examples: 37
  download_size: 15374
  dataset_size: 26652
- config_name: ina-ita
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 90877
    num_examples: 911
  download_size: 47258
  dataset_size: 90877
- config_name: ina-lad
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 27197
    num_examples: 319
  - name: validation
    num_bytes: 3814
    num_examples: 45
  download_size: 15003
  dataset_size: 31011
- config_name: ina-lat
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 83249
    num_examples: 911
  - name: validation
    num_bytes: 5396
    num_examples: 69
  download_size: 46162
  dataset_size: 88645
- config_name: ina-lfn
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 65050
    num_examples: 768
  - name: validation
    num_bytes: 10419
    num_examples: 121
  download_size: 32655
  dataset_size: 75469
- config_name: ina-nld
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 103264
    num_examples: 1032
  download_size: 57615
  dataset_size: 103264
- config_name: ina-por
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 305959
    num_examples: 2536
  - name: validation
    num_bytes: 868638
    num_examples: 7079
  download_size: 630752
  dataset_size: 1174597
- config_name: ina-rus
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 193271
    num_examples: 1452
  - name: validation
    num_bytes: 3159
    num_examples: 38
  download_size: 104351
  dataset_size: 196430
- config_name: ina-spa
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 184789
    num_examples: 1612
  - name: validation
    num_bytes: 9064
    num_examples: 108
  download_size: 112003
  dataset_size: 193853
- config_name: ina-tlh
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 14888
    num_examples: 215
  - name: validation
    num_bytes: 2395
    num_examples: 33
  download_size: 10654
  dataset_size: 17283
- config_name: ina-tur
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 24080
    num_examples: 316
  download_size: 12943
  dataset_size: 24080
- config_name: ina-yid
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 82845
    num_examples: 787
  - name: validation
    num_bytes: 13703
    num_examples: 119
  download_size: 37542
  dataset_size: 96548
- config_name: ina_Latn-lad_Latn
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 18914
    num_examples: 231
  - name: validation
    num_bytes: 2130
    num_examples: 28
  download_size: 12137
  dataset_size: 21044
- config_name: ina_Latn-lfn_Latn
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 56715
    num_examples: 691
  - name: validation
    num_bytes: 7857
    num_examples: 95
  download_size: 29436
  dataset_size: 64572
- config_name: ina_Latn-tlh_Latn
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 14835
    num_examples: 214
  download_size: 7397
  dataset_size: 14835
- config_name: ind-zsm_Latn
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 25864
    num_examples: 224
  download_size: 17176
  dataset_size: 25864
- config_name: isl-ita
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 18908
    num_examples: 235
  download_size: 13096
  dataset_size: 18908
- config_name: isl-jpn
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 24848
    num_examples: 250
  download_size: 14856
  dataset_size: 24848
- config_name: isl-jpn_Hira
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 20683
    num_examples: 210
  download_size: 12763
  dataset_size: 20683
- config_name: isl-spa
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 16580
    num_examples: 237
  download_size: 11275
  dataset_size: 16580
- config_name: ita-cmn_Hans
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 160755
    num_examples: 1826
  - name: validation
    num_bytes: 51932
    num_examples: 580
  download_size: 109218
  dataset_size: 212687
- config_name: ita-cmn_Hant
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 84647
    num_examples: 1036
  - name: validation
    num_bytes: 33399
    num_examples: 397
  download_size: 62080
  dataset_size: 118046
- config_name: ita-ind
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 31398
    num_examples: 367
  download_size: 17766
  dataset_size: 31398
- config_name: ita-ita
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 91878
    num_examples: 999
  - name: validation
    num_bytes: 31945
    num_examples: 349
  download_size: 68215
  dataset_size: 123823
- config_name: ita-jpn
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 305457
    num_examples: 2659
  - name: validation
    num_bytes: 118091
    num_examples: 1004
  download_size: 225897
  dataset_size: 423548
- config_name: ita-jpn_Hani
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 30518
    num_examples: 287
  - name: validation
    num_bytes: 11399
    num_examples: 104
  download_size: 29023
  dataset_size: 41917
- config_name: ita-jpn_Hira
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 269258
    num_examples: 2325
  - name: validation
    num_bytes: 103941
    num_examples: 879
  download_size: 197961
  dataset_size: 373199
- config_name: ita-lat
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 153385
    num_examples: 1715
  - name: validation
    num_bytes: 1501
    num_examples: 20
  download_size: 85739
  dataset_size: 154886
- config_name: ita-lit
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 19131
    num_examples: 223
  download_size: 14660
  dataset_size: 19131
- config_name: ita-msa
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 37328
    num_examples: 425
  download_size: 21126
  dataset_size: 37328
- config_name: ita-nds
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 25021
    num_examples: 312
  download_size: 16148
  dataset_size: 25021
- config_name: ita-nld
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
    num_bytes: 203972
    num_examples: 2577
  - name: validation
    num_bytes: 552453
    num_examples: 6939
  download_size: 380577
  dataset_size: 756425
- config_name: ita-nor
  features:
  - name: sourceLang
    dtype: string
  - name: targetlang
    dtype: string
  - name: sourceString
    dtype: string
  - name: targetString
    dtype: string
  splits:
  - name: test
num_bytes: 113673 num_examples: 938 download_size: 64121 dataset_size: 113673 - config_name: ita-pms features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 21573 num_examples: 231 download_size: 15301 dataset_size: 21573 - config_name: ita-pol features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 109444 num_examples: 1293 - name: validation num_bytes: 85500 num_examples: 1001 download_size: 118611 dataset_size: 194944 - config_name: ita-por features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 310707 num_examples: 3065 - name: validation num_bytes: 590134 num_examples: 6370 download_size: 485946 dataset_size: 900841 - config_name: ita-ron features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 80875 num_examples: 1004 download_size: 43545 dataset_size: 80875 - config_name: ita-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1114704 num_examples: 10044 - name: validation num_bytes: 7351576 num_examples: 66132 download_size: 3655171 dataset_size: 8466280 - config_name: ita-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 441378 num_examples: 4980 - name: validation num_bytes: 816444 num_examples: 9246 download_size: 682089 dataset_size: 1257822 - config_name: ita-swe features: - name: sourceLang dtype: string - name: 
targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 59844 num_examples: 714 download_size: 34751 dataset_size: 59844 - config_name: ita-toki features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 19371 num_examples: 199 download_size: 11797 dataset_size: 19371 - config_name: ita-tur features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 998370 num_examples: 9999 - name: validation num_bytes: 564058 num_examples: 5702 download_size: 466198 dataset_size: 1562428 - config_name: ita-ukr features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 433759 num_examples: 4999 - name: validation num_bytes: 760632 num_examples: 8774 download_size: 509763 dataset_size: 1194391 - config_name: ita-vie features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 24470 num_examples: 249 download_size: 16510 dataset_size: 24470 - config_name: ita-yid features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 19240 num_examples: 205 - name: validation num_bytes: 3948 num_examples: 36 download_size: 14049 dataset_size: 23188 - config_name: ita-zho features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 252327 num_examples: 2942 - name: validation num_bytes: 87841 num_examples: 1003 download_size: 
171973 dataset_size: 340168 - config_name: jbo-jpn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 92236 num_examples: 920 download_size: 42342 dataset_size: 92236 - config_name: jbo-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 123960 num_examples: 1198 download_size: 60175 dataset_size: 123960 - config_name: jbo-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 125512 num_examples: 1506 download_size: 61472 dataset_size: 125512 - config_name: jbo-swe features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 17865 num_examples: 241 download_size: 11688 dataset_size: 17865 - config_name: jbo-zho features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 42406 num_examples: 517 - name: validation num_bytes: 1870 num_examples: 22 download_size: 23837 dataset_size: 44276 - config_name: jbo_Latn-cmn_Hans features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 23379 num_examples: 280 download_size: 13137 dataset_size: 23379 - config_name: jbo_Latn-cmn_Hant features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 18379 num_examples: 230 - name: validation num_bytes: 1702 num_examples: 20 download_size: 13358 
dataset_size: 20081 - config_name: jbo_Latn-jpn_Hira features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 82695 num_examples: 825 download_size: 38109 dataset_size: 82695 - config_name: jpn-jpn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 81644 num_examples: 583 download_size: 42091 dataset_size: 81644 - config_name: jpn-kor features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 71655 num_examples: 621 download_size: 37810 dataset_size: 71655 - config_name: jpn-lit features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 24465 num_examples: 245 download_size: 15257 dataset_size: 24465 - config_name: jpn-mar features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 47030 num_examples: 339 download_size: 19843 dataset_size: 47030 - config_name: jpn-msa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 274231 num_examples: 2615 - name: validation num_bytes: 110792 num_examples: 1059 download_size: 192103 dataset_size: 385023 - config_name: jpn-nds features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 39337 num_examples: 405 - name: validation num_bytes: 5535 num_examples: 52 download_size: 25417 dataset_size: 
44872 - config_name: jpn-nld features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 352873 num_examples: 3429 - name: validation num_bytes: 108309 num_examples: 1056 download_size: 229719 dataset_size: 461182 - config_name: jpn-nor features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 117048 num_examples: 1046 download_size: 63350 dataset_size: 117048 - config_name: jpn-pol features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1164653 num_examples: 9998 - name: validation num_bytes: 1730980 num_examples: 14808 download_size: 1512836 dataset_size: 2895633 - config_name: jpn-por features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 216426 num_examples: 1939 - name: validation num_bytes: 127225 num_examples: 1132 download_size: 185258 dataset_size: 343651 - config_name: jpn-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1380936 num_examples: 10141 - name: validation num_bytes: 2134502 num_examples: 15674 download_size: 1640265 dataset_size: 3515438 - config_name: jpn-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1194028 num_examples: 10741 - name: validation num_bytes: 2614778 num_examples: 23419 download_size: 1846070 dataset_size: 3808806 - config_name: jpn-swe features: - name: sourceLang dtype: string - name: 
targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 93655 num_examples: 895 download_size: 47852 dataset_size: 93655 - config_name: jpn-tlh features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 73820 num_examples: 673 download_size: 36035 dataset_size: 73820 - config_name: jpn-toki features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 23708 num_examples: 241 download_size: 11441 dataset_size: 23708 - config_name: jpn-tur features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 88452 num_examples: 810 download_size: 48739 dataset_size: 88452 - config_name: jpn-ukr features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 54028 num_examples: 456 - name: validation num_bytes: 2151 num_examples: 20 download_size: 31854 dataset_size: 56179 - config_name: jpn-vie features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 135321 num_examples: 1054 - name: validation num_bytes: 76557 num_examples: 584 download_size: 103531 dataset_size: 211878 - config_name: jpn-zho features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 274897 num_examples: 2496 - name: validation num_bytes: 318480 num_examples: 2893 download_size: 302048 dataset_size: 593377 - config_name: jpn_Hani-cmn_Hans 
features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 21359 num_examples: 215 - name: validation num_bytes: 22317 num_examples: 228 download_size: 27964 dataset_size: 43676 - config_name: jpn_Hani-nld features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 40240 num_examples: 425 - name: validation num_bytes: 10865 num_examples: 118 download_size: 31212 dataset_size: 51105 - config_name: jpn_Hani-pol features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 117055 num_examples: 1080 - name: validation num_bytes: 169999 num_examples: 1570 download_size: 162734 dataset_size: 287054 - config_name: jpn_Hani-por features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 34559 num_examples: 335 - name: validation num_bytes: 16346 num_examples: 159 download_size: 34122 dataset_size: 50905 - config_name: jpn_Hani-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 110549 num_examples: 823 - name: validation num_bytes: 178500 num_examples: 1325 download_size: 147019 dataset_size: 289049 - config_name: jpn_Hani-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 115105 num_examples: 1098 - name: validation num_bytes: 255898 num_examples: 2464 download_size: 192880 dataset_size: 371003 - config_name: jpn_Hira-cmn_Hans features: - name: 
sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 139892 num_examples: 1192 - name: validation num_bytes: 165983 num_examples: 1439 download_size: 159085 dataset_size: 305875 - config_name: jpn_Hira-cmn_Hant features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 73955 num_examples: 681 - name: validation num_bytes: 84628 num_examples: 778 download_size: 81587 dataset_size: 158583 - config_name: jpn_Hira-ind features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 236214 num_examples: 2255 - name: validation num_bytes: 91241 num_examples: 873 download_size: 161763 dataset_size: 327455 - config_name: jpn_Hira-jpn_Hira features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 65397 num_examples: 440 download_size: 34178 dataset_size: 65397 - config_name: jpn_Hira-kor_Hang features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 59682 num_examples: 511 download_size: 32220 dataset_size: 59682 - config_name: jpn_Hira-lit features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 20604 num_examples: 204 download_size: 13134 dataset_size: 20604 - config_name: jpn_Hira-mar features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 34541 num_examples: 253 
download_size: 14813 dataset_size: 34541 - config_name: jpn_Hira-nds features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 33481 num_examples: 346 - name: validation num_bytes: 5212 num_examples: 49 download_size: 22969 dataset_size: 38693 - config_name: jpn_Hira-nld features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 304533 num_examples: 2924 - name: validation num_bytes: 93200 num_examples: 894 download_size: 195182 dataset_size: 397733 - config_name: jpn_Hira-nob features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 93077 num_examples: 804 download_size: 49978 dataset_size: 93077 - config_name: jpn_Hira-pol features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1029980 num_examples: 8764 - name: validation num_bytes: 1534624 num_examples: 13009 download_size: 1330532 dataset_size: 2564604 - config_name: jpn_Hira-por features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 175561 num_examples: 1539 - name: validation num_bytes: 103935 num_examples: 907 download_size: 149727 dataset_size: 279496 - config_name: jpn_Hira-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1226928 num_examples: 9007 - name: validation num_bytes: 1881431 num_examples: 13786 download_size: 1439873 dataset_size: 3108359 - config_name: jpn_Hira-spa 
features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1050339 num_examples: 9378 - name: validation num_bytes: 2307088 num_examples: 20504 download_size: 1615553 dataset_size: 3357427 - config_name: jpn_Hira-swe features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 79203 num_examples: 728 download_size: 40909 dataset_size: 79203 - config_name: jpn_Hira-tlh_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 70052 num_examples: 637 download_size: 34176 dataset_size: 70052 - config_name: jpn_Hira-tur features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 80294 num_examples: 731 download_size: 44455 dataset_size: 80294 - config_name: jpn_Hira-ukr features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 46097 num_examples: 382 download_size: 24917 dataset_size: 46097 - config_name: jpn_Hira-vie features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 117705 num_examples: 885 - name: validation num_bytes: 65151 num_examples: 502 download_size: 88516 dataset_size: 182856 - config_name: jpn_Kana-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 41427 num_examples: 299 - name: validation num_bytes: 71221 
num_examples: 536 download_size: 60105 dataset_size: 112648 - config_name: jpn_Kana-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 27485 num_examples: 253 - name: validation num_bytes: 50396 num_examples: 438 download_size: 45648 dataset_size: 77881 - config_name: kab-kab features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 104613 num_examples: 992 - name: validation num_bytes: 28067 num_examples: 308 download_size: 55056 dataset_size: 132680 - config_name: kab-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 44470 num_examples: 422 download_size: 21560 dataset_size: 44470 - config_name: kab-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 81054 num_examples: 882 download_size: 45672 dataset_size: 81054 - config_name: kat-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 99104 num_examples: 641 download_size: 40773 dataset_size: 99104 - config_name: kaz-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 354063 num_examples: 1399 - name: validation num_bytes: 253814 num_examples: 1015 download_size: 307583 dataset_size: 607877 - config_name: kaz_Cyrl-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: 
string splits: - name: test num_bytes: 353875 num_examples: 1397 download_size: 178055 dataset_size: 353875 - config_name: khm-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 164312 num_examples: 1447 download_size: 67906 dataset_size: 164312 - config_name: kor-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 25610 num_examples: 220 download_size: 15434 dataset_size: 25610 - config_name: kor-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 94260 num_examples: 939 download_size: 51021 dataset_size: 94260 - config_name: kor-zho features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 41990 num_examples: 410 download_size: 24333 dataset_size: 41990 - config_name: kor_Hang-cmn_Hans features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 21351 num_examples: 206 download_size: 14006 dataset_size: 21351 - config_name: kor_Hang-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 25515 num_examples: 219 download_size: 15362 dataset_size: 25515 - config_name: kor_Hang-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 92835 num_examples: 920 download_size: 50406 dataset_size: 92835 - 
config_name: kzj-msa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 50098 num_examples: 369 download_size: 28983 dataset_size: 50098 - config_name: kzj_Latn-zsm_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 47082 num_examples: 346 download_size: 27670 dataset_size: 47082 - config_name: lad-lat features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 17154 num_examples: 213 - name: validation num_bytes: 2587 num_examples: 32 download_size: 11583 dataset_size: 19741 - config_name: lad-lfn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 27893 num_examples: 334 - name: validation num_bytes: 5940 num_examples: 67 download_size: 15923 dataset_size: 33833 - config_name: lad-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 20488 num_examples: 275 - name: validation num_bytes: 2172 num_examples: 24 download_size: 14475 dataset_size: 22660 - config_name: lad-yid features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 62268 num_examples: 603 - name: validation num_bytes: 9505 num_examples: 83 download_size: 25397 dataset_size: 71773 - config_name: lad_Latn-lfn_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: 
- name: test num_bytes: 15873 num_examples: 210 - name: validation num_bytes: 2184 num_examples: 28 download_size: 11464 dataset_size: 18057 - config_name: lad_Latn-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 17010 num_examples: 238 download_size: 9753 dataset_size: 17010 - config_name: lad_Latn-yid features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 43136 num_examples: 437 - name: validation num_bytes: 4887 num_examples: 48 download_size: 20509 dataset_size: 48023 - config_name: lat-lat features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 22899 num_examples: 234 download_size: 14686 dataset_size: 22899 - config_name: lat-lfn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 26928 num_examples: 347 - name: validation num_bytes: 5876 num_examples: 70 download_size: 17350 dataset_size: 32804 - config_name: lat-nld features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 29211 num_examples: 365 download_size: 17700 dataset_size: 29211 - config_name: lat-nor features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 27542 num_examples: 332 download_size: 17575 dataset_size: 27542 - config_name: lat-pol features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: 
targetString dtype: string splits: - name: test num_bytes: 71935 num_examples: 914 download_size: 37216 dataset_size: 71935 - config_name: lat-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 107946 num_examples: 1040 - name: validation num_bytes: 108118 num_examples: 1034 download_size: 109064 dataset_size: 216064 - config_name: lat-tlh features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 16836 num_examples: 232 - name: validation num_bytes: 2051 num_examples: 29 download_size: 12516 dataset_size: 18887 - config_name: lat-ukr features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 34475 num_examples: 381 download_size: 19303 dataset_size: 34475 - config_name: lat-yid features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 46691 num_examples: 457 - name: validation num_bytes: 7502 num_examples: 77 download_size: 23684 dataset_size: 54193 - config_name: lat_Latn-lfn_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 22616 num_examples: 298 - name: validation num_bytes: 3792 num_examples: 49 download_size: 15381 dataset_size: 26408 - config_name: lav-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 25437 num_examples: 273 download_size: 16042 dataset_size: 25437 - config_name: lfn-rus features: - name: sourceLang 
dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 28109 num_examples: 224 - name: validation num_bytes: 3674 num_examples: 25 download_size: 18848 dataset_size: 31783 - config_name: lfn-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 24218 num_examples: 263 - name: validation num_bytes: 5599 num_examples: 45 download_size: 19620 dataset_size: 29817 - config_name: lfn-yid features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 105328 num_examples: 992 - name: validation num_bytes: 27764 num_examples: 243 download_size: 51210 dataset_size: 133092 - config_name: lfn_Cyrl-por features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 30664 num_examples: 219 - name: validation num_bytes: 71458 num_examples: 445 download_size: 54913 dataset_size: 102122 - config_name: lfn_Latn-yid features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 92455 num_examples: 897 - name: validation num_bytes: 20263 num_examples: 189 download_size: 45418 dataset_size: 112718 - config_name: lit-pol features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 147577 num_examples: 1786 download_size: 83686 dataset_size: 147577 - config_name: lit-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string 
splits: - name: test num_bytes: 371707 num_examples: 3597 - name: validation num_bytes: 544976 num_examples: 5229 download_size: 455649 dataset_size: 916683 - config_name: lit-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 34497 num_examples: 453 download_size: 21210 dataset_size: 34497 - config_name: lit-tur features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 119992 num_examples: 1471 download_size: 62438 dataset_size: 119992 - config_name: ltz-nld features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 20518 num_examples: 291 download_size: 12009 dataset_size: 20518 - config_name: mkd-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 17481 num_examples: 216 download_size: 11101 dataset_size: 17481 - config_name: msa-msa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 92783 num_examples: 957 download_size: 47311 dataset_size: 92783 - config_name: msa-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 19231 num_examples: 231 download_size: 13270 dataset_size: 19231 - config_name: msa-zho features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 38156 num_examples: 371 download_size: 
23906 dataset_size: 38156 - config_name: nds-nld features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 137259 num_examples: 1656 - name: validation num_bytes: 82442 num_examples: 1012 download_size: 123250 dataset_size: 219701 - config_name: nds-por features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 15384 num_examples: 206 download_size: 10929 dataset_size: 15384 - config_name: nds-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 89629 num_examples: 924 - name: validation num_bytes: 3700 num_examples: 35 download_size: 45741 dataset_size: 93329 - config_name: nds-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 70857 num_examples: 922 download_size: 38462 dataset_size: 70857 - config_name: nld-cmn_Hant features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 148501 num_examples: 1512 download_size: 84767 dataset_size: 148501 - config_name: nld-nld features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 86698 num_examples: 999 - name: validation num_bytes: 86091 num_examples: 1030 download_size: 99089 dataset_size: 172789 - config_name: nld-nor features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 
16856 num_examples: 202 download_size: 12796 dataset_size: 16856 - config_name: nld-pol features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 101951 num_examples: 1192 - name: validation num_bytes: 85008 num_examples: 1005 download_size: 112425 dataset_size: 186959 - config_name: nld-por features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 215122 num_examples: 2499 - name: validation num_bytes: 423649 num_examples: 4881 download_size: 344564 dataset_size: 638771 - config_name: nld-ron features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 198420 num_examples: 2268 - name: validation num_bytes: 89982 num_examples: 1047 download_size: 164417 dataset_size: 288402 - config_name: nld-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 272894 num_examples: 2549 - name: validation num_bytes: 691407 num_examples: 6525 download_size: 471874 dataset_size: 964301 - config_name: nld-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 936602 num_examples: 10112 - name: validation num_bytes: 1664140 num_examples: 17830 download_size: 1349764 dataset_size: 2600742 - config_name: nld-toki features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 51558 num_examples: 667 download_size: 19156 dataset_size: 51558 - config_name: nld-tur features: - 
name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 214187 num_examples: 2499 - name: validation num_bytes: 333002 num_examples: 3879 download_size: 298029 dataset_size: 547189 - config_name: nld-ukr features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 895653 num_examples: 9999 - name: validation num_bytes: 453307 num_examples: 5064 download_size: 587337 dataset_size: 1348960 - config_name: nld-zho features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 160442 num_examples: 1653 - name: validation num_bytes: 4384 num_examples: 55 download_size: 96258 dataset_size: 164826 - config_name: nno-nob features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 37401 num_examples: 466 download_size: 23237 dataset_size: 37401 - config_name: nob-nno features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 37335 num_examples: 465 download_size: 23226 dataset_size: 37335 - config_name: nob-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 186077 num_examples: 1276 download_size: 95790 dataset_size: 186077 - config_name: nob-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 84792 num_examples: 884 download_size: 47864 dataset_size: 
84792 - config_name: nob-swe features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 45252 num_examples: 562 download_size: 26905 dataset_size: 45252 - config_name: nor-nor features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 78814 num_examples: 981 - name: validation num_bytes: 24311 num_examples: 271 download_size: 54419 dataset_size: 103125 - config_name: nor-pol features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 25261 num_examples: 280 download_size: 17457 dataset_size: 25261 - config_name: nor-por features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 49225 num_examples: 480 download_size: 30952 dataset_size: 49225 - config_name: nor-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 186318 num_examples: 1278 download_size: 95761 dataset_size: 186318 - config_name: nor-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 90519 num_examples: 959 download_size: 50523 dataset_size: 90519 - config_name: nor-swe features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 45468 num_examples: 565 download_size: 27042 dataset_size: 45468 - config_name: nor-ukr features: - name: sourceLang dtype: string - name: 
targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 67193 num_examples: 669 download_size: 35949 dataset_size: 67193 - config_name: nor-zho features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 18545 num_examples: 200 download_size: 13226 dataset_size: 18545 - config_name: orv-ukr features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 107455 num_examples: 972 download_size: 46404 dataset_size: 107455 - config_name: ota-tur features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 32973 num_examples: 306 - name: validation num_bytes: 2466 num_examples: 23 download_size: 18923 dataset_size: 35439 - config_name: pol-cmn_Hans features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 34399 num_examples: 367 download_size: 22730 dataset_size: 34399 - config_name: pol-cmn_Hant features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 57432 num_examples: 595 download_size: 35868 dataset_size: 57432 - config_name: pol-por features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 62768 num_examples: 704 download_size: 39871 dataset_size: 62768 - config_name: pol-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: 
targetString dtype: string splits: - name: test num_bytes: 375905 num_examples: 3542 - name: validation num_bytes: 108015 num_examples: 1014 download_size: 256024 dataset_size: 483920 - config_name: pol-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 225734 num_examples: 2543 - name: validation num_bytes: 453838 num_examples: 4997 download_size: 387050 dataset_size: 679572 - config_name: pol-swe features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 116608 num_examples: 1391 download_size: 69004 dataset_size: 116608 - config_name: pol-tur features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 76786 num_examples: 891 download_size: 42992 dataset_size: 76786 - config_name: pol-ukr features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 229734 num_examples: 2518 - name: validation num_bytes: 632674 num_examples: 6897 download_size: 413476 dataset_size: 862408 - config_name: pol-zho features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 94909 num_examples: 1003 download_size: 57640 dataset_size: 94909 - config_name: por-cmn_Hans features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 56469 num_examples: 637 download_size: 31818 dataset_size: 56469 - config_name: por-cmn_Hant features: - name: sourceLang dtype: string - name: targetlang 
dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 30628 num_examples: 385 download_size: 18114 dataset_size: 30628 - config_name: por-por features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 234434 num_examples: 2499 - name: validation num_bytes: 432790 num_examples: 4667 download_size: 322867 dataset_size: 667224 - config_name: por-ron features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 57248 num_examples: 680 - name: validation num_bytes: 2252 num_examples: 30 download_size: 37814 dataset_size: 59500 - config_name: por-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1131882 num_examples: 9999 - name: validation num_bytes: 1169089 num_examples: 10049 download_size: 1139810 dataset_size: 2300971 - config_name: por-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1075488 num_examples: 10946 - name: validation num_bytes: 5453497 num_examples: 56715 download_size: 3351027 dataset_size: 6528985 - config_name: por-swe features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 25296 num_examples: 319 download_size: 16774 dataset_size: 25296 - config_name: por-tgl features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 173148 num_examples: 1776 download_size: 
91178 dataset_size: 173148 - config_name: por-toki features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 179820 num_examples: 1718 - name: validation num_bytes: 64182 num_examples: 484 download_size: 113023 dataset_size: 244002 - config_name: por-tur features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 146633 num_examples: 1793 download_size: 82852 dataset_size: 146633 - config_name: por-ukr features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 325345 num_examples: 3371 - name: validation num_bytes: 100029 num_examples: 1024 download_size: 211428 dataset_size: 425374 - config_name: por-zho features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 87856 num_examples: 1032 download_size: 48124 dataset_size: 87856 - config_name: ron-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 75763 num_examples: 781 download_size: 40782 dataset_size: 75763 - config_name: ron-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 157598 num_examples: 1710 download_size: 85888 dataset_size: 157598 - config_name: ron-tur features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 233141 num_examples: 2459 - name: validation 
num_bytes: 94986 num_examples: 1009 download_size: 186075 dataset_size: 328127 - config_name: run-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 101614 num_examples: 1250 download_size: 41298 dataset_size: 101614 - config_name: run-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 63752 num_examples: 962 download_size: 30106 dataset_size: 63752 - config_name: rus-cmn_Hans features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 129158 num_examples: 1085 - name: validation num_bytes: 265973 num_examples: 2220 download_size: 202720 dataset_size: 395131 - config_name: rus-cmn_Hant features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 84281 num_examples: 798 - name: validation num_bytes: 161779 num_examples: 1524 download_size: 125874 dataset_size: 246060 - config_name: rus-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 342636 num_examples: 2499 - name: validation num_bytes: 205795 num_examples: 1405 download_size: 261052 dataset_size: 548431 - config_name: rus-sah features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 129791 num_examples: 993 download_size: 61526 dataset_size: 129791 - config_name: rus-slv features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: 
string - name: targetString dtype: string splits: - name: test num_bytes: 64708 num_examples: 656 - name: validation num_bytes: 4506 num_examples: 44 download_size: 40906 dataset_size: 69214 - config_name: rus-spa features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1176173 num_examples: 10505 - name: validation num_bytes: 9791707 num_examples: 86868 download_size: 4824662 dataset_size: 10967880 - config_name: rus-swe features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 131736 num_examples: 1281 - name: validation num_bytes: 5146 num_examples: 51 download_size: 70042 dataset_size: 136882 - config_name: rus-tat features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 474098 num_examples: 2137 - name: validation num_bytes: 215338 num_examples: 1004 download_size: 369077 dataset_size: 689436 - config_name: rus-tlh features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 22094 num_examples: 253 download_size: 12421 dataset_size: 22094 - config_name: rus-toki features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 97330 num_examples: 899 - name: validation num_bytes: 2837 num_examples: 19 download_size: 47548 dataset_size: 100167 - config_name: rus-toki_Latn features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 97279 num_examples: 898 
download_size: 43424 dataset_size: 97279 - config_name: rus-tur features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 576835 num_examples: 4983 - name: validation num_bytes: 732490 num_examples: 6215 download_size: 645009 dataset_size: 1309325 - config_name: rus-uig features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 72472 num_examples: 534 download_size: 34288 dataset_size: 72472 - config_name: rus-uig_Arab features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 71656 num_examples: 528 download_size: 33915 dataset_size: 71656 - config_name: rus-ukr features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 1226931 num_examples: 9999 - name: validation num_bytes: 920404 num_examples: 7426 download_size: 984731 dataset_size: 2147335 - config_name: rus-vie features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 38079 num_examples: 312 download_size: 22660 dataset_size: 38079 - config_name: rus-xal features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 25775 num_examples: 208 download_size: 15791 dataset_size: 25775 - config_name: rus-yue_Hans features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 26120 num_examples: 223 - 
name: validation num_bytes: 35524 num_examples: 318 download_size: 35649 dataset_size: 61644 - config_name: rus-zho features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 290001 num_examples: 2499 - name: validation num_bytes: 562693 num_examples: 4843 download_size: 429211 dataset_size: 852694 - config_name: slv-cmn_Hans features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 62770 num_examples: 717 download_size: 37459 dataset_size: 62770 - config_name: slv-ukr features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 70850 num_examples: 914 - name: validation num_bytes: 4354 num_examples: 48 download_size: 39942 dataset_size: 75204 - config_name: slv-zho features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 70456 num_examples: 824 download_size: 41402 dataset_size: 70456 - config_name: spa-cmn_Hans features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 97463 num_examples: 1141 - name: validation num_bytes: 247029 num_examples: 2870 download_size: 178075 dataset_size: 344492 - config_name: spa-cmn_Hant features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 123140 num_examples: 1470 - name: validation num_bytes: 323955 num_examples: 3792 download_size: 232279 dataset_size: 447095 - config_name: spa-spa features: - name: sourceLang dtype: 
string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 260912 num_examples: 2499 - name: validation num_bytes: 270460 num_examples: 2565 download_size: 305202 dataset_size: 531372 - config_name: spa-swe features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 106015 num_examples: 1350 - name: validation num_bytes: 16628 num_examples: 203 download_size: 71178 dataset_size: 122643 - config_name: spa-tat features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 36208 num_examples: 440 download_size: 18955 dataset_size: 36208 - config_name: spa-tgl features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 59245 num_examples: 630 download_size: 34203 dataset_size: 59245 - config_name: spa-tlh features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 24131 num_examples: 324 download_size: 14722 dataset_size: 24131 - config_name: spa-toki features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 53576 num_examples: 639 - name: validation num_bytes: 4230 num_examples: 30 download_size: 31345 dataset_size: 57806 - config_name: spa-tur features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 870663 num_examples: 10614 - name: validation num_bytes: 1481421 num_examples: 18212 
download_size: 1143903 dataset_size: 2352084 - config_name: spa-ukr features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 938715 num_examples: 10114 - name: validation num_bytes: 1207319 num_examples: 12968 download_size: 964837 dataset_size: 2146034 - config_name: spa-vie features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 53544 num_examples: 593 download_size: 30871 dataset_size: 53544 - config_name: spa-yid features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 42306 num_examples: 406 - name: validation num_bytes: 5071 num_examples: 42 download_size: 25398 dataset_size: 47377 - config_name: spa-zho features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 228258 num_examples: 2704 - name: validation num_bytes: 592566 num_examples: 6921 download_size: 423632 dataset_size: 820824 - config_name: srp_Cyrl-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 109062 num_examples: 880 - name: validation num_bytes: 183387 num_examples: 1409 download_size: 141468 dataset_size: 292449 - config_name: srp_Cyrl-ukr features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 21290 num_examples: 204 download_size: 12024 dataset_size: 21290 - config_name: srp_Latn-ita features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: 
sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 16809 num_examples: 211 download_size: 11288 dataset_size: 16809 - config_name: srp_Latn-nob features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 517529 num_examples: 4812 - name: validation num_bytes: 621219 num_examples: 5941 download_size: 628321 dataset_size: 1138748 - config_name: srp_Latn-rus features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 144734 num_examples: 1482 - name: validation num_bytes: 239333 num_examples: 2501 download_size: 183176 dataset_size: 384067 - config_name: srp_Latn-ukr features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 29098 num_examples: 347 download_size: 16225 dataset_size: 29098 - config_name: swe-cmn_Hans features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 38998 num_examples: 477 download_size: 20845 dataset_size: 38998 - config_name: swe-cmn_Hant features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 41792 num_examples: 566 download_size: 20980 dataset_size: 41792 - config_name: swe-swe features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 84497 num_examples: 1021 download_size: 44545 dataset_size: 84497 - config_name: swe-tur features: - name: sourceLang dtype: string - name: targetlang dtype: 
string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 16475 num_examples: 202 download_size: 12269 dataset_size: 16475 - config_name: swe-zho features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 81227 num_examples: 1048 download_size: 39832 dataset_size: 81227 - config_name: tat-tur features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 53031 num_examples: 520 download_size: 29374 dataset_size: 53031 - config_name: tat-vie features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 33750 num_examples: 283 download_size: 19328 dataset_size: 33750 - config_name: tlh-yid features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 26514 num_examples: 288 - name: validation num_bytes: 4547 num_examples: 48 download_size: 15612 dataset_size: 31061 - config_name: tlh-zho features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 36459 num_examples: 447 download_size: 20649 dataset_size: 36459 - config_name: tlh_Latn-cmn_Hans features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 20461 num_examples: 245 download_size: 13014 dataset_size: 20461 - config_name: tlh_Latn-cmn_Hant features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString 
dtype: string splits: - name: test num_bytes: 15887 num_examples: 201 download_size: 10318 dataset_size: 15887 - config_name: tlh_Latn-yid features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 26452 num_examples: 287 download_size: 11465 dataset_size: 26452 - config_name: tur-cmn_Hans features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 47816 num_examples: 584 download_size: 26071 dataset_size: 47816 - config_name: tur-cmn_Hant features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 30886 num_examples: 395 download_size: 17961 dataset_size: 30886 - config_name: tur-tur features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 211897 num_examples: 2499 - name: validation num_bytes: 101055 num_examples: 1186 download_size: 163239 dataset_size: 312952 - config_name: tur-uig features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 166790 num_examples: 1399 - name: validation num_bytes: 121613 num_examples: 1010 download_size: 145651 dataset_size: 288403 - config_name: tur-ukr features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 222524 num_examples: 2519 - name: validation num_bytes: 591068 num_examples: 6721 download_size: 372843 dataset_size: 813592 - config_name: tur-uzb features: - name: sourceLang dtype: string - name: targetlang dtype: 
string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 16788 num_examples: 207 download_size: 11413 dataset_size: 16788 - config_name: tur-zho features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 79096 num_examples: 985 download_size: 40582 dataset_size: 79096 - config_name: uig-zho features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 229949 num_examples: 1928 download_size: 103743 dataset_size: 229949 - config_name: uig_Arab-cmn_Hans features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 167895 num_examples: 1382 download_size: 76890 dataset_size: 167895 - config_name: uig_Arab-cmn_Hant features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 61655 num_examples: 541 download_size: 29982 dataset_size: 61655 - config_name: ukr-cmn_Hans features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 89627 num_examples: 852 download_size: 45641 dataset_size: 89627 - config_name: ukr-cmn_Hant features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 48358 num_examples: 529 download_size: 24950 dataset_size: 48358 - config_name: ukr-ukr features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test 
num_bytes: 82966 num_examples: 823 download_size: 38791 dataset_size: 82966 - config_name: ukr-zho features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 157543 num_examples: 1574 download_size: 78523 dataset_size: 157543 - config_name: vie-cmn_Hans features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 42746 num_examples: 344 download_size: 25397 dataset_size: 42746 - config_name: vie-vie features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 75096 num_examples: 541 download_size: 37416 dataset_size: 75096 - config_name: vie-zho features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 50886 num_examples: 439 download_size: 29158 dataset_size: 50886 - config_name: wuu-cmn_Hans features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 87844 num_examples: 872 - name: validation num_bytes: 179924 num_examples: 1775 download_size: 159352 dataset_size: 267768 - config_name: yid-yid features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 39189 num_examples: 291 download_size: 17085 dataset_size: 39189 - config_name: zho-zho features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 301592 num_examples: 3113 - name: validation num_bytes: 
572569 num_examples: 5809 download_size: 508839 dataset_size: 874161 - config_name: zsm_Latn-ind features: - name: sourceLang dtype: string - name: targetlang dtype: string - name: sourceString dtype: string - name: targetString dtype: string splits: - name: test num_bytes: 26133 num_examples: 224 download_size: 17764 dataset_size: 26133 configs: - config_name: afr-deu data_files: - split: test path: afr-deu/test-* - config_name: afr-eng data_files: - split: test path: afr-eng/test-* - split: validation path: afr-eng/validation-* - config_name: afr-epo data_files: - split: test path: afr-epo/test-* - config_name: afr-nld data_files: - split: test path: afr-nld/test-* - config_name: afr-rus data_files: - split: test path: afr-rus/test-* - config_name: afr-spa data_files: - split: test path: afr-spa/test-* - config_name: ain-fin data_files: - split: test path: ain-fin/test-* - config_name: ara-ber data_files: - split: test path: ara-ber/test-* - split: validation path: ara-ber/validation-* - config_name: ara-ber_Latn data_files: - split: test path: ara-ber_Latn/test-* - split: validation path: ara-ber_Latn/validation-* - config_name: ara-deu data_files: - split: test path: ara-deu/test-* - split: validation path: ara-deu/validation-* - config_name: ara-ell data_files: - split: test path: ara-ell/test-* - config_name: ara-eng data_files: - split: test path: ara-eng/test-* - split: validation path: ara-eng/validation-* - config_name: ara-epo data_files: - split: test path: ara-epo/test-* - split: validation path: ara-epo/validation-* - config_name: ara-fra data_files: - split: test path: ara-fra/test-* - split: validation path: ara-fra/validation-* - config_name: ara-heb data_files: - split: test path: ara-heb/test-* - split: validation path: ara-heb/validation-* - config_name: ara-ita data_files: - split: test path: ara-ita/test-* - config_name: ara-jpn data_files: - split: test path: ara-jpn/test-* - config_name: ara-jpn_Hira data_files: - split: test path: 
ara-jpn_Hira/test-* - config_name: ara-pol data_files: - split: test path: ara-pol/test-* - config_name: ara-rus data_files: - split: test path: ara-rus/test-* - split: validation path: ara-rus/validation-* - config_name: ara-spa data_files: - split: test path: ara-spa/test-* - split: validation path: ara-spa/validation-* - config_name: ara-tur data_files: - split: test path: ara-tur/test-* - split: validation path: ara-tur/validation-* - config_name: arq-eng data_files: - split: test path: arq-eng/test-* - split: validation path: arq-eng/validation-* - config_name: avk-fra data_files: - split: test path: avk-fra/test-* - split: validation path: avk-fra/validation-* - config_name: avk-spa data_files: - split: test path: avk-spa/test-* - config_name: awa-eng data_files: - split: test path: awa-eng/test-* - config_name: aze-eng data_files: - split: test path: aze-eng/test-* - split: validation path: aze-eng/validation-* - config_name: aze-spa data_files: - split: test path: aze-spa/test-* - config_name: aze-tur data_files: - split: test path: aze-tur/test-* - config_name: aze_Latn-tur data_files: - split: test path: aze_Latn-tur/test-* - config_name: bel-deu data_files: - split: test path: bel-deu/test-* - config_name: bel-eng data_files: - split: test path: bel-eng/test-* - split: validation path: bel-eng/validation-* - config_name: bel-epo data_files: - split: test path: bel-epo/test-* - config_name: bel-fra data_files: - split: test path: bel-fra/test-* - config_name: bel-ita data_files: - split: test path: bel-ita/test-* - config_name: bel-lat data_files: - split: test path: bel-lat/test-* - config_name: bel-nld data_files: - split: test path: bel-nld/test-* - config_name: bel-pol data_files: - split: test path: bel-pol/test-* - config_name: bel-rus data_files: - split: test path: bel-rus/test-* - split: validation path: bel-rus/validation-* - config_name: bel-spa data_files: - split: test path: bel-spa/test-* - config_name: bel-ukr data_files: - split: test 
path: bel-ukr/test-* - split: validation path: bel-ukr/validation-* - config_name: bel-zho data_files: - split: test path: bel-zho/test-* - config_name: ben-eng data_files: - split: test path: ben-eng/test-* - split: validation path: ben-eng/validation-* - config_name: ber-deu data_files: - split: test path: ber-deu/test-* - config_name: ber-eng data_files: - split: test path: ber-eng/test-* - split: validation path: ber-eng/validation-* - config_name: ber-epo data_files: - split: test path: ber-epo/test-* - config_name: ber-fra data_files: - split: test path: ber-fra/test-* - split: validation path: ber-fra/validation-* - config_name: ber-spa data_files: - split: test path: ber-spa/test-* - split: validation path: ber-spa/validation-* - config_name: ber_Latn-deu data_files: - split: test path: ber_Latn-deu/test-* - config_name: ber_Latn-eng data_files: - split: test path: ber_Latn-eng/test-* - split: validation path: ber_Latn-eng/validation-* - config_name: ber_Latn-epo data_files: - split: test path: ber_Latn-epo/test-* - config_name: ber_Latn-fra data_files: - split: test path: ber_Latn-fra/test-* - split: validation path: ber_Latn-fra/validation-* - config_name: bre-eng data_files: - split: test path: bre-eng/test-* - config_name: bre-fra data_files: - split: test path: bre-fra/test-* - split: validation path: bre-fra/validation-* - config_name: bua-rus data_files: - split: test path: bua-rus/test-* - config_name: bua_Cyrl-rus data_files: - split: test path: bua_Cyrl-rus/test-* - config_name: bul-bul data_files: - split: test path: bul-bul/test-* - config_name: bul-cmn_Hans data_files: - split: test path: bul-cmn_Hans/test-* - config_name: bul-deu data_files: - split: test path: bul-deu/test-* - config_name: bul-eng data_files: - split: test path: bul-eng/test-* - split: validation path: bul-eng/validation-* - config_name: bul-epo data_files: - split: test path: bul-epo/test-* - config_name: bul-fra data_files: - split: test path: bul-fra/test-* - config_name: 
bul-ita data_files: - split: test path: bul-ita/test-* - split: validation path: bul-ita/validation-* - config_name: bul-jpn data_files: - split: test path: bul-jpn/test-* - config_name: bul-jpn_Hira data_files: - split: test path: bul-jpn_Hira/test-* - config_name: bul-rus data_files: - split: test path: bul-rus/test-* - split: validation path: bul-rus/validation-* - config_name: bul-spa data_files: - split: test path: bul-spa/test-* - config_name: bul-tur data_files: - split: test path: bul-tur/test-* - config_name: bul-ukr data_files: - split: test path: bul-ukr/test-* - config_name: bul-zho data_files: - split: test path: bul-zho/test-* - config_name: cat-deu data_files: - split: test path: cat-deu/test-* - config_name: cat-eng data_files: - split: test path: cat-eng/test-* - split: validation path: cat-eng/validation-* - config_name: cat-epo data_files: - split: test path: cat-epo/test-* - config_name: cat-fra data_files: - split: test path: cat-fra/test-* - config_name: cat-ita data_files: - split: test path: cat-ita/test-* - config_name: cat-nld data_files: - split: test path: cat-nld/test-* - config_name: cat-por data_files: - split: test path: cat-por/test-* - config_name: cat-spa data_files: - split: test path: cat-spa/test-* - split: validation path: cat-spa/validation-* - config_name: cat-ukr data_files: - split: test path: cat-ukr/test-* - config_name: cbk-eng data_files: - split: test path: cbk-eng/test-* - split: validation path: cbk-eng/validation-* - config_name: ceb-deu data_files: - split: test path: ceb-deu/test-* - config_name: ceb-eng data_files: - split: test path: ceb-eng/test-* - config_name: ces-deu data_files: - split: test path: ces-deu/test-* - split: validation path: ces-deu/validation-* - config_name: ces-eng data_files: - split: test path: ces-eng/test-* - split: validation path: ces-eng/validation-* - config_name: ces-epo data_files: - split: test path: ces-epo/test-* - split: validation path: ces-epo/validation-* - config_name: 
ces-fra data_files: - split: test path: ces-fra/test-* - config_name: ces-hun data_files: - split: test path: ces-hun/test-* - split: validation path: ces-hun/validation-* - config_name: ces-ita data_files: - split: test path: ces-ita/test-* - config_name: ces-lat data_files: - split: test path: ces-lat/test-* - config_name: ces-pol data_files: - split: test path: ces-pol/test-* - config_name: ces-rus data_files: - split: test path: ces-rus/test-* - split: validation path: ces-rus/validation-* - config_name: ces-slv data_files: - split: test path: ces-slv/test-* - config_name: ces-ukr data_files: - split: test path: ces-ukr/test-* - split: validation path: ces-ukr/validation-* - config_name: cha-eng data_files: - split: test path: cha-eng/test-* - config_name: chm-rus data_files: - split: test path: chm-rus/test-* - split: validation path: chm-rus/validation-* - config_name: chv-eng data_files: - split: test path: chv-eng/test-* - config_name: chv-rus data_files: - split: test path: chv-rus/test-* - config_name: chv-tur data_files: - split: test path: chv-tur/test-* - config_name: cmn_Hans-wuu data_files: - split: test path: cmn_Hans-wuu/test-* - split: validation path: cmn_Hans-wuu/validation-* - config_name: cor-deu data_files: - split: test path: cor-deu/test-* - config_name: cor-eng data_files: - split: test path: cor-eng/test-* - split: validation path: cor-eng/validation-* - config_name: cor-epo data_files: - split: test path: cor-epo/test-* - config_name: cor-fra data_files: - split: test path: cor-fra/test-* - config_name: cor-ita data_files: - split: test path: cor-ita/test-* - config_name: cor-rus data_files: - split: test path: cor-rus/test-* - config_name: cor-spa data_files: - split: test path: cor-spa/test-* - config_name: crh-tur data_files: - split: test path: crh-tur/test-* - config_name: cym-eng data_files: - split: test path: cym-eng/test-* - config_name: dan-dan data_files: - split: test path: dan-dan/test-* - config_name: dan-deu data_files: - 
split: test path: dan-deu/test-* - split: validation path: dan-deu/validation-* - config_name: dan-eng data_files: - split: test path: dan-eng/test-* - split: validation path: dan-eng/validation-* - config_name: dan-fin data_files: - split: test path: dan-fin/test-* - split: validation path: dan-fin/validation-* - config_name: dan-fra data_files: - split: test path: dan-fra/test-* - split: validation path: dan-fra/validation-* - config_name: dan-ita data_files: - split: test path: dan-ita/test-* - config_name: dan-jpn data_files: - split: test path: dan-jpn/test-* - config_name: dan-jpn_Hira data_files: - split: test path: dan-jpn_Hira/test-* - config_name: dan-nld data_files: - split: test path: dan-nld/test-* - split: validation path: dan-nld/validation-* - config_name: dan-nob data_files: - split: test path: dan-nob/test-* - split: validation path: dan-nob/validation-* - config_name: dan-nor data_files: - split: test path: dan-nor/test-* - split: validation path: dan-nor/validation-* - config_name: dan-por data_files: - split: test path: dan-por/test-* - config_name: dan-rus data_files: - split: test path: dan-rus/test-* - split: validation path: dan-rus/validation-* - config_name: dan-spa data_files: - split: test path: dan-spa/test-* - split: validation path: dan-spa/validation-* - config_name: dan-swe data_files: - split: test path: dan-swe/test-* - split: validation path: dan-swe/validation-* - config_name: dan-tur data_files: - split: test path: dan-tur/test-* - config_name: deu-cmn_Hans data_files: - split: test path: deu-cmn_Hans/test-* - split: validation path: deu-cmn_Hans/validation-* - config_name: deu-cmn_Hant data_files: - split: test path: deu-cmn_Hant/test-* - split: validation path: deu-cmn_Hant/validation-* - config_name: deu-deu data_files: - split: test path: deu-deu/test-* - split: validation path: deu-deu/validation-* - config_name: deu-dsb data_files: - split: test path: deu-dsb/test-* - config_name: ces-spa data_files: - split: test path: 
ces-spa/test-* - split: validation path: ces-spa/validation-* - config_name: dan-epo data_files: - split: test path: dan-epo/test-* - split: validation path: dan-epo/validation-* - config_name: deu-ell data_files: - split: test path: deu-ell/test-* - split: validation path: deu-ell/validation-* - config_name: deu-eng data_files: - split: test path: deu-eng/test-* - split: validation path: deu-eng/validation-* - config_name: deu-est data_files: - split: test path: deu-est/test-* - config_name: deu-eus data_files: - split: test path: deu-eus/test-* - config_name: deu-fas data_files: - split: test path: deu-fas/test-* - split: validation path: deu-fas/validation-* - config_name: deu-fin data_files: - split: test path: deu-fin/test-* - split: validation path: deu-fin/validation-* - config_name: deu-fra data_files: - split: test path: deu-fra/test-* - split: validation path: deu-fra/validation-* - config_name: deu-frr data_files: - split: test path: deu-frr/test-* - config_name: deu-gos data_files: - split: test path: deu-gos/test-* - config_name: deu-hbs data_files: - split: test path: deu-hbs/test-* - config_name: deu-heb data_files: - split: test path: deu-heb/test-* - split: validation path: deu-heb/validation-* - config_name: deu-hrv data_files: - split: test path: deu-hrv/test-* - config_name: deu-hrx data_files: - split: test path: deu-hrx/test-* - config_name: deu-hsb data_files: - split: test path: deu-hsb/test-* - config_name: deu-hun data_files: - split: test path: deu-hun/test-* - split: validation path: deu-hun/validation-* - config_name: deu-ido data_files: - split: test path: deu-ido/test-* - split: validation path: deu-ido/validation-* - config_name: deu-ile data_files: - split: test path: deu-ile/test-* - split: validation path: deu-ile/validation-* - config_name: deu-ina data_files: - split: test path: deu-ina/test-* - split: validation path: deu-ina/validation-* - config_name: deu-ind data_files: - split: test path: deu-ind/test-* - config_name: 
deu-isl data_files: - split: test path: deu-isl/test-* - config_name: deu-ita data_files: - split: test path: deu-ita/test-* - split: validation path: deu-ita/validation-* - config_name: deu-jbo data_files: - split: test path: deu-jbo/test-* - config_name: deu-jpn data_files: - split: test path: deu-jpn/test-* - split: validation path: deu-jpn/validation-* - config_name: deu-jpn_Hani data_files: - split: test path: deu-jpn_Hani/test-* - split: validation path: deu-jpn_Hani/validation-* - config_name: deu-jpn_Hira data_files: - split: test path: deu-jpn_Hira/test-* - split: validation path: deu-jpn_Hira/validation-* - config_name: deu-jpn_Kana data_files: - split: test path: deu-jpn_Kana/test-* - split: validation path: deu-jpn_Kana/validation-* - config_name: deu-kab data_files: - split: test path: deu-kab/test-* - config_name: deu-kor data_files: - split: test path: deu-kor/test-* - config_name: deu-kor_Hang data_files: - split: test path: deu-kor_Hang/test-* - config_name: deu-kur data_files: - split: test path: deu-kur/test-* - config_name: deu-kur_Latn data_files: - split: test path: deu-kur_Latn/test-* - config_name: deu-lad data_files: - split: test path: deu-lad/test-* - split: validation path: deu-lad/validation-* - config_name: deu-lat data_files: - split: test path: deu-lat/test-* - split: validation path: deu-lat/validation-* - config_name: deu-lfn data_files: - split: test path: deu-lfn/test-* - split: validation path: deu-lfn/validation-* - config_name: deu-lfn_Latn data_files: - split: test path: deu-lfn_Latn/test-* - split: validation path: deu-lfn_Latn/validation-* - config_name: deu-lit data_files: - split: test path: deu-lit/test-* - split: validation path: deu-lit/validation-* - config_name: deu-ltz data_files: - split: test path: deu-ltz/test-* - config_name: deu-msa data_files: - split: test path: deu-msa/test-* - config_name: deu-nds data_files: - split: test path: deu-nds/test-* - split: validation path: deu-nds/validation-* - config_name: 
deu-nld data_files: - split: test path: deu-nld/test-* - split: validation path: deu-nld/validation-* - config_name: deu-nob data_files: - split: test path: deu-nob/test-* - split: validation path: deu-nob/validation-* - config_name: deu-nor data_files: - split: test path: deu-nor/test-* - split: validation path: deu-nor/validation-* - config_name: deu-pol data_files: - split: test path: deu-pol/test-* - split: validation path: deu-pol/validation-* - config_name: deu-por data_files: - split: test path: deu-por/test-* - split: validation path: deu-por/validation-* - config_name: deu-ron data_files: - split: test path: deu-ron/test-* - split: validation path: deu-ron/validation-* - config_name: deu-run data_files: - split: test path: deu-run/test-* - split: validation path: deu-run/validation-* - config_name: deu-rus data_files: - split: test path: deu-rus/test-* - split: validation path: deu-rus/validation-* - config_name: deu-slv data_files: - split: test path: deu-slv/test-* - config_name: deu-spa data_files: - split: test path: deu-spa/test-* - split: validation path: deu-spa/validation-* - config_name: deu-srp_Latn data_files: - split: test path: deu-srp_Latn/test-* - config_name: deu-swe data_files: - split: test path: deu-swe/test-* - split: validation path: deu-swe/validation-* - config_name: deu-swg data_files: - split: test path: deu-swg/test-* - split: validation path: deu-swg/validation-* - config_name: deu-tat data_files: - split: test path: deu-tat/test-* - split: validation path: deu-tat/validation-* - config_name: deu-tgl data_files: - split: test path: deu-tgl/test-* - config_name: deu-tlh data_files: - split: test path: deu-tlh/test-* - split: validation path: deu-tlh/validation-* - config_name: deu-toki data_files: - split: test path: deu-toki/test-* - split: validation path: deu-toki/validation-* - config_name: deu-tur data_files: - split: test path: deu-tur/test-* - split: validation path: deu-tur/validation-* - config_name: deu-ukr data_files: - 
split: test path: deu-ukr/test-* - split: validation path: deu-ukr/validation-* - config_name: deu-vie data_files: - split: test path: deu-vie/test-* - config_name: deu-vol data_files: - split: test path: deu-vol/test-* - config_name: deu-yid data_files: - split: test path: deu-yid/test-* - split: validation path: deu-yid/validation-* - config_name: deu-zho data_files: - split: test path: deu-zho/test-* - split: validation path: deu-zho/validation-* - config_name: dsb-hsb data_files: - split: test path: dsb-hsb/test-* - config_name: dsb-slv data_files: - split: test path: dsb-slv/test-* - config_name: dtp-eng data_files: - split: test path: dtp-eng/test-* - split: validation path: dtp-eng/validation-* - config_name: dtp-jpn data_files: - split: test path: dtp-jpn/test-* - config_name: dtp-jpn_Hira data_files: - split: test path: dtp-jpn_Hira/test-* - config_name: dtp-msa data_files: - split: test path: dtp-msa/test-* - config_name: dtp-zsm_Latn data_files: - split: test path: dtp-zsm_Latn/test-* - config_name: egl-ita data_files: - split: test path: egl-ita/test-* - config_name: ell-ell data_files: - split: test path: ell-ell/test-* - config_name: ell-eng data_files: - split: test path: ell-eng/test-* - split: validation path: ell-eng/validation-* - config_name: ell-epo data_files: - split: test path: ell-epo/test-* - config_name: ell-fra data_files: - split: test path: ell-fra/test-* - config_name: ell-ita data_files: - split: test path: ell-ita/test-* - config_name: ell-nld data_files: - split: test path: ell-nld/test-* - split: validation path: ell-nld/validation-* - config_name: ell-por data_files: - split: test path: ell-por/test-* - split: validation path: ell-por/validation-* - config_name: ell-rus data_files: - split: test path: ell-rus/test-* - split: validation path: ell-rus/validation-* - config_name: ell-spa data_files: - split: test path: ell-spa/test-* - split: validation path: ell-spa/validation-* - config_name: ell-swe data_files: - split: test 
path: ell-swe/test-* - config_name: ell-tur data_files: - split: test path: ell-tur/test-* - split: validation path: ell-tur/validation-* - config_name: eng-bos_Latn data_files: - split: test path: eng-bos_Latn/test-* - split: validation path: eng-bos_Latn/validation-* - config_name: eng-cmn_Hans data_files: - split: test path: eng-cmn_Hans/test-* - split: validation path: eng-cmn_Hans/validation-* - config_name: eng-cmn_Hant data_files: - split: test path: eng-cmn_Hant/test-* - split: validation path: eng-cmn_Hant/validation-* - config_name: eng-eng data_files: - split: test path: eng-eng/test-* - split: validation path: eng-eng/validation-* - config_name: eng-est data_files: - split: test path: eng-est/test-* - split: validation path: eng-est/validation-* - config_name: eng-eus data_files: - split: test path: eng-eus/test-* - split: validation path: eng-eus/validation-* - config_name: eng-fao data_files: - split: test path: eng-fao/test-* - config_name: eng-fas data_files: - split: test path: eng-fas/test-* - split: validation path: eng-fas/validation-* - config_name: eng-fin data_files: - split: test path: eng-fin/test-* - split: validation path: eng-fin/validation-* - config_name: eng-fra data_files: - split: test path: eng-fra/test-* - split: validation path: eng-fra/validation-* - config_name: eng-fry data_files: - split: test path: eng-fry/test-* - config_name: eng-gla data_files: - split: test path: eng-gla/test-* - config_name: eng-gle data_files: - split: test path: eng-gle/test-* - split: validation path: eng-gle/validation-* - config_name: eng-glg data_files: - split: test path: eng-glg/test-* - config_name: eng-gos data_files: - split: test path: eng-gos/test-* - split: validation path: eng-gos/validation-* - config_name: eng-got data_files: - split: test path: eng-got/test-* - config_name: eng-grc data_files: - split: test path: eng-grc/test-* - config_name: eng-gsw data_files: - split: test path: eng-gsw/test-* - config_name: eng-hbs data_files: - 
split: test path: eng-hbs/test-* - split: validation path: eng-hbs/validation-* - config_name: eng-heb data_files: - split: test path: eng-heb/test-* - split: validation path: eng-heb/validation-* - config_name: eng-hin data_files: - split: test path: eng-hin/test-* - split: validation path: eng-hin/validation-* - config_name: eng-hoc data_files: - split: test path: eng-hoc/test-* - config_name: eng-hoc_Latn data_files: - split: test path: eng-hoc_Latn/test-* - config_name: eng-hrv data_files: - split: test path: eng-hrv/test-* - split: validation path: eng-hrv/validation-* - config_name: eng-hrx data_files: - split: test path: eng-hrx/test-* - config_name: eng-hun data_files: - split: test path: eng-hun/test-* - split: validation path: eng-hun/validation-* - config_name: eng-hye data_files: - split: test path: eng-hye/test-* - split: validation path: eng-hye/validation-* - config_name: eng-ido data_files: - split: test path: eng-ido/test-* - split: validation path: eng-ido/validation-* - config_name: eng-ido_Latn data_files: - split: test path: eng-ido_Latn/test-* - config_name: eng-ile data_files: - split: test path: eng-ile/test-* - split: validation path: eng-ile/validation-* - config_name: eng-ilo data_files: - split: test path: eng-ilo/test-* - split: validation path: eng-ilo/validation-* - config_name: eng-ina data_files: - split: test path: eng-ina/test-* - split: validation path: eng-ina/validation-* - config_name: eng-ind data_files: - split: test path: eng-ind/test-* - split: validation path: eng-ind/validation-* - config_name: eng-isl data_files: - split: test path: eng-isl/test-* - split: validation path: eng-isl/validation-* - config_name: eng-ita data_files: - split: test path: eng-ita/test-* - split: validation path: eng-ita/validation-* - config_name: eng-jav data_files: - split: test path: eng-jav/test-* - config_name: eng-jbo data_files: - split: test path: eng-jbo/test-* - split: validation path: eng-jbo/validation-* - config_name: eng-jbo_Latn 
data_files: - split: test path: eng-jbo_Latn/test-* - split: validation path: eng-jbo_Latn/validation-* - config_name: eng-jpn data_files: - split: test path: eng-jpn/test-* - split: validation path: eng-jpn/validation-* - config_name: eng-jpn_Hani data_files: - split: test path: eng-jpn_Hani/test-* - split: validation path: eng-jpn_Hani/validation-* - config_name: eng-jpn_Hira data_files: - split: test path: eng-jpn_Hira/test-* - split: validation path: eng-jpn_Hira/validation-* - config_name: eng-jpn_Kana data_files: - split: test path: eng-jpn_Kana/test-* - split: validation path: eng-jpn_Kana/validation-* - config_name: eng-kab data_files: - split: test path: eng-kab/test-* - split: validation path: eng-kab/validation-* - config_name: eng-kat data_files: - split: test path: eng-kat/test-* - config_name: eng-kaz data_files: - split: test path: eng-kaz/test-* - config_name: eng-kaz_Cyrl data_files: - split: test path: eng-kaz_Cyrl/test-* - config_name: eng-kha data_files: - split: test path: eng-kha/test-* - split: validation path: eng-kha/validation-* - config_name: eng-khm data_files: - split: test path: eng-khm/test-* - config_name: eng-kor data_files: - split: test path: eng-kor/test-* - split: validation path: eng-kor/validation-* - config_name: eng-kor_Hang data_files: - split: test path: eng-kor_Hang/test-* - split: validation path: eng-kor_Hang/validation-* - config_name: eng-kur data_files: - split: test path: eng-kur/test-* - split: validation path: eng-kur/validation-* - config_name: eng-kur_Latn data_files: - split: test path: eng-kur_Latn/test-* - config_name: eng-kzj data_files: - split: test path: eng-kzj/test-* - config_name: eng-lad data_files: - split: test path: eng-lad/test-* - split: validation path: eng-lad/validation-* - config_name: eng-lad_Latn data_files: - split: test path: eng-lad_Latn/test-* - split: validation path: eng-lad_Latn/validation-* - config_name: eng-lat data_files: - split: test path: eng-lat/test-* - split: validation 
path: eng-lat/validation-* - config_name: eng-lav data_files: - split: test path: eng-lav/test-* - config_name: eng-lfn data_files: - split: test path: eng-lfn/test-* - split: validation path: eng-lfn/validation-* - config_name: eng-lfn_Cyrl data_files: - split: test path: eng-lfn_Cyrl/test-* - split: validation path: eng-lfn_Cyrl/validation-* - config_name: eng-lfn_Latn data_files: - split: test path: eng-lfn_Latn/test-* - split: validation path: eng-lfn_Latn/validation-* - config_name: eng-lit data_files: - split: test path: eng-lit/test-* - split: validation path: eng-lit/validation-* - config_name: eng-ltz data_files: - split: test path: eng-ltz/test-* - config_name: eng-mal data_files: - split: test path: eng-mal/test-* - config_name: eng-mar data_files: - split: test path: eng-mar/test-* - split: validation path: eng-mar/validation-* - config_name: eng-mkd data_files: - split: test path: eng-mkd/test-* - split: validation path: eng-mkd/validation-* - config_name: eng-mlt data_files: - split: test path: eng-mlt/test-* - config_name: eng-mon data_files: - split: test path: eng-mon/test-* - split: validation path: eng-mon/validation-* - config_name: eng-mri data_files: - split: test path: eng-mri/test-* - config_name: eng-msa data_files: - split: test path: eng-msa/test-* - split: validation path: eng-msa/validation-* - config_name: eng-mya data_files: - split: test path: eng-mya/test-* - config_name: eng-nds data_files: - split: test path: eng-nds/test-* - split: validation path: eng-nds/validation-* - config_name: eng-nld data_files: - split: test path: eng-nld/test-* - split: validation path: eng-nld/validation-* - config_name: eng-nno data_files: - split: test path: eng-nno/test-* - split: validation path: eng-nno/validation-* - config_name: eng-nob data_files: - split: test path: eng-nob/test-* - split: validation path: eng-nob/validation-* - config_name: eng-nor data_files: - split: test path: eng-nor/test-* - split: validation path: eng-nor/validation-* - 
config_name: eng-nov data_files: - split: test path: eng-nov/test-* - config_name: eng-nst data_files: - split: test path: eng-nst/test-* - config_name: eng-oci data_files: - split: test path: eng-oci/test-* - config_name: eng-orv data_files: - split: test path: eng-orv/test-* - config_name: eng-ota data_files: - split: test path: eng-ota/test-* - config_name: eng-ota_Arab data_files: - split: test path: eng-ota_Arab/test-* - config_name: eng-ota_Latn data_files: - split: test path: eng-ota_Latn/test-* - config_name: eng-pam data_files: - split: test path: eng-pam/test-* - split: validation path: eng-pam/validation-* - config_name: eng-pes data_files: - split: test path: eng-pes/test-* - config_name: eng-pms data_files: - split: test path: eng-pms/test-* - config_name: eng-pol data_files: - split: test path: eng-pol/test-* - split: validation path: eng-pol/validation-* - config_name: eng-por data_files: - split: test path: eng-por/test-* - split: validation path: eng-por/validation-* - config_name: eng-prg data_files: - split: test path: eng-prg/test-* - config_name: eng-que data_files: - split: test path: eng-que/test-* - config_name: eng-rom data_files: - split: test path: eng-rom/test-* - config_name: eng-ron data_files: - split: test path: eng-ron/test-* - split: validation path: eng-ron/validation-* - config_name: eng-run data_files: - split: test path: eng-run/test-* - config_name: eng-rus data_files: - split: test path: eng-rus/test-* - split: validation path: eng-rus/validation-* - config_name: eng-slv data_files: - split: test path: eng-slv/test-* - split: validation path: eng-slv/validation-* - config_name: eng-spa data_files: - split: test path: eng-spa/test-* - split: validation path: eng-spa/validation-* - config_name: eng-sqi data_files: - split: test path: eng-sqi/test-* - config_name: eng-srp_Cyrl data_files: - split: test path: eng-srp_Cyrl/test-* - split: validation path: eng-srp_Cyrl/validation-* - config_name: eng-srp_Latn data_files: - split: 
test path: eng-srp_Latn/test-* - split: validation path: eng-srp_Latn/validation-* - config_name: eng-swa data_files: - split: test path: eng-swa/test-* - config_name: eng-swe data_files: - split: test path: eng-swe/test-* - split: validation path: eng-swe/validation-* - config_name: eng-tam data_files: - split: test path: eng-tam/test-* - config_name: eng-tat data_files: - split: test path: eng-tat/test-* - config_name: eng-tel data_files: - split: test path: eng-tel/test-* - config_name: eng-tgl data_files: - split: test path: eng-tgl/test-* - split: validation path: eng-tgl/validation-* - config_name: eng-tha data_files: - split: test path: eng-tha/test-* - config_name: eng-tlh data_files: - split: test path: eng-tlh/test-* - split: validation path: eng-tlh/validation-* - config_name: eng-toki data_files: - split: test path: eng-toki/test-* - split: validation path: eng-toki/validation-* - config_name: eng-tuk data_files: - split: test path: eng-tuk/test-* - split: validation path: eng-tuk/validation-* - config_name: eng-tuk_Latn data_files: - split: test path: eng-tuk_Latn/test-* - config_name: eng-tur data_files: - split: test path: eng-tur/test-* - split: validation path: eng-tur/validation-* - config_name: eng-tzl data_files: - split: test path: eng-tzl/test-* - split: validation path: eng-tzl/validation-* - config_name: eng-tzl_Latn data_files: - split: test path: eng-tzl_Latn/test-* - config_name: eng-uig data_files: - split: test path: eng-uig/test-* - split: validation path: eng-uig/validation-* - config_name: eng-uig_Arab data_files: - split: test path: eng-uig_Arab/test-* - split: validation path: eng-uig_Arab/validation-* - config_name: eng-ukr data_files: - split: test path: eng-ukr/test-* - split: validation path: eng-ukr/validation-* - config_name: eng-urd data_files: - split: test path: eng-urd/test-* - config_name: eng-uzb data_files: - split: test path: eng-uzb/test-* - config_name: eng-uzb_Latn data_files: - split: test path: 
eng-uzb_Latn/test-* - config_name: eng-vie data_files: - split: test path: eng-vie/test-* - split: validation path: eng-vie/validation-* - config_name: eng-vol data_files: - split: test path: eng-vol/test-* - split: validation path: eng-vol/validation-* - config_name: eng-war data_files: - split: test path: eng-war/test-* - config_name: eng-xal data_files: - split: test path: eng-xal/test-* - config_name: eng-yid data_files: - split: test path: eng-yid/test-* - split: validation path: eng-yid/validation-* - config_name: eng-yue_Hans data_files: - split: test path: eng-yue_Hans/test-* - split: validation path: eng-yue_Hans/validation-* - config_name: eng-yue_Hant data_files: - split: test path: eng-yue_Hant/test-* - split: validation path: eng-yue_Hant/validation-* - config_name: eng-zho data_files: - split: test path: eng-zho/test-* - split: validation path: eng-zho/validation-* - config_name: eng-zsm_Latn data_files: - split: test path: eng-zsm_Latn/test-* - split: validation path: eng-zsm_Latn/validation-* - config_name: eng-zza data_files: - split: test path: eng-zza/test-* - config_name: epo-cmn_Hans data_files: - split: test path: epo-cmn_Hans/test-* - split: validation path: epo-cmn_Hans/validation-* - config_name: epo-cmn_Hant data_files: - split: test path: epo-cmn_Hant/test-* - split: validation path: epo-cmn_Hant/validation-* - config_name: epo-epo data_files: - split: test path: epo-epo/test-* - split: validation path: epo-epo/validation-* - config_name: epo-fas data_files: - split: test path: epo-fas/test-* - split: validation path: epo-fas/validation-* - config_name: epo-fin data_files: - split: test path: epo-fin/test-* - split: validation path: epo-fin/validation-* - config_name: epo-fra data_files: - split: test path: epo-fra/test-* - split: validation path: epo-fra/validation-* - config_name: epo-glg data_files: - split: test path: epo-glg/test-* - config_name: epo-hbs data_files: - split: test path: epo-hbs/test-* - split: validation path: 
epo-hbs/validation-* - config_name: epo-heb data_files: - split: test path: epo-heb/test-* - split: validation path: epo-heb/validation-* - config_name: epo-hrv data_files: - split: test path: epo-hrv/test-* - split: validation path: epo-hrv/validation-* - config_name: epo-hun data_files: - split: test path: epo-hun/test-* - split: validation path: epo-hun/validation-* - config_name: epo-ido data_files: - split: test path: epo-ido/test-* - split: validation path: epo-ido/validation-* - config_name: epo-ile data_files: - split: test path: epo-ile/test-* - split: validation path: epo-ile/validation-* - config_name: epo-ile_Latn data_files: - split: test path: epo-ile_Latn/test-* - config_name: epo-ina data_files: - split: test path: epo-ina/test-* - split: validation path: epo-ina/validation-* - config_name: epo-isl data_files: - split: test path: epo-isl/test-* - config_name: epo-ita data_files: - split: test path: epo-ita/test-* - split: validation path: epo-ita/validation-* - config_name: epo-jbo data_files: - split: test path: epo-jbo/test-* - config_name: epo-jpn data_files: - split: test path: epo-jpn/test-* - split: validation path: epo-jpn/validation-* - config_name: epo-jpn_Hani data_files: - split: test path: epo-jpn_Hani/test-* - split: validation path: epo-jpn_Hani/validation-* - config_name: epo-jpn_Hira data_files: - split: test path: epo-jpn_Hira/test-* - split: validation path: epo-jpn_Hira/validation-* - config_name: epo-lad data_files: - split: test path: epo-lad/test-* - split: validation path: epo-lad/validation-* - config_name: epo-lad_Latn data_files: - split: test path: epo-lad_Latn/test-* - split: validation path: epo-lad_Latn/validation-* - config_name: epo-lat data_files: - split: test path: epo-lat/test-* - split: validation path: epo-lat/validation-* - config_name: epo-lfn data_files: - split: test path: epo-lfn/test-* - split: validation path: epo-lfn/validation-* - config_name: epo-lfn_Latn data_files: - split: test path: 
epo-lfn_Latn/test-* - split: validation path: epo-lfn_Latn/validation-* - config_name: epo-lit data_files: - split: test path: epo-lit/test-* - split: validation path: epo-lit/validation-* - config_name: epo-nds data_files: - split: test path: epo-nds/test-* - split: validation path: epo-nds/validation-* - config_name: epo-nld data_files: - split: test path: epo-nld/test-* - split: validation path: epo-nld/validation-* - config_name: epo-nob data_files: - split: test path: epo-nob/test-* - split: validation path: epo-nob/validation-* - config_name: epo-nor data_files: - split: test path: epo-nor/test-* - split: validation path: epo-nor/validation-* - config_name: epo-oci data_files: - split: test path: epo-oci/test-* - config_name: epo-pol data_files: - split: test path: epo-pol/test-* - split: validation path: epo-pol/validation-* - config_name: epo-ron data_files: - split: test path: epo-ron/test-* - split: validation path: epo-ron/validation-* - config_name: epo-rus data_files: - split: test path: epo-rus/test-* - split: validation path: epo-rus/validation-* - config_name: epo-slv data_files: - split: test path: epo-slv/test-* - config_name: epo-spa data_files: - split: test path: epo-spa/test-* - split: validation path: epo-spa/validation-* - config_name: epo-srp_Cyrl data_files: - split: test path: epo-srp_Cyrl/test-* - split: validation path: epo-srp_Cyrl/validation-* - config_name: epo-srp_Latn data_files: - split: test path: epo-srp_Latn/test-* - split: validation path: epo-srp_Latn/validation-* - config_name: epo-swe data_files: - split: test path: epo-swe/test-* - split: validation path: epo-swe/validation-* - config_name: epo-tgl data_files: - split: test path: epo-tgl/test-* - config_name: epo-tlh data_files: - split: test path: epo-tlh/test-* - split: validation path: epo-tlh/validation-* - config_name: epo-toki data_files: - split: test path: epo-toki/test-* - split: validation path: epo-toki/validation-* - config_name: epo-tur data_files: - split: 
test path: epo-tur/test-* - split: validation path: epo-tur/validation-* - config_name: epo-ukr data_files: - split: test path: epo-ukr/test-* - split: validation path: epo-ukr/validation-* - config_name: epo-vie data_files: - split: test path: epo-vie/test-* - config_name: epo-vol data_files: - split: test path: epo-vol/test-* - split: validation path: epo-vol/validation-* - config_name: epo-yid data_files: - split: test path: epo-yid/test-* - split: validation path: epo-yid/validation-* - config_name: epo-zho data_files: - split: test path: epo-zho/test-* - split: validation path: epo-zho/validation-* - config_name: est-rus data_files: - split: test path: est-rus/test-* - config_name: eus-jpn data_files: - split: test path: eus-jpn/test-* - config_name: eus-rus data_files: - split: test path: eus-rus/test-* - config_name: eus-spa data_files: - split: test path: eus-spa/test-* - split: validation path: eus-spa/validation-* - config_name: fas-fra data_files: - split: test path: fas-fra/test-* - config_name: fin-fin data_files: - split: test path: fin-fin/test-* - split: validation path: fin-fin/validation-* - config_name: fin-fkv data_files: - split: test path: fin-fkv/test-* - split: validation path: fin-fkv/validation-* - config_name: fin-fra data_files: - split: test path: fin-fra/test-* - split: validation path: fin-fra/validation-* - config_name: fin-heb data_files: - split: test path: fin-heb/test-* - config_name: fin-hun data_files: - split: test path: fin-hun/test-* - config_name: fin-ita data_files: - split: test path: fin-ita/test-* - split: validation path: fin-ita/validation-* - config_name: fin-jpn data_files: - split: test path: fin-jpn/test-* - split: validation path: fin-jpn/validation-* - config_name: fin-jpn_Hani data_files: - split: test path: fin-jpn_Hani/test-* - split: validation path: fin-jpn_Hani/validation-* - config_name: fin-jpn_Hira data_files: - split: test path: fin-jpn_Hira/test-* - split: validation path: fin-jpn_Hira/validation-* - 
config_name: fin-jpn_Kana data_files: - split: test path: fin-jpn_Kana/test-* - split: validation path: fin-jpn_Kana/validation-* - config_name: fin-kor data_files: - split: test path: fin-kor/test-* - config_name: fin-kor_Hang data_files: - split: test path: fin-kor_Hang/test-* - config_name: fin-kur data_files: - split: test path: fin-kur/test-* - config_name: fin-lat data_files: - split: test path: fin-lat/test-* - config_name: fin-nld data_files: - split: test path: fin-nld/test-* - config_name: fin-nno data_files: - split: test path: fin-nno/test-* - split: validation path: fin-nno/validation-* - config_name: fin-nob data_files: - split: test path: fin-nob/test-* - split: validation path: fin-nob/validation-* - config_name: fin-nor data_files: - split: test path: fin-nor/test-* - split: validation path: fin-nor/validation-* - config_name: fin-pol data_files: - split: test path: fin-pol/test-* - config_name: fin-por data_files: - split: test path: fin-por/test-* - config_name: fin-rus data_files: - split: test path: fin-rus/test-* - split: validation path: fin-rus/validation-* - config_name: fin-spa data_files: - split: test path: fin-spa/test-* - split: validation path: fin-spa/validation-* - config_name: fin-swe data_files: - split: test path: fin-swe/test-* - split: validation path: fin-swe/validation-* - config_name: fin-tur data_files: - split: test path: fin-tur/test-* - config_name: fin-zho data_files: - split: test path: fin-zho/test-* - config_name: fra-cmn_Hans data_files: - split: test path: fra-cmn_Hans/test-* - split: validation path: fra-cmn_Hans/validation-* - config_name: fra-cmn_Hant data_files: - split: test path: fra-cmn_Hant/test-* - split: validation path: fra-cmn_Hant/validation-* - config_name: fra-fra data_files: - split: test path: fra-fra/test-* - split: validation path: fra-fra/validation-* - config_name: fra-gcf data_files: - split: test path: fra-gcf/test-* - config_name: fra-hbs data_files: - split: test path: fra-hbs/test-* - 
config_name: fra-heb data_files: - split: test path: fra-heb/test-* - split: validation path: fra-heb/validation-* - config_name: fra-hrv data_files: - split: test path: fra-hrv/test-* - config_name: fra-hun data_files: - split: test path: fra-hun/test-* - split: validation path: fra-hun/validation-* - config_name: fra-ido data_files: - split: test path: fra-ido/test-* - split: validation path: fra-ido/validation-* - config_name: fra-ile data_files: - split: test path: fra-ile/test-* - config_name: fra-ina data_files: - split: test path: fra-ina/test-* - split: validation path: fra-ina/validation-* - config_name: fra-ind data_files: - split: test path: fra-ind/test-* - config_name: fra-ita data_files: - split: test path: fra-ita/test-* - split: validation path: fra-ita/validation-* - config_name: fra-jbo data_files: - split: test path: fra-jbo/test-* - config_name: fra-jpn data_files: - split: test path: fra-jpn/test-* - split: validation path: fra-jpn/validation-* - config_name: fra-jpn_Hani data_files: - split: test path: fra-jpn_Hani/test-* - split: validation path: fra-jpn_Hani/validation-* - config_name: fra-jpn_Hira data_files: - split: test path: fra-jpn_Hira/test-* - split: validation path: fra-jpn_Hira/validation-* - config_name: fra-kab data_files: - split: test path: fra-kab/test-* - split: validation path: fra-kab/validation-* - config_name: fra-kor data_files: - split: test path: fra-kor/test-* - config_name: fra-kor_Hang data_files: - split: test path: fra-kor_Hang/test-* - config_name: fra-lat data_files: - split: test path: fra-lat/test-* - split: validation path: fra-lat/validation-* - config_name: fra-lfn data_files: - split: test path: fra-lfn/test-* - split: validation path: fra-lfn/validation-* - config_name: fra-lfn_Latn data_files: - split: test path: fra-lfn_Latn/test-* - split: validation path: fra-lfn_Latn/validation-* - config_name: fra-msa data_files: - split: test path: fra-msa/test-* - config_name: fra-nds data_files: - split: test 
path: fra-nds/test-* - config_name: fra-nld data_files: - split: test path: fra-nld/test-* - split: validation path: fra-nld/validation-* - config_name: fra-nob data_files: - split: test path: fra-nob/test-* - config_name: fra-nor data_files: - split: test path: fra-nor/test-* - config_name: fra-oci data_files: - split: test path: fra-oci/test-* - config_name: fra-pcd data_files: - split: test path: fra-pcd/test-* - config_name: fra-pol data_files: - split: test path: fra-pol/test-* - split: validation path: fra-pol/validation-* - config_name: fra-por data_files: - split: test path: fra-por/test-* - split: validation path: fra-por/validation-* - config_name: fra-ron data_files: - split: test path: fra-ron/test-* - config_name: fra-run data_files: - split: test path: fra-run/test-* - config_name: fra-rus data_files: - split: test path: fra-rus/test-* - split: validation path: fra-rus/validation-* - config_name: fra-slv data_files: - split: test path: fra-slv/test-* - config_name: fra-spa data_files: - split: test path: fra-spa/test-* - split: validation path: fra-spa/validation-* - config_name: fra-swe data_files: - split: test path: fra-swe/test-* - split: validation path: fra-swe/validation-* - config_name: fra-tat data_files: - split: test path: fra-tat/test-* - config_name: fra-tgl data_files: - split: test path: fra-tgl/test-* - config_name: fra-tlh data_files: - split: test path: fra-tlh/test-* - config_name: fra-tlh_Latn data_files: - split: test path: fra-tlh_Latn/test-* - config_name: fra-toki data_files: - split: test path: fra-toki/test-* - split: validation path: fra-toki/validation-* - config_name: fra-toki_Latn data_files: - split: test path: fra-toki_Latn/test-* - config_name: fra-tur data_files: - split: test path: fra-tur/test-* - split: validation path: fra-tur/validation-* - config_name: fra-uig data_files: - split: test path: fra-uig/test-* - config_name: fra-uig_Arab data_files: - split: test path: fra-uig_Arab/test-* - config_name: fra-ukr 
data_files: - split: test path: fra-ukr/test-* - split: validation path: fra-ukr/validation-* - config_name: fra-vie data_files: - split: test path: fra-vie/test-* - config_name: fra-wuu data_files: - split: test path: fra-wuu/test-* - split: validation path: fra-wuu/validation-* - config_name: fra-yid data_files: - split: test path: fra-yid/test-* - split: validation path: fra-yid/validation-* - config_name: fra-zho data_files: - split: test path: fra-zho/test-* - split: validation path: fra-zho/validation-* - config_name: fry-nld data_files: - split: test path: fry-nld/test-* - config_name: gcf-gcf data_files: - split: test path: gcf-gcf/test-* - config_name: gla-spa data_files: - split: test path: gla-spa/test-* - config_name: glg-por data_files: - split: test path: glg-por/test-* - config_name: glg-spa data_files: - split: test path: glg-spa/test-* - split: validation path: glg-spa/validation-* - config_name: gos-nld data_files: - split: test path: gos-nld/test-* - split: validation path: gos-nld/validation-* - config_name: grn-por data_files: - split: test path: grn-por/test-* - split: validation path: grn-por/validation-* - config_name: grn-spa data_files: - split: test path: grn-spa/test-* - split: validation path: grn-spa/validation-* - config_name: hbs-ita data_files: - split: test path: hbs-ita/test-* - config_name: hbs-jpn data_files: - split: test path: hbs-jpn/test-* - config_name: hbs-nor data_files: - split: test path: hbs-nor/test-* - split: validation path: hbs-nor/validation-* - config_name: hbs-pol data_files: - split: test path: hbs-pol/test-* - config_name: hbs-rus data_files: - split: test path: hbs-rus/test-* - split: validation path: hbs-rus/validation-* - config_name: hbs-spa data_files: - split: test path: hbs-spa/test-* - config_name: hbs-ukr data_files: - split: test path: hbs-ukr/test-* - config_name: hbs-zho data_files: - split: test path: hbs-zho/test-* - config_name: heb-cmn_Hans data_files: - split: test path: heb-cmn_Hans/test-* - 
config_name: heb-cmn_Hant data_files: - split: test path: heb-cmn_Hant/test-* - config_name: heb-heb data_files: - split: test path: heb-heb/test-* - split: validation path: heb-heb/validation-* - config_name: heb-hun data_files: - split: test path: heb-hun/test-* - config_name: heb-ina data_files: - split: test path: heb-ina/test-* - split: validation path: heb-ina/validation-* - config_name: heb-ita data_files: - split: test path: heb-ita/test-* - split: validation path: heb-ita/validation-* - config_name: heb-jpn data_files: - split: test path: heb-jpn/test-* - config_name: heb-jpn_Hira data_files: - split: test path: heb-jpn_Hira/test-* - config_name: heb-lad data_files: - split: test path: heb-lad/test-* - split: validation path: heb-lad/validation-* - config_name: heb-lat data_files: - split: test path: heb-lat/test-* - split: validation path: heb-lat/validation-* - config_name: heb-lfn data_files: - split: test path: heb-lfn/test-* - split: validation path: heb-lfn/validation-* - config_name: heb-lfn_Latn data_files: - split: test path: heb-lfn_Latn/test-* - split: validation path: heb-lfn_Latn/validation-* - config_name: heb-nld data_files: - split: test path: heb-nld/test-* - split: validation path: heb-nld/validation-* - config_name: heb-pol data_files: - split: test path: heb-pol/test-* - split: validation path: heb-pol/validation-* - config_name: heb-por data_files: - split: test path: heb-por/test-* - split: validation path: heb-por/validation-* - config_name: heb-rus data_files: - split: test path: heb-rus/test-* - split: validation path: heb-rus/validation-* - config_name: heb-spa data_files: - split: test path: heb-spa/test-* - split: validation path: heb-spa/validation-* - config_name: heb-tur data_files: - split: test path: heb-tur/test-* - split: validation path: heb-tur/validation-* - config_name: heb-ukr data_files: - split: test path: heb-ukr/test-* - config_name: heb-yid data_files: - split: test path: heb-yid/test-* - split: validation path: 
heb-yid/validation-* - config_name: heb-zho data_files: - split: test path: heb-zho/test-* - config_name: hin-urd data_files: - split: test path: hin-urd/test-* - config_name: hin-zho data_files: - split: test path: hin-zho/test-* - config_name: hrv-jpn_Hira data_files: - split: test path: hrv-jpn_Hira/test-* - config_name: hrv-pol data_files: - split: test path: hrv-pol/test-* - config_name: hrv-spa data_files: - split: test path: hrv-spa/test-* - config_name: hrv-ukr data_files: - split: test path: hrv-ukr/test-* - config_name: hsb-slv data_files: - split: test path: hsb-slv/test-* - config_name: hun-cmn_Hans data_files: - split: test path: hun-cmn_Hans/test-* - config_name: hun-hun data_files: - split: test path: hun-hun/test-* - config_name: hun-ita data_files: - split: test path: hun-ita/test-* - split: validation path: hun-ita/validation-* - config_name: hun-jpn data_files: - split: test path: hun-jpn/test-* - split: validation path: hun-jpn/validation-* - config_name: hun-jpn_Hani data_files: - split: test path: hun-jpn_Hani/test-* - split: validation path: hun-jpn_Hani/validation-* - config_name: hun-jpn_Hira data_files: - split: test path: hun-jpn_Hira/test-* - split: validation path: hun-jpn_Hira/validation-* - config_name: hun-kor data_files: - split: test path: hun-kor/test-* - config_name: hun-kor_Hang data_files: - split: test path: hun-kor_Hang/test-* - config_name: hun-lat data_files: - split: test path: hun-lat/test-* - config_name: hun-nld data_files: - split: test path: hun-nld/test-* - split: validation path: hun-nld/validation-* - config_name: hun-pol data_files: - split: test path: hun-pol/test-* - config_name: hun-por data_files: - split: test path: hun-por/test-* - split: validation path: hun-por/validation-* - config_name: hun-rus data_files: - split: test path: hun-rus/test-* - split: validation path: hun-rus/validation-* - config_name: hun-spa data_files: - split: test path: hun-spa/test-* - split: validation path: hun-spa/validation-* - 
config_name: hun-swe data_files: - split: test path: hun-swe/test-* - split: validation path: hun-swe/validation-* - config_name: hun-tur data_files: - split: test path: hun-tur/test-* - config_name: hun-ukr data_files: - split: test path: hun-ukr/test-* - config_name: hun-zho data_files: - split: test path: hun-zho/test-* - config_name: hye-rus data_files: - split: test path: hye-rus/test-* - config_name: ido-ina data_files: - split: test path: ido-ina/test-* - split: validation path: ido-ina/validation-* - config_name: ido-ita data_files: - split: test path: ido-ita/test-* - split: validation path: ido-ita/validation-* - config_name: ido-lfn data_files: - split: test path: ido-lfn/test-* - split: validation path: ido-lfn/validation-* - config_name: ido-spa data_files: - split: test path: ido-spa/test-* - config_name: ido-yid data_files: - split: test path: ido-yid/test-* - split: validation path: ido-yid/validation-* - config_name: ido_Latn-lfn_Latn data_files: - split: test path: ido_Latn-lfn_Latn/test-* - split: validation path: ido_Latn-lfn_Latn/validation-* - config_name: ina-ita data_files: - split: test path: ina-ita/test-* - config_name: ina-lad data_files: - split: test path: ina-lad/test-* - split: validation path: ina-lad/validation-* - config_name: ina-lat data_files: - split: test path: ina-lat/test-* - split: validation path: ina-lat/validation-* - config_name: ina-lfn data_files: - split: test path: ina-lfn/test-* - split: validation path: ina-lfn/validation-* - config_name: ina-nld data_files: - split: test path: ina-nld/test-* - config_name: ina-por data_files: - split: test path: ina-por/test-* - split: validation path: ina-por/validation-* - config_name: ina-rus data_files: - split: test path: ina-rus/test-* - split: validation path: ina-rus/validation-* - config_name: ina-spa data_files: - split: test path: ina-spa/test-* - split: validation path: ina-spa/validation-* - config_name: ina-tlh data_files: - split: test path: ina-tlh/test-* - 
split: validation path: ina-tlh/validation-* - config_name: ina-tur data_files: - split: test path: ina-tur/test-* - config_name: ina-yid data_files: - split: test path: ina-yid/test-* - split: validation path: ina-yid/validation-* - config_name: ina_Latn-lad_Latn data_files: - split: test path: ina_Latn-lad_Latn/test-* - split: validation path: ina_Latn-lad_Latn/validation-* - config_name: ina_Latn-lfn_Latn data_files: - split: test path: ina_Latn-lfn_Latn/test-* - split: validation path: ina_Latn-lfn_Latn/validation-* - config_name: ina_Latn-tlh_Latn data_files: - split: test path: ina_Latn-tlh_Latn/test-* - config_name: ind-zsm_Latn data_files: - split: test path: ind-zsm_Latn/test-* - config_name: isl-ita data_files: - split: test path: isl-ita/test-* - config_name: isl-jpn data_files: - split: test path: isl-jpn/test-* - config_name: isl-jpn_Hira data_files: - split: test path: isl-jpn_Hira/test-* - config_name: isl-spa data_files: - split: test path: isl-spa/test-* - config_name: ita-cmn_Hans data_files: - split: test path: ita-cmn_Hans/test-* - split: validation path: ita-cmn_Hans/validation-* - config_name: ita-cmn_Hant data_files: - split: test path: ita-cmn_Hant/test-* - split: validation path: ita-cmn_Hant/validation-* - config_name: ita-ind data_files: - split: test path: ita-ind/test-* - config_name: ita-ita data_files: - split: test path: ita-ita/test-* - split: validation path: ita-ita/validation-* - config_name: ita-jpn data_files: - split: test path: ita-jpn/test-* - split: validation path: ita-jpn/validation-* - config_name: ita-jpn_Hani data_files: - split: test path: ita-jpn_Hani/test-* - split: validation path: ita-jpn_Hani/validation-* - config_name: ita-jpn_Hira data_files: - split: test path: ita-jpn_Hira/test-* - split: validation path: ita-jpn_Hira/validation-* - config_name: ita-lat data_files: - split: test path: ita-lat/test-* - split: validation path: ita-lat/validation-* - config_name: ita-lit data_files: - split: test path: 
ita-lit/test-* - config_name: ita-msa data_files: - split: test path: ita-msa/test-* - config_name: ita-nds data_files: - split: test path: ita-nds/test-* - config_name: ita-nld data_files: - split: test path: ita-nld/test-* - split: validation path: ita-nld/validation-* - config_name: ita-nor data_files: - split: test path: ita-nor/test-* - config_name: ita-pms data_files: - split: test path: ita-pms/test-* - config_name: ita-pol data_files: - split: test path: ita-pol/test-* - split: validation path: ita-pol/validation-* - config_name: ita-por data_files: - split: test path: ita-por/test-* - split: validation path: ita-por/validation-* - config_name: ita-ron data_files: - split: test path: ita-ron/test-* - config_name: ita-rus data_files: - split: test path: ita-rus/test-* - split: validation path: ita-rus/validation-* - config_name: ita-spa data_files: - split: test path: ita-spa/test-* - split: validation path: ita-spa/validation-* - config_name: ita-swe data_files: - split: test path: ita-swe/test-* - config_name: ita-toki data_files: - split: test path: ita-toki/test-* - config_name: ita-tur data_files: - split: test path: ita-tur/test-* - split: validation path: ita-tur/validation-* - config_name: ita-ukr data_files: - split: test path: ita-ukr/test-* - split: validation path: ita-ukr/validation-* - config_name: ita-vie data_files: - split: test path: ita-vie/test-* - config_name: ita-yid data_files: - split: test path: ita-yid/test-* - split: validation path: ita-yid/validation-* - config_name: ita-zho data_files: - split: test path: ita-zho/test-* - split: validation path: ita-zho/validation-* - config_name: jbo-jpn data_files: - split: test path: jbo-jpn/test-* - config_name: jbo-rus data_files: - split: test path: jbo-rus/test-* - config_name: jbo-spa data_files: - split: test path: jbo-spa/test-* - config_name: jbo-swe data_files: - split: test path: jbo-swe/test-* - config_name: jbo-zho data_files: - split: test path: jbo-zho/test-* - split: validation 
path: jbo-zho/validation-* - config_name: jbo_Latn-cmn_Hans data_files: - split: test path: jbo_Latn-cmn_Hans/test-* - config_name: jbo_Latn-cmn_Hant data_files: - split: test path: jbo_Latn-cmn_Hant/test-* - split: validation path: jbo_Latn-cmn_Hant/validation-* - config_name: jbo_Latn-jpn_Hira data_files: - split: test path: jbo_Latn-jpn_Hira/test-* - config_name: jpn-jpn data_files: - split: test path: jpn-jpn/test-* - config_name: jpn-kor data_files: - split: test path: jpn-kor/test-* - config_name: jpn-lit data_files: - split: test path: jpn-lit/test-* - config_name: jpn-mar data_files: - split: test path: jpn-mar/test-* - config_name: jpn-msa data_files: - split: test path: jpn-msa/test-* - split: validation path: jpn-msa/validation-* - config_name: jpn-nds data_files: - split: test path: jpn-nds/test-* - split: validation path: jpn-nds/validation-* - config_name: jpn-nld data_files: - split: test path: jpn-nld/test-* - split: validation path: jpn-nld/validation-* - config_name: jpn-nor data_files: - split: test path: jpn-nor/test-* - config_name: jpn-pol data_files: - split: test path: jpn-pol/test-* - split: validation path: jpn-pol/validation-* - config_name: jpn-por data_files: - split: test path: jpn-por/test-* - split: validation path: jpn-por/validation-* - config_name: jpn-rus data_files: - split: test path: jpn-rus/test-* - split: validation path: jpn-rus/validation-* - config_name: jpn-spa data_files: - split: test path: jpn-spa/test-* - split: validation path: jpn-spa/validation-* - config_name: jpn-swe data_files: - split: test path: jpn-swe/test-* - config_name: jpn-tlh data_files: - split: test path: jpn-tlh/test-* - config_name: jpn-toki data_files: - split: test path: jpn-toki/test-* - config_name: jpn-tur data_files: - split: test path: jpn-tur/test-* - config_name: jpn-ukr data_files: - split: test path: jpn-ukr/test-* - split: validation path: jpn-ukr/validation-* - config_name: jpn-vie data_files: - split: test path: jpn-vie/test-* - 
split: validation path: jpn-vie/validation-* - config_name: jpn-zho data_files: - split: test path: jpn-zho/test-* - split: validation path: jpn-zho/validation-* - config_name: jpn_Hani-cmn_Hans data_files: - split: test path: jpn_Hani-cmn_Hans/test-* - split: validation path: jpn_Hani-cmn_Hans/validation-* - config_name: jpn_Hani-nld data_files: - split: test path: jpn_Hani-nld/test-* - split: validation path: jpn_Hani-nld/validation-* - config_name: jpn_Hani-pol data_files: - split: test path: jpn_Hani-pol/test-* - split: validation path: jpn_Hani-pol/validation-* - config_name: jpn_Hani-por data_files: - split: test path: jpn_Hani-por/test-* - split: validation path: jpn_Hani-por/validation-* - config_name: jpn_Hani-rus data_files: - split: test path: jpn_Hani-rus/test-* - split: validation path: jpn_Hani-rus/validation-* - config_name: jpn_Hani-spa data_files: - split: test path: jpn_Hani-spa/test-* - split: validation path: jpn_Hani-spa/validation-* - config_name: jpn_Hira-cmn_Hans data_files: - split: test path: jpn_Hira-cmn_Hans/test-* - split: validation path: jpn_Hira-cmn_Hans/validation-* - config_name: jpn_Hira-cmn_Hant data_files: - split: test path: jpn_Hira-cmn_Hant/test-* - split: validation path: jpn_Hira-cmn_Hant/validation-* - config_name: jpn_Hira-ind data_files: - split: test path: jpn_Hira-ind/test-* - split: validation path: jpn_Hira-ind/validation-* - config_name: jpn_Hira-jpn_Hira data_files: - split: test path: jpn_Hira-jpn_Hira/test-* - config_name: jpn_Hira-kor_Hang data_files: - split: test path: jpn_Hira-kor_Hang/test-* - config_name: jpn_Hira-lit data_files: - split: test path: jpn_Hira-lit/test-* - config_name: jpn_Hira-mar data_files: - split: test path: jpn_Hira-mar/test-* - config_name: jpn_Hira-nds data_files: - split: test path: jpn_Hira-nds/test-* - split: validation path: jpn_Hira-nds/validation-* - config_name: jpn_Hira-nld data_files: - split: test path: jpn_Hira-nld/test-* - split: validation path: jpn_Hira-nld/validation-* 
- config_name: jpn_Hira-nob data_files: - split: test path: jpn_Hira-nob/test-* - config_name: jpn_Hira-pol data_files: - split: test path: jpn_Hira-pol/test-* - split: validation path: jpn_Hira-pol/validation-* - config_name: jpn_Hira-por data_files: - split: test path: jpn_Hira-por/test-* - split: validation path: jpn_Hira-por/validation-* - config_name: jpn_Hira-rus data_files: - split: test path: jpn_Hira-rus/test-* - split: validation path: jpn_Hira-rus/validation-* - config_name: jpn_Hira-spa data_files: - split: test path: jpn_Hira-spa/test-* - split: validation path: jpn_Hira-spa/validation-* - config_name: jpn_Hira-swe data_files: - split: test path: jpn_Hira-swe/test-* - config_name: jpn_Hira-tlh_Latn data_files: - split: test path: jpn_Hira-tlh_Latn/test-* - config_name: jpn_Hira-tur data_files: - split: test path: jpn_Hira-tur/test-* - config_name: jpn_Hira-ukr data_files: - split: test path: jpn_Hira-ukr/test-* - config_name: jpn_Hira-vie data_files: - split: test path: jpn_Hira-vie/test-* - split: validation path: jpn_Hira-vie/validation-* - config_name: jpn_Kana-rus data_files: - split: test path: jpn_Kana-rus/test-* - split: validation path: jpn_Kana-rus/validation-* - config_name: jpn_Kana-spa data_files: - split: test path: jpn_Kana-spa/test-* - split: validation path: jpn_Kana-spa/validation-* - config_name: kab-kab data_files: - split: test path: kab-kab/test-* - split: validation path: kab-kab/validation-* - config_name: kab-rus data_files: - split: test path: kab-rus/test-* - config_name: kab-spa data_files: - split: test path: kab-spa/test-* - config_name: kat-rus data_files: - split: test path: kat-rus/test-* - config_name: kaz-rus data_files: - split: test path: kaz-rus/test-* - split: validation path: kaz-rus/validation-* - config_name: kaz_Cyrl-rus data_files: - split: test path: kaz_Cyrl-rus/test-* - config_name: khm-spa data_files: - split: test path: khm-spa/test-* - config_name: kor-rus data_files: - split: test path: kor-rus/test-* - 
config_name: kor-spa data_files: - split: test path: kor-spa/test-* - config_name: kor-zho data_files: - split: test path: kor-zho/test-* - config_name: kor_Hang-cmn_Hans data_files: - split: test path: kor_Hang-cmn_Hans/test-* - config_name: kor_Hang-rus data_files: - split: test path: kor_Hang-rus/test-* - config_name: kor_Hang-spa data_files: - split: test path: kor_Hang-spa/test-* - config_name: kzj-msa data_files: - split: test path: kzj-msa/test-* - config_name: kzj_Latn-zsm_Latn data_files: - split: test path: kzj_Latn-zsm_Latn/test-* - config_name: lad-lat data_files: - split: test path: lad-lat/test-* - split: validation path: lad-lat/validation-* - config_name: lad-lfn data_files: - split: test path: lad-lfn/test-* - split: validation path: lad-lfn/validation-* - config_name: lad-spa data_files: - split: test path: lad-spa/test-* - split: validation path: lad-spa/validation-* - config_name: lad-yid data_files: - split: test path: lad-yid/test-* - split: validation path: lad-yid/validation-* - config_name: lad_Latn-lfn_Latn data_files: - split: test path: lad_Latn-lfn_Latn/test-* - split: validation path: lad_Latn-lfn_Latn/validation-* - config_name: lad_Latn-spa data_files: - split: test path: lad_Latn-spa/test-* - config_name: lad_Latn-yid data_files: - split: test path: lad_Latn-yid/test-* - split: validation path: lad_Latn-yid/validation-* - config_name: lat-lat data_files: - split: test path: lat-lat/test-* - config_name: lat-lfn data_files: - split: test path: lat-lfn/test-* - split: validation path: lat-lfn/validation-* - config_name: lat-nld data_files: - split: test path: lat-nld/test-* - config_name: lat-nor data_files: - split: test path: lat-nor/test-* - config_name: lat-pol data_files: - split: test path: lat-pol/test-* - config_name: lat-rus data_files: - split: test path: lat-rus/test-* - split: validation path: lat-rus/validation-* - config_name: lat-tlh data_files: - split: test path: lat-tlh/test-* - split: validation path: 
lat-tlh/validation-* - config_name: lat-ukr data_files: - split: test path: lat-ukr/test-* - config_name: lat-yid data_files: - split: test path: lat-yid/test-* - split: validation path: lat-yid/validation-* - config_name: lat_Latn-lfn_Latn data_files: - split: test path: lat_Latn-lfn_Latn/test-* - split: validation path: lat_Latn-lfn_Latn/validation-* - config_name: lav-rus data_files: - split: test path: lav-rus/test-* - config_name: lfn-rus data_files: - split: test path: lfn-rus/test-* - split: validation path: lfn-rus/validation-* - config_name: lfn-spa data_files: - split: test path: lfn-spa/test-* - split: validation path: lfn-spa/validation-* - config_name: lfn-yid data_files: - split: test path: lfn-yid/test-* - split: validation path: lfn-yid/validation-* - config_name: lfn_Cyrl-por data_files: - split: test path: lfn_Cyrl-por/test-* - split: validation path: lfn_Cyrl-por/validation-* - config_name: lfn_Latn-yid data_files: - split: test path: lfn_Latn-yid/test-* - split: validation path: lfn_Latn-yid/validation-* - config_name: lit-pol data_files: - split: test path: lit-pol/test-* - config_name: lit-rus data_files: - split: test path: lit-rus/test-* - split: validation path: lit-rus/validation-* - config_name: lit-spa data_files: - split: test path: lit-spa/test-* - config_name: lit-tur data_files: - split: test path: lit-tur/test-* - config_name: ltz-nld data_files: - split: test path: ltz-nld/test-* - config_name: mkd-spa data_files: - split: test path: mkd-spa/test-* - config_name: msa-msa data_files: - split: test path: msa-msa/test-* - config_name: msa-spa data_files: - split: test path: msa-spa/test-* - config_name: msa-zho data_files: - split: test path: msa-zho/test-* - config_name: nds-nld data_files: - split: test path: nds-nld/test-* - split: validation path: nds-nld/validation-* - config_name: nds-por data_files: - split: test path: nds-por/test-* - config_name: nds-rus data_files: - split: test path: nds-rus/test-* - split: validation path: 
nds-rus/validation-* - config_name: nds-spa data_files: - split: test path: nds-spa/test-* - config_name: nld-cmn_Hant data_files: - split: test path: nld-cmn_Hant/test-* - config_name: nld-nld data_files: - split: test path: nld-nld/test-* - split: validation path: nld-nld/validation-* - config_name: nld-nor data_files: - split: test path: nld-nor/test-* - config_name: nld-pol data_files: - split: test path: nld-pol/test-* - split: validation path: nld-pol/validation-* - config_name: nld-por data_files: - split: test path: nld-por/test-* - split: validation path: nld-por/validation-* - config_name: nld-ron data_files: - split: test path: nld-ron/test-* - split: validation path: nld-ron/validation-* - config_name: nld-rus data_files: - split: test path: nld-rus/test-* - split: validation path: nld-rus/validation-* - config_name: nld-spa data_files: - split: test path: nld-spa/test-* - split: validation path: nld-spa/validation-* - config_name: nld-toki data_files: - split: test path: nld-toki/test-* - config_name: nld-tur data_files: - split: test path: nld-tur/test-* - split: validation path: nld-tur/validation-* - config_name: nld-ukr data_files: - split: test path: nld-ukr/test-* - split: validation path: nld-ukr/validation-* - config_name: nld-zho data_files: - split: test path: nld-zho/test-* - split: validation path: nld-zho/validation-* - config_name: nno-nob data_files: - split: test path: nno-nob/test-* - config_name: nob-nno data_files: - split: test path: nob-nno/test-* - config_name: nob-rus data_files: - split: test path: nob-rus/test-* - config_name: nob-spa data_files: - split: test path: nob-spa/test-* - config_name: nob-swe data_files: - split: test path: nob-swe/test-* - config_name: nor-nor data_files: - split: test path: nor-nor/test-* - split: validation path: nor-nor/validation-* - config_name: nor-pol data_files: - split: test path: nor-pol/test-* - config_name: nor-por data_files: - split: test path: nor-por/test-* - config_name: nor-rus 
data_files: - split: test path: nor-rus/test-* - config_name: nor-spa data_files: - split: test path: nor-spa/test-* - config_name: nor-swe data_files: - split: test path: nor-swe/test-* - config_name: nor-ukr data_files: - split: test path: nor-ukr/test-* - config_name: nor-zho data_files: - split: test path: nor-zho/test-* - config_name: orv-ukr data_files: - split: test path: orv-ukr/test-* - config_name: ota-tur data_files: - split: test path: ota-tur/test-* - split: validation path: ota-tur/validation-* - config_name: pol-cmn_Hans data_files: - split: test path: pol-cmn_Hans/test-* - config_name: pol-cmn_Hant data_files: - split: test path: pol-cmn_Hant/test-* - config_name: pol-por data_files: - split: test path: pol-por/test-* - config_name: pol-rus data_files: - split: test path: pol-rus/test-* - split: validation path: pol-rus/validation-* - config_name: pol-spa data_files: - split: test path: pol-spa/test-* - split: validation path: pol-spa/validation-* - config_name: pol-swe data_files: - split: test path: pol-swe/test-* - config_name: pol-tur data_files: - split: test path: pol-tur/test-* - config_name: pol-ukr data_files: - split: test path: pol-ukr/test-* - split: validation path: pol-ukr/validation-* - config_name: pol-zho data_files: - split: test path: pol-zho/test-* - config_name: por-cmn_Hans data_files: - split: test path: por-cmn_Hans/test-* - config_name: por-cmn_Hant data_files: - split: test path: por-cmn_Hant/test-* - config_name: por-por data_files: - split: test path: por-por/test-* - split: validation path: por-por/validation-* - config_name: por-ron data_files: - split: test path: por-ron/test-* - split: validation path: por-ron/validation-* - config_name: por-rus data_files: - split: test path: por-rus/test-* - split: validation path: por-rus/validation-* - config_name: por-spa data_files: - split: test path: por-spa/test-* - split: validation path: por-spa/validation-* - config_name: por-swe data_files: - split: test path: 
por-swe/test-* - config_name: por-tgl data_files: - split: test path: por-tgl/test-* - config_name: por-toki data_files: - split: test path: por-toki/test-* - split: validation path: por-toki/validation-* - config_name: por-tur data_files: - split: test path: por-tur/test-* - config_name: por-ukr data_files: - split: test path: por-ukr/test-* - split: validation path: por-ukr/validation-* - config_name: por-zho data_files: - split: test path: por-zho/test-* - config_name: ron-rus data_files: - split: test path: ron-rus/test-* - config_name: ron-spa data_files: - split: test path: ron-spa/test-* - config_name: ron-tur data_files: - split: test path: ron-tur/test-* - split: validation path: ron-tur/validation-* - config_name: run-rus data_files: - split: test path: run-rus/test-* - config_name: run-spa data_files: - split: test path: run-spa/test-* - config_name: rus-cmn_Hans data_files: - split: test path: rus-cmn_Hans/test-* - split: validation path: rus-cmn_Hans/validation-* - config_name: rus-cmn_Hant data_files: - split: test path: rus-cmn_Hant/test-* - split: validation path: rus-cmn_Hant/validation-* - config_name: rus-rus data_files: - split: test path: rus-rus/test-* - split: validation path: rus-rus/validation-* - config_name: rus-sah data_files: - split: test path: rus-sah/test-* - config_name: rus-slv data_files: - split: test path: rus-slv/test-* - split: validation path: rus-slv/validation-* - config_name: rus-spa data_files: - split: test path: rus-spa/test-* - split: validation path: rus-spa/validation-* - config_name: rus-swe data_files: - split: test path: rus-swe/test-* - split: validation path: rus-swe/validation-* - config_name: rus-tat data_files: - split: test path: rus-tat/test-* - split: validation path: rus-tat/validation-* - config_name: rus-tlh data_files: - split: test path: rus-tlh/test-* - config_name: rus-toki data_files: - split: test path: rus-toki/test-* - split: validation path: rus-toki/validation-* - config_name: rus-toki_Latn 
data_files: - split: test path: rus-toki_Latn/test-* - config_name: rus-tur data_files: - split: test path: rus-tur/test-* - split: validation path: rus-tur/validation-* - config_name: rus-uig data_files: - split: test path: rus-uig/test-* - config_name: rus-uig_Arab data_files: - split: test path: rus-uig_Arab/test-* - config_name: rus-ukr data_files: - split: test path: rus-ukr/test-* - split: validation path: rus-ukr/validation-* - config_name: rus-vie data_files: - split: test path: rus-vie/test-* - config_name: rus-xal data_files: - split: test path: rus-xal/test-* - config_name: rus-yue_Hans data_files: - split: test path: rus-yue_Hans/test-* - split: validation path: rus-yue_Hans/validation-* - config_name: rus-zho data_files: - split: test path: rus-zho/test-* - split: validation path: rus-zho/validation-* - config_name: slv-cmn_Hans data_files: - split: test path: slv-cmn_Hans/test-* - config_name: slv-ukr data_files: - split: test path: slv-ukr/test-* - split: validation path: slv-ukr/validation-* - config_name: slv-zho data_files: - split: test path: slv-zho/test-* - config_name: spa-cmn_Hans data_files: - split: test path: spa-cmn_Hans/test-* - split: validation path: spa-cmn_Hans/validation-* - config_name: spa-cmn_Hant data_files: - split: test path: spa-cmn_Hant/test-* - split: validation path: spa-cmn_Hant/validation-* - config_name: spa-spa data_files: - split: test path: spa-spa/test-* - split: validation path: spa-spa/validation-* - config_name: spa-swe data_files: - split: test path: spa-swe/test-* - split: validation path: spa-swe/validation-* - config_name: spa-tat data_files: - split: test path: spa-tat/test-* - config_name: spa-tgl data_files: - split: test path: spa-tgl/test-* - config_name: spa-tlh data_files: - split: test path: spa-tlh/test-* - config_name: spa-toki data_files: - split: test path: spa-toki/test-* - split: validation path: spa-toki/validation-* - config_name: spa-tur data_files: - split: test path: spa-tur/test-* - split: 
validation path: spa-tur/validation-* - config_name: spa-ukr data_files: - split: test path: spa-ukr/test-* - split: validation path: spa-ukr/validation-* - config_name: spa-vie data_files: - split: test path: spa-vie/test-* - config_name: spa-yid data_files: - split: test path: spa-yid/test-* - split: validation path: spa-yid/validation-* - config_name: spa-zho data_files: - split: test path: spa-zho/test-* - split: validation path: spa-zho/validation-* - config_name: srp_Cyrl-rus data_files: - split: test path: srp_Cyrl-rus/test-* - split: validation path: srp_Cyrl-rus/validation-* - config_name: srp_Cyrl-ukr data_files: - split: test path: srp_Cyrl-ukr/test-* - config_name: srp_Latn-ita data_files: - split: test path: srp_Latn-ita/test-* - config_name: srp_Latn-nob data_files: - split: test path: srp_Latn-nob/test-* - split: validation path: srp_Latn-nob/validation-* - config_name: srp_Latn-rus data_files: - split: test path: srp_Latn-rus/test-* - split: validation path: srp_Latn-rus/validation-* - config_name: srp_Latn-ukr data_files: - split: test path: srp_Latn-ukr/test-* - config_name: swe-cmn_Hans data_files: - split: test path: swe-cmn_Hans/test-* - config_name: swe-cmn_Hant data_files: - split: test path: swe-cmn_Hant/test-* - config_name: swe-swe data_files: - split: test path: swe-swe/test-* - config_name: swe-tur data_files: - split: test path: swe-tur/test-* - config_name: swe-zho data_files: - split: test path: swe-zho/test-* - config_name: tat-tur data_files: - split: test path: tat-tur/test-* - config_name: tat-vie data_files: - split: test path: tat-vie/test-* - config_name: tlh-yid data_files: - split: test path: tlh-yid/test-* - split: validation path: tlh-yid/validation-* - config_name: tlh-zho data_files: - split: test path: tlh-zho/test-* - config_name: tlh_Latn-cmn_Hans data_files: - split: test path: tlh_Latn-cmn_Hans/test-* - config_name: tlh_Latn-cmn_Hant data_files: - split: test path: tlh_Latn-cmn_Hant/test-* - config_name: tlh_Latn-yid 
data_files: - split: test path: tlh_Latn-yid/test-* - config_name: tur-cmn_Hans data_files: - split: test path: tur-cmn_Hans/test-* - config_name: tur-cmn_Hant data_files: - split: test path: tur-cmn_Hant/test-* - config_name: tur-tur data_files: - split: test path: tur-tur/test-* - split: validation path: tur-tur/validation-* - config_name: tur-uig data_files: - split: test path: tur-uig/test-* - split: validation path: tur-uig/validation-* - config_name: tur-ukr data_files: - split: test path: tur-ukr/test-* - split: validation path: tur-ukr/validation-* - config_name: tur-uzb data_files: - split: test path: tur-uzb/test-* - config_name: tur-zho data_files: - split: test path: tur-zho/test-* - config_name: uig-zho data_files: - split: test path: uig-zho/test-* - config_name: uig_Arab-cmn_Hans data_files: - split: test path: uig_Arab-cmn_Hans/test-* - config_name: uig_Arab-cmn_Hant data_files: - split: test path: uig_Arab-cmn_Hant/test-* - config_name: ukr-cmn_Hans data_files: - split: test path: ukr-cmn_Hans/test-* - config_name: ukr-cmn_Hant data_files: - split: test path: ukr-cmn_Hant/test-* - config_name: ukr-ukr data_files: - split: test path: ukr-ukr/test-* - config_name: ukr-zho data_files: - split: test path: ukr-zho/test-* - config_name: vie-cmn_Hans data_files: - split: test path: vie-cmn_Hans/test-* - config_name: vie-vie data_files: - split: test path: vie-vie/test-* - config_name: vie-zho data_files: - split: test path: vie-zho/test-* - config_name: wuu-cmn_Hans data_files: - split: test path: wuu-cmn_Hans/test-* - split: validation path: wuu-cmn_Hans/validation-* - config_name: yid-yid data_files: - split: test path: yid-yid/test-* - config_name: zho-zho data_files: - split: test path: zho-zho/test-* - split: validation path: zho-zho/validation-* - config_name: zsm_Latn-ind data_files: - split: test path: zsm_Latn-ind/test-* --- # Dataset Card for DigitalLearningGmbH/tatoeba_mt_parquet This is a mirror of 
[Helsinki-NLP/tatoeba_mt](https://huggingface.co/datasets/Helsinki-NLP/tatoeba_mt), converted to Parquet for compatibility with newer Hugging Face requirements. The original dataset card follows.

## Table of Contents

- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** https://github.com/Helsinki-NLP/Tatoeba-Challenge/
- **Repository:** https://github.com/Helsinki-NLP/Tatoeba-Challenge/
- **Paper:** [The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT](https://aclanthology.org/2020.wmt-1.139/)
- **Leaderboard:**
- **Point of Contact:** [Jörg Tiedemann](mailto:jorg.tiedemann@helsinki.fi)

### Dataset Summary

The Tatoeba Translation Challenge is a multilingual data set of machine translation benchmarks derived from user-contributed translations collected by [Tatoeba.org](https://tatoeba.org/) and provided as a parallel corpus from [OPUS](https://opus.nlpl.eu/).
This dataset includes test and development data sorted by language pair. It includes test sets for hundreds of language pairs and is continuously updated. Please check the version number tag to refer to the release that you are using.

### Supported Tasks and Leaderboards

The translation task is described in detail in the [Tatoeba-Challenge repository](https://github.com/Helsinki-NLP/Tatoeba-Challenge) and covers various sub-tasks with different data coverage and resources. [Training data](https://github.com/Helsinki-NLP/Tatoeba-Challenge/blob/master/data/README.md) is available from the same repository, and [results](https://github.com/Helsinki-NLP/Tatoeba-Challenge/blob/master/results/tatoeba-results-all.md) are collected and published there as well. [Models](https://github.com/Helsinki-NLP/Tatoeba-Challenge/blob/master/results/tatoeba-models-all.md) are released for public use and are partially available from the [Hugging Face model hub](https://huggingface.co/Helsinki-NLP).

### Languages

The data set covers hundreds of languages and language pairs and is organized by ISO-639-3 language codes.
The current release covers the following languages: Afrikaans, Arabic, Azerbaijani, Belarusian, Bulgarian, Bengali, Breton, Bosnian, Catalan, Chamorro, Czech, Chuvash, Welsh, Danish, German, Modern Greek, English, Esperanto, Spanish, Estonian, Basque, Persian, Finnish, Faroese, French, Western Frisian, Irish, Scottish Gaelic, Galician, Guarani, Hebrew, Hindi, Croatian, Hungarian, Armenian, Interlingua, Indonesian, Interlingue, Ido, Icelandic, Italian, Japanese, Javanese, Georgian, Kazakh, Khmer, Korean, Kurdish, Cornish, Latin, Luxembourgish, Lithuanian, Latvian, Maori, Macedonian, Malayalam, Mongolian, Marathi, Malay, Maltese, Burmese, Norwegian Bokmål, Dutch, Norwegian Nynorsk, Norwegian, Occitan, Polish, Portuguese, Quechua, Rundi, Romanian, Russian, Serbo-Croatian, Slovenian, Albanian, Serbian, Swedish, Swahili, Tamil, Telugu, Thai, Turkmen, Tagalog, Turkish, Tatar, Uighur, Ukrainian, Urdu, Uzbek, Vietnamese, Volapük, Yiddish, Chinese

## Dataset Structure

### Data Instances

Data instances are given as translation units in TAB-separated files with four columns: the source and target language ISO-639-3 codes, the source language text and the target language text.

Note that we do not imply a translation direction: the data set is considered symmetric and is meant to be used as a test set in both directions. Language-pair-specific subsets are only provided under the label of one direction, using sorted ISO-639-3 language IDs.

Some subsets contain several sub-languages or language variants. They may refer to macro-languages such as Serbo-Croatian, which is covered by the ISO code `hbs`. Language variants may also include different writing systems; in that case the ISO 15924 script codes are attached to the language codes. Here are a few examples from the English to Serbo-Croatian test set, including examples for Bosnian, Croatian, and Serbian in Cyrillic and in Latin scripts:

```
eng	bos_Latn	Children are the flowers of our lives.	Djeca su cvijeće našeg života.
eng	hrv	A bird was flying high up in the sky.	Ptica je visoko letjela nebom.
eng	srp_Cyrl	A bird in the hand is worth two in the bush.	Боље врабац у руци, него голуб на грани.
eng	srp_Latn	Canada is the motherland of ice hockey.	Kanada je zemlja-majka hokeja na ledu.
```

There are also data sets with sentence pairs in the same language. In most cases, those are variants with minor spelling differences, but they also include rephrased sentences. Here are a few examples from the English test set:

```
eng	eng	All of us got into the car.	We all got in the car.
eng	eng	All of us hope that doesn't happen.	All of us hope that that doesn't happen.
eng	eng	All the seats are booked.	The seats are all sold out.
```

### Data Splits

Test and development data sets are disjoint with respect to sentence pairs but may overlap in individual source or target language sentences. Development data should not be used in training directly. The goal of the data splits is to create test sets of reasonable size with large language coverage. Test sets include at most 10,000 instances. Development data do not exist for all language pairs.

To be comparable with other results, models should use the training data distributed from the [Tatoeba MT Challenge Repository](https://github.com/Helsinki-NLP/Tatoeba-Challenge/), including the monolingual data sets also listed there.

## Dataset Creation

### Curation Rationale

The Tatoeba MT data set will be updated continuously, and the data preparation procedures are public and released on [GitHub](https://github.com/Helsinki-NLP/Tatoeba-Challenge/). High language coverage is the main goal of the project, and data sets are prepared to be consistent and systematic, with standardized language labels and distribution formats.
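
The standardized labels and the four-column layout described above lend themselves to simple parsing. The following is a minimal sketch, not part of the original tooling: the helper names `parse_label`, `parse_unit`, and `reverse_unit` are hypothetical, and the dictionary keys are assumed to match the feature names listed in this mirror's metadata (`sourceLang`, `targetlang`, `sourceString`, `targetString`).

```python
from typing import NamedTuple, Optional


class LangLabel(NamedTuple):
    lang: str               # ISO-639-3 language code, e.g. "srp"
    script: Optional[str]   # ISO 15924 script code, e.g. "Cyrl", or None


def parse_label(label: str) -> LangLabel:
    """Split a label such as 'srp_Cyrl' into language and optional script."""
    lang, _, script = label.partition("_")
    return LangLabel(lang, script or None)


def parse_unit(line: str) -> dict:
    """Parse one TAB-separated translation unit with four columns."""
    src_lang, tgt_lang, src_text, tgt_text = line.rstrip("\n").split("\t")
    # Key names are an assumption taken from the mirror's feature list.
    return {
        "sourceLang": src_lang,
        "targetlang": tgt_lang,
        "sourceString": src_text,
        "targetString": tgt_text,
    }


def reverse_unit(unit: dict) -> dict:
    """Swap source and target, since the test sets are treated as symmetric."""
    return {
        "sourceLang": unit["targetlang"],
        "targetlang": unit["sourceLang"],
        "sourceString": unit["targetString"],
        "targetString": unit["sourceString"],
    }


unit = parse_unit("eng\tsrp_Latn\tCanada is the motherland of ice hockey.\tKanada je zemlja-majka hokeja na ledu.")
target = parse_label(unit["targetlang"])
reversed_unit = reverse_unit(unit)
```

Because each subset is published under only one direction, a helper like `reverse_unit` is one way to evaluate a model in the opposite direction without a second copy of the data.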
### Source Data

#### Initial Data Collection and Normalization

The Tatoeba data sets are collected from user-contributed translations submitted to [Tatoeba.org](https://tatoeba.org/) and compiled into a multi-parallel corpus in [OPUS](https://opus.nlpl.eu/Tatoeba.php). The test and development sets are incrementally updated with new releases of the Tatoeba data collection at OPUS. New releases extend the existing data sets. Test sets should not overlap with any of the released development data sets.

#### Who are the source language producers?

The data sets come from [Tatoeba.org](https://tatoeba.org/), which provides a large database of sentences and their translations into a wide variety of languages. Its content is constantly growing as a result of voluntary contributions by thousands of users. The original project was founded by Trang Ho in 2006 and hosted on Sourceforge under the codename multilangdict.

### Annotations

#### Annotation process

Sentences are translated by volunteers, and the Tatoeba database also provides additional metadata about each record, including user ratings. However, the metadata is currently not used in any way for the compilation of the MT benchmark. Language skills of contributors naturally vary quite a bit, and not all translations are done by native speakers of the target language. More information about the contributions can be found at [Tatoeba.org](https://tatoeba.org/).

#### Who are the annotators?

### Personal and Sensitive Information

For information about the handling of personal and sensitive information, we refer to the [original provider](https://tatoeba.org/) of the data. This data set has not been processed in any way to detect or remove potentially sensitive or personal information.

## Considerations for Using the Data

### Social Impact of Dataset

The language coverage is high, which makes the data set a highly valuable resource for machine translation development, especially for lesser-resourced languages and language pairs.
The constantly growing database also represents a dynamic resource, and its value will grow further.

### Discussion of Biases

The original source depends on its contributors, and their interests and backgrounds will lead to certain subjective and cultural biases. Language coverage and translation quality are also biased by the skills of the contributors.

### Other Known Limitations

The sentences are typically quite short and, therefore, rather easy to translate. For high-resource languages, this leads to results that will be less useful than those from more challenging benchmarks. For lesser-resourced language pairs, the limited complexity of the examples is actually helpful for measuring progress even in very challenging setups.

## Additional Information

### Dataset Curators

The data set is curated by the University of Helsinki and its [language technology research group](https://blogs.helsinki.fi/language-technology/). Data and tools used for creating and using the resource are [open source](https://github.com/Helsinki-NLP/Tatoeba-Challenge/) and will be maintained as part of the [OPUS ecosystem](https://opus.nlpl.eu/) for parallel data and machine translation research.

### Licensing Information

The data sets are distributed under the same licence agreement as the original Tatoeba database, using a [CC-BY 2.0 license](https://creativecommons.org/licenses/by/2.0/fr/). More information about the terms of use of the original data sets is listed [here](https://tatoeba.org/eng/terms_of_use).
### Citation Information

If you use the data sets, please cite the following paper: [The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT](https://aclanthology.org/2020.wmt-1.139/)

```
@inproceedings{tiedemann-2020-tatoeba,
    title = "The Tatoeba Translation Challenge {--} Realistic Data Sets for Low Resource and Multilingual {MT}",
    author = {Tiedemann, J{\"o}rg},
    booktitle = "Proceedings of the Fifth Conference on Machine Translation",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.wmt-1.139",
    pages = "1174--1182",
}
```

### Contributions

Thanks to [@jorgtied](https://github.com/jorgtied) and [@Helsinki-NLP](https://github.com/Helsinki-NLP) for adding this dataset. Thanks also to [CSC Finland](https://www.csc.fi/en/solutions-for-research) for providing computational resources and storage space for the work on OPUS and other MT projects.
Provider: DigitalLearningGmbH