ayush-shunyalabs/translate-low-resource
Source: Hugging Face. Updated 2026-02-27; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/ayush-shunyalabs/translate-low-resource
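The `dataset_info` metadata in this card lists one config per language pair, each with a single `train` split and its size fields. As a rough illustration of how those fields can be read, the sketch below summarizes row counts and the download-to-dataset size ratio per config. The `SAMPLE` text is an abridged copy of two entries from the card (the feature lists are omitted), and the helper name `summarize` is hypothetical, not part of any library.

```python
import re

# Abridged copy of two entries from the card's dataset_info block
# (feature lists omitted; the numbers are taken from the card as-is).
SAMPLE = """\
- config_name: ahr_as
splits:
- name: train
num_bytes: 2427346
num_examples: 3990
download_size: 839125
dataset_size: 2427346
- config_name: ahr_en
splits:
- name: train
num_bytes: 1903760
num_examples: 3979
download_size: 709753
dataset_size: 1903760
"""

CONFIG_RE = re.compile(r"- config_name:\s*(\S+)$")
STAT_RE = re.compile(r"(num_examples|download_size|dataset_size):\s*(\d+)$")


def summarize(text):
    """Collect the numeric size fields for each config (hypothetical helper)."""
    configs = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = CONFIG_RE.match(line)
        if match:
            # A new "- config_name:" line starts the next config entry.
            current = configs.setdefault(match.group(1), {})
            continue
        match = STAT_RE.match(line)
        if match and current is not None:
            current[match.group(1)] = int(match.group(2))
    return configs


summary = summarize(SAMPLE)
for name, stats in summary.items():
    ratio = stats["download_size"] / stats["dataset_size"]
    print(f"{name}: {stats['num_examples']} rows, download/dataset size = {ratio:.2f}")
```

The config names collected this way (for example `ahr_as`) are the values one would normally pass as the second argument to `datasets.load_dataset`, assuming the Hugging Face `datasets` library is installed and the repository is reachable.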
Resource description:
---
dataset_info:
- config_name: ahr_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2427346
num_examples: 3990
download_size: 839125
dataset_size: 2427346
- config_name: ahr_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2430493
num_examples: 3990
download_size: 826508
dataset_size: 2430493
- config_name: ahr_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1903760
num_examples: 3979
download_size: 709753
dataset_size: 1903760
- config_name: ahr_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2449231
num_examples: 3997
download_size: 831011
dataset_size: 2449231
- config_name: ahr_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2392881
num_examples: 3954
download_size: 824039
dataset_size: 2392881
- config_name: ahr_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2820377
num_examples: 3980
download_size: 930907
dataset_size: 2820377
- config_name: as_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1831278
num_examples: 4066
download_size: 915639
dataset_size: 1831278
- config_name: as_en_mono
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2080401
num_examples: 9364
download_size: 486894
dataset_size: 2080401
- config_name: as_gu
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2066330
num_examples: 3997
download_size: 1033165
dataset_size: 2066330
- config_name: as_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2448630
num_examples: 9496
download_size: 559438
dataset_size: 2448630
- config_name: as_mni
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2395330
num_examples: 4000
download_size: 1197665
dataset_size: 2395330
- config_name: as_sa
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2065753
num_examples: 4021
download_size: 1032876
dataset_size: 2065753
- config_name: as_sd
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1899160
num_examples: 4024
download_size: 949580
dataset_size: 1899160
- config_name: as_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2101325
num_examples: 3860
download_size: 1050662
dataset_size: 2101325
- config_name: awa_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2686560
num_examples: 9671
download_size: 742621
dataset_size: 2686560
- config_name: awa_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 1832458
num_examples: 5385
download_size: 490224
dataset_size: 1832458
- config_name: bfy_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2266438
num_examples: 3998
download_size: 780545
dataset_size: 2266438
- config_name: bfy_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2277132
num_examples: 3998
download_size: 773212
dataset_size: 2277132
- config_name: bfy_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1835066
num_examples: 3979
download_size: 682758
dataset_size: 1835066
- config_name: bfy_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2258431
num_examples: 3997
download_size: 766208
dataset_size: 2258431
- config_name: bfy_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2245048
num_examples: 3998
download_size: 768326
dataset_size: 2245048
- config_name: bfy_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2618376
num_examples: 3988
download_size: 864581
dataset_size: 2618376
- config_name: bfz_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2316286
num_examples: 3980
download_size: 805177
dataset_size: 2316286
- config_name: bfz_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2355300
num_examples: 3997
download_size: 803366
dataset_size: 2355300
- config_name: bfz_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1872419
num_examples: 3965
download_size: 701344
dataset_size: 1872419
- config_name: bfz_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2332049
num_examples: 3995
download_size: 791877
dataset_size: 2332049
- config_name: bfz_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2321083
num_examples: 3993
download_size: 798459
dataset_size: 2321083
- config_name: bfz_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2727552
num_examples: 3990
download_size: 905801
dataset_size: 2727552
- config_name: bgc_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2256188
num_examples: 3980
download_size: 790471
dataset_size: 2256188
- config_name: bgc_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2327996
num_examples: 4000
download_size: 792464
dataset_size: 2327996
- config_name: bgc_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1865029
num_examples: 3983
download_size: 699046
dataset_size: 1865029
- config_name: bgc_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2306401
num_examples: 3998
download_size: 794830
dataset_size: 2306401
- config_name: bgc_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2187365
num_examples: 3988
download_size: 769994
dataset_size: 2187365
- config_name: bgc_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2584658
num_examples: 3939
download_size: 880983
dataset_size: 2584658
- config_name: bgq_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2311722
num_examples: 4000
download_size: 806248
dataset_size: 2311722
- config_name: bgq_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2307078
num_examples: 3997
download_size: 794609
dataset_size: 2307078
- config_name: bgq_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1810230
num_examples: 3980
download_size: 684679
dataset_size: 1810230
- config_name: bgq_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2263713
num_examples: 3998
download_size: 776621
dataset_size: 2263713
- config_name: bgq_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2270909
num_examples: 3997
download_size: 784233
dataset_size: 2270909
- config_name: bgq_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2708402
num_examples: 3999
download_size: 904330
dataset_size: 2708402
- config_name: bhb_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2312407
num_examples: 3998
download_size: 807944
dataset_size: 2312407
- config_name: bhb_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2312820
num_examples: 3995
download_size: 793709
dataset_size: 2312820
- config_name: bhb_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1829120
num_examples: 3976
download_size: 694990
dataset_size: 1829120
- config_name: bhb_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2308900
num_examples: 3995
download_size: 791013
dataset_size: 2308900
- config_name: bhb_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2268002
num_examples: 3986
download_size: 784358
dataset_size: 2268002
- config_name: bhb_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2698228
num_examples: 3999
download_size: 902350
dataset_size: 2698228
- config_name: bho_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2420699
num_examples: 10319
download_size: 570808
dataset_size: 2420699
- config_name: bho_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2800566
num_examples: 10292
download_size: 633252
dataset_size: 2800566
- config_name: bn_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 10435414
num_examples: 41325
download_size: 2415916
dataset_size: 10435414
- config_name: bn_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 12086880
num_examples: 41327
download_size: 2674382
dataset_size: 12086880
- config_name: bn_ks_deva
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2692319
num_examples: 3989
download_size: 1346159
dataset_size: 2692319
- config_name: bn_ml
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2222124
num_examples: 4038
download_size: 1111062
dataset_size: 2222124
- config_name: bn_mni
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2377812
num_examples: 3999
download_size: 1188906
dataset_size: 2377812
- config_name: bns_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2242153
num_examples: 3999
download_size: 776365
dataset_size: 2242153
- config_name: bns_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2292463
num_examples: 4000
download_size: 780072
dataset_size: 2292463
- config_name: bns_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1834083
num_examples: 3985
download_size: 680917
dataset_size: 1834083
- config_name: bns_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2244844
num_examples: 3996
download_size: 766080
dataset_size: 2244844
- config_name: bns_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2239771
num_examples: 3995
download_size: 770931
dataset_size: 2239771
- config_name: bns_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2654232
num_examples: 4000
download_size: 883369
dataset_size: 2654232
- config_name: bra_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2247418
num_examples: 3980
download_size: 773883
dataset_size: 2247418
- config_name: bra_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2233334
num_examples: 3998
download_size: 760230
dataset_size: 2233334
- config_name: bra_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1823953
num_examples: 3966
download_size: 678157
dataset_size: 1823953
- config_name: bra_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2198405
num_examples: 3995
download_size: 752224
dataset_size: 2198405
- config_name: bra_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2221653
num_examples: 3998
download_size: 762329
dataset_size: 2221653
- config_name: bra_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2621993
num_examples: 3990
download_size: 867390
dataset_size: 2621993
- config_name: brj_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2330469
num_examples: 3990
download_size: 812425
dataset_size: 2330469
- config_name: brj_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2369330
num_examples: 3996
download_size: 814814
dataset_size: 2369330
- config_name: brj_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1832237
num_examples: 3975
download_size: 692994
dataset_size: 1832237
- config_name: brj_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2252041
num_examples: 3996
download_size: 772826
dataset_size: 2252041
- config_name: brj_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2281832
num_examples: 3996
download_size: 791065
dataset_size: 2281832
- config_name: brj_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2698804
num_examples: 3989
download_size: 901102
dataset_size: 2698804
- config_name: brx_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2353728
num_examples: 3990
download_size: 820806
dataset_size: 2353728
- config_name: brx_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2374848
num_examples: 3999
download_size: 816110
dataset_size: 2374848
- config_name: brx_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1889008
num_examples: 3989
download_size: 710771
dataset_size: 1889008
- config_name: brx_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2358103
num_examples: 4000
download_size: 801010
dataset_size: 2358103
- config_name: brx_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2389305
num_examples: 3986
download_size: 835064
dataset_size: 2389305
- config_name: brx_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2640937
num_examples: 3980
download_size: 883127
dataset_size: 2640937
- config_name: dcc_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2556355
num_examples: 3988
download_size: 948450
dataset_size: 2556355
- config_name: dcc_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2553941
num_examples: 3996
download_size: 935137
dataset_size: 2553941
- config_name: dcc_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1839422
num_examples: 3977
download_size: 749237
dataset_size: 1839422
- config_name: dcc_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2347100
num_examples: 3998
download_size: 842519
dataset_size: 2347100
- config_name: dcc_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2384238
num_examples: 3927
download_size: 873503
dataset_size: 2384238
- config_name: dcc_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2769687
num_examples: 3998
download_size: 961925
dataset_size: 2769687
- config_name: doi_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2291267
num_examples: 3980
download_size: 793111
dataset_size: 2291267
- config_name: doi_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2368216
num_examples: 3997
download_size: 814116
dataset_size: 2368216
- config_name: doi_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1911452
num_examples: 3983
download_size: 725468
dataset_size: 1911452
- config_name: doi_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2378363
num_examples: 3996
download_size: 820833
dataset_size: 2378363
- config_name: doi_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2261085
num_examples: 3977
download_size: 776140
dataset_size: 2261085
- config_name: doi_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2699364
num_examples: 4000
download_size: 903815
dataset_size: 2699364
- config_name: en_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1979613
num_examples: 4071
download_size: 989806
dataset_size: 1979613
- config_name: gbm_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2309115
num_examples: 3980
download_size: 808166
dataset_size: 2309115
- config_name: gbm_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2367088
num_examples: 3998
download_size: 811013
dataset_size: 2367088
- config_name: gbm_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1860395
num_examples: 3973
download_size: 702858
dataset_size: 1860395
- config_name: gbm_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2338736
num_examples: 3998
download_size: 796695
dataset_size: 2338736
- config_name: gbm_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2257726
num_examples: 3984
download_size: 783924
dataset_size: 2257726
- config_name: gbm_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2729560
num_examples: 4000
download_size: 908345
dataset_size: 2729560
- config_name: gon_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2244942
num_examples: 3989
download_size: 784689
dataset_size: 2244942
- config_name: gon_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2234942
num_examples: 3995
download_size: 766690
dataset_size: 2234942
- config_name: gon_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1773143
num_examples: 3985
download_size: 671486
dataset_size: 1773143
- config_name: gon_en_mono
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2533264
num_examples: 10327
download_size: 649793
dataset_size: 2533264
- config_name: gon_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2225237
num_examples: 4000
download_size: 760136
dataset_size: 2225237
- config_name: gon_hi_mono
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 1723319
num_examples: 5878
download_size: 431629
dataset_size: 1723319
- config_name: gon_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2211085
num_examples: 3983
download_size: 769615
dataset_size: 2211085
- config_name: gon_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2655763
num_examples: 3989
download_size: 891299
dataset_size: 2655763
- config_name: grt_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2391924
num_examples: 3970
download_size: 1195962
dataset_size: 2391924
- config_name: grt_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2395769
num_examples: 3999
download_size: 1197884
dataset_size: 2395769
- config_name: grt_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1785666
num_examples: 3980
download_size: 892833
dataset_size: 1785666
- config_name: grt_en_mono
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 1479454
num_examples: 7625
download_size: 258637
dataset_size: 1479454
- config_name: grt_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2345635
num_examples: 4000
download_size: 1172817
dataset_size: 2345635
- config_name: grt_hi_mono
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 1027102
num_examples: 4446
download_size: 186294
dataset_size: 1027102
- config_name: grt_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2386544
num_examples: 3978
download_size: 1193272
dataset_size: 2386544
- config_name: grt_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2676317
num_examples: 4000
download_size: 1338158
dataset_size: 2676317
- config_name: gu_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2911804
num_examples: 11225
download_size: 700986
dataset_size: 2911804
- config_name: gu_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 3391427
num_examples: 11221
download_size: 785091
dataset_size: 3391427
- config_name: gu_ks_arab
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2548484
num_examples: 3979
download_size: 1274242
dataset_size: 2548484
- config_name: gu_sa
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2109390
num_examples: 4114
download_size: 1054695
dataset_size: 2109390
- config_name: hi_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 1562921
num_examples: 5259
download_size: 369698
dataset_size: 1562921
- config_name: hi_mni
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2368266
num_examples: 3987
download_size: 1184133
dataset_size: 2368266
- config_name: ho_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2432602
num_examples: 10649
download_size: 593590
dataset_size: 2432602
- config_name: ho_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 1567518
num_examples: 5810
download_size: 368657
dataset_size: 1567518
- config_name: hoj_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2263451
num_examples: 3989
download_size: 786403
dataset_size: 2263451
- config_name: hoj_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2295609
num_examples: 3999
download_size: 782224
dataset_size: 2295609
- config_name: hoj_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1798869
num_examples: 3969
download_size: 675063
dataset_size: 1798869
- config_name: hoj_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2242603
num_examples: 3994
download_size: 767709
dataset_size: 2242603
- config_name: hoj_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2250021
num_examples: 3997
download_size: 777729
dataset_size: 2250021
- config_name: hoj_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2710733
num_examples: 3999
download_size: 903893
dataset_size: 2710733
- config_name: kas_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2075836
num_examples: 8804
download_size: 558572
dataset_size: 2075836
- config_name: kas_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2547301
num_examples: 8729
download_size: 654258
dataset_size: 2547301
- config_name: kfa_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2521463
num_examples: 3990
download_size: 871275
dataset_size: 2521463
- config_name: kfa_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2544842
num_examples: 3990
download_size: 863483
dataset_size: 2544842
- config_name: kfa_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2036761
num_examples: 3987
download_size: 754397
dataset_size: 2036761
- config_name: kfa_en_mono
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 3229145
num_examples: 12807
download_size: 680642
dataset_size: 3229145
- config_name: kfa_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2561647
num_examples: 3990
download_size: 863531
dataset_size: 2561647
- config_name: kfa_hi_mono
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 3862349
num_examples: 13170
download_size: 784443
dataset_size: 3862349
- config_name: kfa_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2481031
num_examples: 3969
download_size: 849671
dataset_size: 2481031
- config_name: kfa_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2896347
num_examples: 3990
download_size: 962210
dataset_size: 2896347
- config_name: kfr_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2419341
num_examples: 3989
download_size: 842491
dataset_size: 2419341
- config_name: kfr_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2395010
num_examples: 3996
download_size: 820449
dataset_size: 2395010
- config_name: kfr_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1898930
num_examples: 3975
download_size: 717624
dataset_size: 1898930
- config_name: kfr_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2396417
num_examples: 3998
download_size: 812628
dataset_size: 2396417
- config_name: kfr_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2387696
num_examples: 3977
download_size: 831426
dataset_size: 2387696
- config_name: kfr_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2770802
num_examples: 3990
download_size: 925456
dataset_size: 2770802
- config_name: kfy_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2270956
num_examples: 3999
download_size: 798586
dataset_size: 2270956
- config_name: kfy_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2301773
num_examples: 3998
download_size: 795252
dataset_size: 2301773
- config_name: kfy_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1855089
num_examples: 3985
download_size: 705120
dataset_size: 1855089
- config_name: kfy_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2304251
num_examples: 3996
download_size: 792364
dataset_size: 2304251
- config_name: kfy_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2234963
num_examples: 3964
download_size: 782452
dataset_size: 2234963
- config_name: kfy_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2676326
num_examples: 3989
download_size: 900308
dataset_size: 2676326
- config_name: kha_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 773050
num_examples: 4150
download_size: 137838
dataset_size: 773050
- config_name: kha_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 393554
num_examples: 1792
download_size: 73895
dataset_size: 393554
- config_name: kho_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2917474
num_examples: 10025
download_size: 781359
dataset_size: 2917474
- config_name: kho_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2815905
num_examples: 7747
download_size: 719155
dataset_size: 2815905
- config_name: kht_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2307739
num_examples: 3989
download_size: 793868
dataset_size: 2307739
- config_name: kht_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2336183
num_examples: 3996
download_size: 789441
dataset_size: 2336183
- config_name: kht_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1826440
num_examples: 3976
download_size: 676110
dataset_size: 1826440
- config_name: kht_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2308644
num_examples: 3997
download_size: 773948
dataset_size: 2308644
- config_name: kht_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2286281
num_examples: 3996
download_size: 782481
dataset_size: 2286281
- config_name: kht_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2715710
num_examples: 3990
download_size: 893300
dataset_size: 2715710
- config_name: kn_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2757515
num_examples: 10451
download_size: 710637
dataset_size: 2757515
- config_name: kn_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 3323179
num_examples: 10615
download_size: 829675
dataset_size: 3323179
- config_name: kok_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2487009
num_examples: 3989
download_size: 866029
dataset_size: 2487009
- config_name: kok_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2527279
num_examples: 4000
download_size: 859267
dataset_size: 2527279
- config_name: kok_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1984683
num_examples: 3982
download_size: 749205
dataset_size: 1984683
- config_name: kok_en_mono
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2118249
num_examples: 10036
download_size: 474865
dataset_size: 2118249
- config_name: kok_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2516103
num_examples: 3997
download_size: 858948
dataset_size: 2516103
- config_name: kok_hi_mono
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2408048
num_examples: 10028
download_size: 526172
dataset_size: 2408048
- config_name: kok_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2389435
num_examples: 3857
download_size: 830090
dataset_size: 2389435
- config_name: kok_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2866469
num_examples: 3988
download_size: 954998
dataset_size: 2866469
- config_name: kru_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2308504
num_examples: 4000
download_size: 809269
dataset_size: 2308504
- config_name: kru_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2303180
num_examples: 3997
download_size: 796326
dataset_size: 2303180
- config_name: kru_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1801008
num_examples: 3986
download_size: 675410
dataset_size: 1801008
- config_name: kru_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2272474
num_examples: 3998
download_size: 774180
dataset_size: 2272474
- config_name: kru_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2241491
num_examples: 3992
download_size: 780541
dataset_size: 2241491
- config_name: kru_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2693763
num_examples: 3989
download_size: 900756
dataset_size: 2693763
- config_name: ks_arab_mni
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2134916
num_examples: 4000
download_size: 1067458
dataset_size: 2134916
- config_name: ks_deva_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2793871
num_examples: 3970
download_size: 1396935
dataset_size: 2793871
- config_name: ks_deva_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2760728
num_examples: 3969
download_size: 1380364
dataset_size: 2760728
- config_name: ks_deva_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2297599
num_examples: 3989
download_size: 1148799
dataset_size: 2297599
- config_name: ks_deva_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2721734
num_examples: 3999
download_size: 1360867
dataset_size: 2721734
- config_name: ks_deva_ks_arab
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2441002
num_examples: 4000
download_size: 1220501
dataset_size: 2441002
- config_name: ks_deva_mai
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2613850
num_examples: 3940
download_size: 1306925
dataset_size: 2613850
- config_name: ks_deva_ml
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 3093639
num_examples: 3980
download_size: 1546819
dataset_size: 3093639
- config_name: ks_deva_mni
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2362445
num_examples: 4000
download_size: 1181222
dataset_size: 2362445
- config_name: ks_deva_ne
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2787124
num_examples: 4000
download_size: 1393562
dataset_size: 2787124
- config_name: ks_deva_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2771152
num_examples: 3996
download_size: 1385576
dataset_size: 2771152
- config_name: ks_deva_pa
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2732716
num_examples: 3979
download_size: 1366358
dataset_size: 2732716
- config_name: ks_deva_sa
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2651730
num_examples: 3970
download_size: 1325865
dataset_size: 2651730
- config_name: ks_deva_sd
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2373171
num_examples: 3999
download_size: 1186585
dataset_size: 2373171
- config_name: ks_deva_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 3102290
num_examples: 4000
download_size: 1551145
dataset_size: 3102290
- config_name: lmn_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2251371
num_examples: 3999
download_size: 786041
dataset_size: 2251371
- config_name: lmn_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2272048
num_examples: 3994
download_size: 782366
dataset_size: 2272048
- config_name: lmn_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1775784
num_examples: 3984
download_size: 675950
dataset_size: 1775784
- config_name: lmn_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2229017
num_examples: 3995
download_size: 765063
dataset_size: 2229017
- config_name: lmn_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2218507
num_examples: 3994
download_size: 770199
dataset_size: 2218507
- config_name: lmn_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2637588
num_examples: 3990
download_size: 883457
dataset_size: 2637588
- config_name: mag_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 1830492
num_examples: 7302
download_size: 456935
dataset_size: 1830492
- config_name: mag_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2131921
num_examples: 7301
download_size: 506944
dataset_size: 2131921
- config_name: mai_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2626254
num_examples: 10926
download_size: 608500
dataset_size: 2626254
- config_name: mai_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 3077557
num_examples: 10924
download_size: 674233
dataset_size: 3077557
- config_name: mai_ks_arab
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2443703
num_examples: 4000
download_size: 1221851
dataset_size: 2443703
- config_name: miz_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 924567
num_examples: 3965
download_size: 205377
dataset_size: 924567
- config_name: miz_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 495027
num_examples: 1276
download_size: 121736
dataset_size: 495027
- config_name: ml_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 3263498
num_examples: 11745
download_size: 787083
dataset_size: 3263498
- config_name: ml_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 4067204
num_examples: 12513
download_size: 941824
dataset_size: 4067204
- config_name: mni_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2539966
num_examples: 3980
download_size: 1269983
dataset_size: 2539966
- config_name: mni_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2526335
num_examples: 4000
download_size: 1263167
dataset_size: 2526335
- config_name: mni_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1991461
num_examples: 3990
download_size: 995730
dataset_size: 1991461
- config_name: mni_en_mono
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2236558
num_examples: 10799
download_size: 469191
dataset_size: 2236558
- config_name: mni_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2533994
num_examples: 3989
download_size: 1266997
dataset_size: 2533994
- config_name: mni_hi_mono
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2555141
num_examples: 10819
download_size: 532066
dataset_size: 2555141
- config_name: mni_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2464535
num_examples: 3984
download_size: 1232267
dataset_size: 2464535
- config_name: mni_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2836212
num_examples: 3960
download_size: 1418106
dataset_size: 2836212
- config_name: mr_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 3097986
num_examples: 12967
download_size: 725408
dataset_size: 3097986
- config_name: mr_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 3640685
num_examples: 12966
download_size: 827778
dataset_size: 3640685
- config_name: mtr_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2232551
num_examples: 3999
download_size: 779843
dataset_size: 2232551
- config_name: mtr_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2256792
num_examples: 3998
download_size: 773735
dataset_size: 2256792
- config_name: mtr_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1809006
num_examples: 3982
download_size: 683181
dataset_size: 1809006
- config_name: mtr_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2263065
num_examples: 3998
download_size: 774297
dataset_size: 2263065
- config_name: mtr_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2212740
num_examples: 3998
download_size: 770090
dataset_size: 2212740
- config_name: mtr_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2638318
num_examples: 3990
download_size: 878832
dataset_size: 2638318
- config_name: mun_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2594740
num_examples: 10864
download_size: 640212
dataset_size: 2594740
- config_name: mun_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2243787
num_examples: 7869
download_size: 538392
dataset_size: 2243787
- config_name: mwr_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2281300
num_examples: 3990
download_size: 790697
dataset_size: 2281300
- config_name: mwr_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2331445
num_examples: 3998
download_size: 797471
dataset_size: 2331445
- config_name: mwr_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1859964
num_examples: 3982
download_size: 702909
dataset_size: 1859964
- config_name: mwr_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2309900
num_examples: 3999
download_size: 791694
dataset_size: 2309900
- config_name: mwr_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2281862
num_examples: 3994
download_size: 786466
dataset_size: 2281862
- config_name: mwr_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2713178
num_examples: 4000
download_size: 902686
dataset_size: 2713178
- config_name: ne_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2625401
num_examples: 11631
download_size: 576023
dataset_size: 2625401
- config_name: ne_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 3072980
num_examples: 11627
download_size: 645696
dataset_size: 3072980
- config_name: noe_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2298058
num_examples: 3989
download_size: 800936
dataset_size: 2298058
- config_name: noe_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2358521
num_examples: 3999
download_size: 805588
dataset_size: 2358521
- config_name: noe_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1839757
num_examples: 3968
download_size: 688914
dataset_size: 1839757
- config_name: noe_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2293837
num_examples: 3993
download_size: 784002
dataset_size: 2293837
- config_name: noe_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2297432
num_examples: 3997
download_size: 794126
dataset_size: 2297432
- config_name: noe_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2713879
num_examples: 3970
download_size: 904553
dataset_size: 2713879
- config_name: or_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2126356
num_examples: 6659
download_size: 613592
dataset_size: 2126356
- config_name: or_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2611428
num_examples: 6649
download_size: 712049
dataset_size: 2611428
- config_name: pa_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 4063162
num_examples: 15887
download_size: 934282
dataset_size: 4063162
- config_name: pa_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 4761433
num_examples: 15868
download_size: 1046907
dataset_size: 4761433
- config_name: phr_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2410760
num_examples: 4000
download_size: 856749
dataset_size: 2410760
- config_name: phr_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2436355
num_examples: 3999
download_size: 852128
dataset_size: 2436355
- config_name: phr_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1906909
num_examples: 3979
download_size: 730968
dataset_size: 1906909
- config_name: phr_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2378425
num_examples: 3995
download_size: 830122
dataset_size: 2378425
- config_name: phr_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2387245
num_examples: 3996
download_size: 841262
dataset_size: 2387245
- config_name: phr_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2805793
num_examples: 3999
download_size: 950919
dataset_size: 2805793
- config_name: raj_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2231781
num_examples: 3970
download_size: 771655
dataset_size: 2231781
- config_name: raj_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2290128
num_examples: 3998
download_size: 775946
dataset_size: 2290128
- config_name: raj_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1854639
num_examples: 3981
download_size: 695853
dataset_size: 1854639
- config_name: raj_en_mono
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2496495
num_examples: 10663
download_size: 568018
dataset_size: 2496495
- config_name: raj_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2313394
num_examples: 3998
download_size: 785254
dataset_size: 2313394
- config_name: raj_hi_mono
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2650864
num_examples: 9893
download_size: 574390
dataset_size: 2650864
- config_name: raj_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2211588
num_examples: 3997
download_size: 769390
dataset_size: 2211588
- config_name: raj_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2659606
num_examples: 3969
download_size: 880891
dataset_size: 2659606
- config_name: sa_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2806261
num_examples: 11677
download_size: 683901
dataset_size: 2806261
- config_name: sa_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 3204010
num_examples: 11669
download_size: 741909
dataset_size: 3204010
- config_name: sat_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2438824
num_examples: 3990
download_size: 831776
dataset_size: 2438824
- config_name: sat_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2416845
num_examples: 3989
download_size: 812669
dataset_size: 2416845
- config_name: sat_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1876397
num_examples: 3987
download_size: 684497
dataset_size: 1876397
- config_name: sat_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2413381
num_examples: 3999
download_size: 801956
dataset_size: 2413381
- config_name: sat_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2407540
num_examples: 3998
download_size: 815012
dataset_size: 2407540
- config_name: sat_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2751018
num_examples: 3990
download_size: 901912
dataset_size: 2751018
- config_name: sd_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 1377222
num_examples: 5591
download_size: 392479
dataset_size: 1377222
- config_name: sd_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 1706710
num_examples: 5588
download_size: 460329
dataset_size: 1706710
- config_name: sgj_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2292519
num_examples: 3999
download_size: 791933
dataset_size: 2292519
- config_name: sgj_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2310053
num_examples: 3998
download_size: 786500
dataset_size: 2310053
- config_name: sgj_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1831491
num_examples: 3972
download_size: 683180
dataset_size: 1831491
- config_name: sgj_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2272570
num_examples: 3998
download_size: 769141
dataset_size: 2272570
- config_name: sgj_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2271291
num_examples: 3999
download_size: 782447
dataset_size: 2271291
- config_name: sgj_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2685583
num_examples: 3999
download_size: 886108
dataset_size: 2685583
- config_name: sor_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2932800
num_examples: 11698
download_size: 732407
dataset_size: 2932800
- config_name: sor_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 1794448
num_examples: 5967
download_size: 428191
dataset_size: 1794448
- config_name: spv_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2292689
num_examples: 3999
download_size: 826936
dataset_size: 2292689
- config_name: spv_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2380851
num_examples: 3999
download_size: 811288
dataset_size: 2380851
- config_name: spv_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1888439
num_examples: 3982
download_size: 711314
dataset_size: 1888439
- config_name: spv_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2417526
num_examples: 3987
download_size: 820326
dataset_size: 2417526
- config_name: spv_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2331084
num_examples: 3955
download_size: 804933
dataset_size: 2331084
- config_name: spv_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2763569
num_examples: 3998
download_size: 922312
dataset_size: 2763569
- config_name: ta_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 8508825
num_examples: 34547
download_size: 1736478
dataset_size: 8508825
- config_name: ta_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 13255911
num_examples: 44355
download_size: 2438327
dataset_size: 13255911
- config_name: tcy_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2528820
num_examples: 3970
download_size: 876191
dataset_size: 2528820
- config_name: tcy_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2556536
num_examples: 4000
download_size: 866348
dataset_size: 2556536
- config_name: tcy_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2058955
num_examples: 3970
download_size: 769283
dataset_size: 2058955
- config_name: tcy_en_mono
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2301260
num_examples: 9384
download_size: 579582
dataset_size: 2301260
- config_name: tcy_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2572357
num_examples: 4000
download_size: 869647
dataset_size: 2572357
- config_name: tcy_hi_mono
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 2812164
num_examples: 9625
download_size: 682871
dataset_size: 2812164
- config_name: tcy_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2458865
num_examples: 3955
download_size: 852444
dataset_size: 2458865
- config_name: tcy_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2864562
num_examples: 3990
download_size: 960551
dataset_size: 2864562
- config_name: te_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 3155576
num_examples: 12272
download_size: 761930
dataset_size: 3155576
- config_name: te_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 3765354
num_examples: 12270
download_size: 883546
dataset_size: 3765354
- config_name: ur_en
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 1230932
num_examples: 4492
download_size: 376179
dataset_size: 1230932
- config_name: ur_hi
features:
- name: source_language
dtype: large_string
- name: target_language
dtype: large_string
- name: source_text
dtype: large_string
- name: target_text
dtype: large_string
- name: domain
dtype: large_string
- name: subtopic
dtype: large_string
- name: generated_at
dtype: large_string
splits:
- name: train
num_bytes: 1573681
num_examples: 4472
download_size: 433003
dataset_size: 1573681
- config_name: wbr_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2335374
num_examples: 4000
download_size: 815163
dataset_size: 2335374
- config_name: wbr_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2340959
num_examples: 3998
download_size: 804243
dataset_size: 2340959
- config_name: wbr_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1815957
num_examples: 3972
download_size: 689460
dataset_size: 1815957
- config_name: wbr_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2327284
num_examples: 3993
download_size: 798868
dataset_size: 2327284
- config_name: wbr_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2284719
num_examples: 3996
download_size: 792008
dataset_size: 2284719
- config_name: wbr_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2719413
num_examples: 3998
download_size: 911295
dataset_size: 2719413
- config_name: xnr_as
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2425918
num_examples: 3999
download_size: 857496
dataset_size: 2425918
- config_name: xnr_bn
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2392169
num_examples: 3995
download_size: 832226
dataset_size: 2392169
- config_name: xnr_en
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 1862620
num_examples: 3976
download_size: 715549
dataset_size: 1862620
- config_name: xnr_hi
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2358470
num_examples: 3997
download_size: 816804
dataset_size: 2358470
- config_name: xnr_or
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2344044
num_examples: 3995
download_size: 822545
dataset_size: 2344044
- config_name: xnr_ta
features:
- name: source_language
dtype: string
- name: target_language
dtype: string
- name: source_text
dtype: string
- name: target_text
dtype: string
- name: domain
dtype: string
- name: subtopic
dtype: string
- name: generated_at
dtype: string
splits:
- name: train
num_bytes: 2725408
num_examples: 3990
download_size: 919099
dataset_size: 2725408
configs:
- config_name: ahr_as
data_files:
- split: train
path: ahr_as/train-*
- config_name: ahr_bn
data_files:
- split: train
path: ahr_bn/train-*
- config_name: ahr_en
data_files:
- split: train
path: ahr_en/train-*
- config_name: ahr_hi
data_files:
- split: train
path: ahr_hi/train-*
- config_name: ahr_or
data_files:
- split: train
path: ahr_or/train-*
- config_name: ahr_ta
data_files:
- split: train
path: ahr_ta/train-*
- config_name: as_en
data_files:
- split: train
path: as_en/train-*.parquet
- config_name: as_en_mono
data_files:
- split: train
path: as_en_mono/train-*
- config_name: as_gu
data_files:
- split: train
path: as_gu/train-*.parquet
- config_name: as_hi
data_files:
- split: train
path: as_hi/train-*
- config_name: as_mni
data_files:
- split: train
path: as_mni/train-*.parquet
- config_name: as_sa
data_files:
- split: train
path: as_sa/train-*.parquet
- config_name: as_sd
data_files:
- split: train
path: as_sd/train-*.parquet
- config_name: as_ta
data_files:
- split: train
path: as_ta/train-*.parquet
- config_name: awa_en
data_files:
- split: train
path: awa_en/train-*
- config_name: awa_hi
data_files:
- split: train
path: awa_hi/train-*
- config_name: bfy_as
data_files:
- split: train
path: bfy_as/train-*
- config_name: bfy_bn
data_files:
- split: train
path: bfy_bn/train-*
- config_name: bfy_en
data_files:
- split: train
path: bfy_en/train-*
- config_name: bfy_hi
data_files:
- split: train
path: bfy_hi/train-*
- config_name: bfy_or
data_files:
- split: train
path: bfy_or/train-*
- config_name: bfy_ta
data_files:
- split: train
path: bfy_ta/train-*
- config_name: bfz_as
data_files:
- split: train
path: bfz_as/train-*
- config_name: bfz_bn
data_files:
- split: train
path: bfz_bn/train-*
- config_name: bfz_en
data_files:
- split: train
path: bfz_en/train-*
- config_name: bfz_hi
data_files:
- split: train
path: bfz_hi/train-*
- config_name: bfz_or
data_files:
- split: train
path: bfz_or/train-*
- config_name: bfz_ta
data_files:
- split: train
path: bfz_ta/train-*
- config_name: bgc_as
data_files:
- split: train
path: bgc_as/train-*
- config_name: bgc_bn
data_files:
- split: train
path: bgc_bn/train-*
- config_name: bgc_en
data_files:
- split: train
path: bgc_en/train-*
- config_name: bgc_hi
data_files:
- split: train
path: bgc_hi/train-*
- config_name: bgc_or
data_files:
- split: train
path: bgc_or/train-*
- config_name: bgc_ta
data_files:
- split: train
path: bgc_ta/train-*
- config_name: bgq_as
data_files:
- split: train
path: bgq_as/train-*
- config_name: bgq_bn
data_files:
- split: train
path: bgq_bn/train-*
- config_name: bgq_en
data_files:
- split: train
path: bgq_en/train-*
- config_name: bgq_hi
data_files:
- split: train
path: bgq_hi/train-*
- config_name: bgq_or
data_files:
- split: train
path: bgq_or/train-*
- config_name: bgq_ta
data_files:
- split: train
path: bgq_ta/train-*
- config_name: bhb_as
data_files:
- split: train
path: bhb_as/train-*
- config_name: bhb_bn
data_files:
- split: train
path: bhb_bn/train-*
- config_name: bhb_en
data_files:
- split: train
path: bhb_en/train-*
- config_name: bhb_hi
data_files:
- split: train
path: bhb_hi/train-*
- config_name: bhb_or
data_files:
- split: train
path: bhb_or/train-*
- config_name: bhb_ta
data_files:
- split: train
path: bhb_ta/train-*
- config_name: bho_en
data_files:
- split: train
path: bho_en/train-*
- config_name: bho_hi
data_files:
- split: train
path: bho_hi/train-*
- config_name: bn_en
data_files:
- split: train
path: bn_en/train-*
- config_name: bn_hi
data_files:
- split: train
path: bn_hi/train-*
- config_name: bn_ks_deva
data_files:
- split: train
path: bn_ks_deva/train-*.parquet
- config_name: bn_ml
data_files:
- split: train
path: bn_ml/train-*.parquet
- config_name: bn_mni
data_files:
- split: train
path: bn_mni/train-*.parquet
- config_name: bns_as
data_files:
- split: train
path: bns_as/train-*
- config_name: bns_bn
data_files:
- split: train
path: bns_bn/train-*
- config_name: bns_en
data_files:
- split: train
path: bns_en/train-*
- config_name: bns_hi
data_files:
- split: train
path: bns_hi/train-*
- config_name: bns_or
data_files:
- split: train
path: bns_or/train-*
- config_name: bns_ta
data_files:
- split: train
path: bns_ta/train-*
- config_name: bra_as
data_files:
- split: train
path: bra_as/train-*
- config_name: bra_bn
data_files:
- split: train
path: bra_bn/train-*
- config_name: bra_en
data_files:
- split: train
path: bra_en/train-*
- config_name: bra_hi
data_files:
- split: train
path: bra_hi/train-*
- config_name: bra_or
data_files:
- split: train
path: bra_or/train-*
- config_name: bra_ta
data_files:
- split: train
path: bra_ta/train-*
- config_name: brj_as
data_files:
- split: train
path: brj_as/train-*
- config_name: brj_bn
data_files:
- split: train
path: brj_bn/train-*
- config_name: brj_en
data_files:
- split: train
path: brj_en/train-*
- config_name: brj_hi
data_files:
- split: train
path: brj_hi/train-*
- config_name: brj_or
data_files:
- split: train
path: brj_or/train-*
- config_name: brj_ta
data_files:
- split: train
path: brj_ta/train-*
- config_name: brx_as
data_files:
- split: train
path: brx_as/train-*
- config_name: brx_bn
data_files:
- split: train
path: brx_bn/train-*
- config_name: brx_en
data_files:
- split: train
path: brx_en/train-*
- config_name: brx_hi
data_files:
- split: train
path: brx_hi/train-*
- config_name: brx_or
data_files:
- split: train
path: brx_or/train-*
- config_name: brx_ta
data_files:
- split: train
path: brx_ta/train-*
- config_name: dcc_as
data_files:
- split: train
path: dcc_as/train-*
- config_name: dcc_bn
data_files:
- split: train
path: dcc_bn/train-*
- config_name: dcc_en
data_files:
- split: train
path: dcc_en/train-*
- config_name: dcc_hi
data_files:
- split: train
path: dcc_hi/train-*
- config_name: dcc_or
data_files:
- split: train
path: dcc_or/train-*
- config_name: dcc_ta
data_files:
- split: train
path: dcc_ta/train-*
- config_name: doi_as
data_files:
- split: train
path: doi_as/train-*
- config_name: doi_bn
data_files:
- split: train
path: doi_bn/train-*
- config_name: doi_en
data_files:
- split: train
path: doi_en/train-*
- config_name: doi_hi
data_files:
- split: train
path: doi_hi/train-*
- config_name: doi_or
data_files:
- split: train
path: doi_or/train-*
- config_name: doi_ta
data_files:
- split: train
path: doi_ta/train-*
- config_name: en_as
data_files:
- split: train
path: en_as/train-*.parquet
- config_name: gbm_as
data_files:
- split: train
path: gbm_as/train-*
- config_name: gbm_bn
data_files:
- split: train
path: gbm_bn/train-*
- config_name: gbm_en
data_files:
- split: train
path: gbm_en/train-*
- config_name: gbm_hi
data_files:
- split: train
path: gbm_hi/train-*
- config_name: gbm_or
data_files:
- split: train
path: gbm_or/train-*
- config_name: gbm_ta
data_files:
- split: train
path: gbm_ta/train-*
- config_name: gon_as
data_files:
- split: train
path: gon_as/train-*
- config_name: gon_bn
data_files:
- split: train
path: gon_bn/train-*
- config_name: gon_en
data_files:
- split: train
path: gon_en/train-*
- config_name: gon_en_mono
data_files:
- split: train
path: gon_en_mono/train-*
- config_name: gon_hi
data_files:
- split: train
path: gon_hi/train-*
- config_name: gon_hi_mono
data_files:
- split: train
path: gon_hi_mono/train-*
- config_name: gon_or
data_files:
- split: train
path: gon_or/train-*
- config_name: gon_ta
data_files:
- split: train
path: gon_ta/train-*
- config_name: grt_as
data_files:
- split: train
path: grt_as/train-*.parquet
- config_name: grt_bn
data_files:
- split: train
path: grt_bn/train-*.parquet
- config_name: grt_en
data_files:
- split: train
path: grt_en/train-*.parquet
- config_name: grt_en_mono
data_files:
- split: train
path: grt_en_mono/train-*
- config_name: grt_hi
data_files:
- split: train
path: grt_hi/train-*.parquet
- config_name: grt_hi_mono
data_files:
- split: train
path: grt_hi_mono/train-*
- config_name: grt_or
data_files:
- split: train
path: grt_or/train-*.parquet
- config_name: grt_ta
data_files:
- split: train
path: grt_ta/train-*.parquet
- config_name: gu_en
data_files:
- split: train
path: gu_en/train-*
- config_name: gu_hi
data_files:
- split: train
path: gu_hi/train-*
- config_name: gu_ks_arab
data_files:
- split: train
path: gu_ks_arab/train-*.parquet
- config_name: gu_sa
data_files:
- split: train
path: gu_sa/train-*.parquet
- config_name: hi_en
data_files:
- split: train
path: hi_en/train-*
- config_name: hi_mni
data_files:
- split: train
path: hi_mni/train-*.parquet
- config_name: ho_en
data_files:
- split: train
path: ho_en/train-*
- config_name: ho_hi
data_files:
- split: train
path: ho_hi/train-*
- config_name: hoj_as
data_files:
- split: train
path: hoj_as/train-*
- config_name: hoj_bn
data_files:
- split: train
path: hoj_bn/train-*
- config_name: hoj_en
data_files:
- split: train
path: hoj_en/train-*
- config_name: hoj_hi
data_files:
- split: train
path: hoj_hi/train-*
- config_name: hoj_or
data_files:
- split: train
path: hoj_or/train-*
- config_name: hoj_ta
data_files:
- split: train
path: hoj_ta/train-*
- config_name: kas_en
data_files:
- split: train
path: kas_en/train-*
- config_name: kas_hi
data_files:
- split: train
path: kas_hi/train-*
- config_name: kfa_as
data_files:
- split: train
path: kfa_as/train-*
- config_name: kfa_bn
data_files:
- split: train
path: kfa_bn/train-*
- config_name: kfa_en
data_files:
- split: train
path: kfa_en/train-*
- config_name: kfa_en_mono
data_files:
- split: train
path: kfa_en_mono/train-*
- config_name: kfa_hi
data_files:
- split: train
path: kfa_hi/train-*
- config_name: kfa_hi_mono
data_files:
- split: train
path: kfa_hi_mono/train-*
- config_name: kfa_or
data_files:
- split: train
path: kfa_or/train-*
- config_name: kfa_ta
data_files:
- split: train
path: kfa_ta/train-*
- config_name: kfr_as
data_files:
- split: train
path: kfr_as/train-*
- config_name: kfr_bn
data_files:
- split: train
path: kfr_bn/train-*
- config_name: kfr_en
data_files:
- split: train
path: kfr_en/train-*
- config_name: kfr_hi
data_files:
- split: train
path: kfr_hi/train-*
- config_name: kfr_or
data_files:
- split: train
path: kfr_or/train-*
- config_name: kfr_ta
data_files:
- split: train
path: kfr_ta/train-*
- config_name: kfy_as
data_files:
- split: train
path: kfy_as/train-*
- config_name: kfy_bn
data_files:
- split: train
path: kfy_bn/train-*
- config_name: kfy_en
data_files:
- split: train
path: kfy_en/train-*
- config_name: kfy_hi
data_files:
- split: train
path: kfy_hi/train-*
- config_name: kfy_or
data_files:
- split: train
path: kfy_or/train-*
- config_name: kfy_ta
data_files:
- split: train
path: kfy_ta/train-*
- config_name: kha_en
data_files:
- split: train
path: kha_en/train-*
- config_name: kha_hi
data_files:
- split: train
path: kha_hi/train-*
- config_name: kho_en
data_files:
- split: train
path: kho_en/train-*
- config_name: kho_hi
data_files:
- split: train
path: kho_hi/train-*
- config_name: kht_as
data_files:
- split: train
path: kht_as/train-*
- config_name: kht_bn
data_files:
- split: train
path: kht_bn/train-*
- config_name: kht_en
data_files:
- split: train
path: kht_en/train-*
- config_name: kht_hi
data_files:
- split: train
path: kht_hi/train-*
- config_name: kht_or
data_files:
- split: train
path: kht_or/train-*
- config_name: kht_ta
data_files:
- split: train
path: kht_ta/train-*
- config_name: kn_en
data_files:
- split: train
path: kn_en/train-*
- config_name: kn_hi
data_files:
- split: train
path: kn_hi/train-*
- config_name: kok_as
data_files:
- split: train
path: kok_as/train-*
- config_name: kok_bn
data_files:
- split: train
path: kok_bn/train-*
- config_name: kok_en
data_files:
- split: train
path: kok_en/train-*
- config_name: kok_en_mono
data_files:
- split: train
path: kok_en_mono/train-*
- config_name: kok_hi
data_files:
- split: train
path: kok_hi/train-*
- config_name: kok_hi_mono
data_files:
- split: train
path: kok_hi_mono/train-*
- config_name: kok_or
data_files:
- split: train
path: kok_or/train-*
- config_name: kok_ta
data_files:
- split: train
path: kok_ta/train-*
- config_name: kru_as
data_files:
- split: train
path: kru_as/train-*
- config_name: kru_bn
data_files:
- split: train
path: kru_bn/train-*
- config_name: kru_en
data_files:
- split: train
path: kru_en/train-*
- config_name: kru_hi
data_files:
- split: train
path: kru_hi/train-*
- config_name: kru_or
data_files:
- split: train
path: kru_or/train-*
- config_name: kru_ta
data_files:
- split: train
path: kru_ta/train-*
- config_name: ks_arab_mni
data_files:
- split: train
path: ks_arab_mni/train-*.parquet
- config_name: ks_deva_as
data_files:
- split: train
path: ks_deva_as/train-*.parquet
- config_name: ks_deva_bn
data_files:
- split: train
path: ks_deva_bn/train-*.parquet
- config_name: ks_deva_en
data_files:
- split: train
path: ks_deva_en/train-*.parquet
- config_name: ks_deva_hi
data_files:
- split: train
path: ks_deva_hi/train-*.parquet
- config_name: ks_deva_ks_arab
data_files:
- split: train
path: ks_deva_ks_arab/train-*.parquet
- config_name: ks_deva_mai
data_files:
- split: train
path: ks_deva_mai/train-*.parquet
- config_name: ks_deva_ml
data_files:
- split: train
path: ks_deva_ml/train-*.parquet
- config_name: ks_deva_mni
data_files:
- split: train
path: ks_deva_mni/train-*.parquet
- config_name: ks_deva_ne
data_files:
- split: train
path: ks_deva_ne/train-*.parquet
- config_name: ks_deva_or
data_files:
- split: train
path: ks_deva_or/train-*.parquet
- config_name: ks_deva_pa
data_files:
- split: train
path: ks_deva_pa/train-*.parquet
- config_name: ks_deva_sa
data_files:
- split: train
path: ks_deva_sa/train-*.parquet
- config_name: ks_deva_sd
data_files:
- split: train
path: ks_deva_sd/train-*.parquet
- config_name: ks_deva_ta
data_files:
- split: train
path: ks_deva_ta/train-*.parquet
- config_name: lmn_as
data_files:
- split: train
path: lmn_as/train-*
- config_name: lmn_bn
data_files:
- split: train
path: lmn_bn/train-*
- config_name: lmn_en
data_files:
- split: train
path: lmn_en/train-*
- config_name: lmn_hi
data_files:
- split: train
path: lmn_hi/train-*
- config_name: lmn_or
data_files:
- split: train
path: lmn_or/train-*
- config_name: lmn_ta
data_files:
- split: train
path: lmn_ta/train-*
- config_name: mag_en
data_files:
- split: train
path: mag_en/train-*
- config_name: mag_hi
data_files:
- split: train
path: mag_hi/train-*
- config_name: mai_en
data_files:
- split: train
path: mai_en/train-*
- config_name: mai_hi
data_files:
- split: train
path: mai_hi/train-*
- config_name: mai_ks_arab
data_files:
- split: train
path: mai_ks_arab/train-*.parquet
- config_name: miz_en
data_files:
- split: train
path: miz_en/train-*
- config_name: miz_hi
data_files:
- split: train
path: miz_hi/train-*
- config_name: ml_en
data_files:
- split: train
path: ml_en/train-*
- config_name: ml_hi
data_files:
- split: train
path: ml_hi/train-*
- config_name: mni_as
data_files:
- split: train
path: mni_as/train-*.parquet
- config_name: mni_bn
data_files:
- split: train
path: mni_bn/train-*.parquet
- config_name: mni_en
data_files:
- split: train
path: mni_en/train-*.parquet
- config_name: mni_en_mono
data_files:
- split: train
path: mni_en_mono/train-*
- config_name: mni_hi
data_files:
- split: train
path: mni_hi/train-*.parquet
- config_name: mni_hi_mono
data_files:
- split: train
path: mni_hi_mono/train-*
- config_name: mni_or
data_files:
- split: train
path: mni_or/train-*.parquet
- config_name: mni_ta
data_files:
- split: train
path: mni_ta/train-*.parquet
- config_name: mr_en
data_files:
- split: train
path: mr_en/train-*
- config_name: mr_hi
data_files:
- split: train
path: mr_hi/train-*
- config_name: mtr_as
data_files:
- split: train
path: mtr_as/train-*
- config_name: mtr_bn
data_files:
- split: train
path: mtr_bn/train-*
- config_name: mtr_en
data_files:
- split: train
path: mtr_en/train-*
- config_name: mtr_hi
data_files:
- split: train
path: mtr_hi/train-*
- config_name: mtr_or
data_files:
- split: train
path: mtr_or/train-*
- config_name: mtr_ta
data_files:
- split: train
path: mtr_ta/train-*
- config_name: mun_en
data_files:
- split: train
path: mun_en/train-*
- config_name: mun_hi
data_files:
- split: train
path: mun_hi/train-*
- config_name: mwr_as
data_files:
- split: train
path: mwr_as/train-*
- config_name: mwr_bn
data_files:
- split: train
path: mwr_bn/train-*
- config_name: mwr_en
data_files:
- split: train
path: mwr_en/train-*
- config_name: mwr_hi
data_files:
- split: train
path: mwr_hi/train-*
- config_name: mwr_or
data_files:
- split: train
path: mwr_or/train-*
- config_name: mwr_ta
data_files:
- split: train
path: mwr_ta/train-*
- config_name: ne_en
data_files:
- split: train
path: ne_en/train-*
- config_name: ne_hi
data_files:
- split: train
path: ne_hi/train-*
- config_name: noe_as
data_files:
- split: train
path: noe_as/train-*
- config_name: noe_bn
data_files:
- split: train
path: noe_bn/train-*
- config_name: noe_en
data_files:
- split: train
path: noe_en/train-*
- config_name: noe_hi
data_files:
- split: train
path: noe_hi/train-*
- config_name: noe_or
data_files:
- split: train
path: noe_or/train-*
- config_name: noe_ta
data_files:
- split: train
path: noe_ta/train-*
- config_name: or_en
data_files:
- split: train
path: or_en/train-*
- config_name: or_hi
data_files:
- split: train
path: or_hi/train-*
- config_name: pa_en
data_files:
- split: train
path: pa_en/train-*
- config_name: pa_hi
data_files:
- split: train
path: pa_hi/train-*
- config_name: phr_as
data_files:
- split: train
path: phr_as/train-*
- config_name: phr_bn
data_files:
- split: train
path: phr_bn/train-*
- config_name: phr_en
data_files:
- split: train
path: phr_en/train-*
- config_name: phr_hi
data_files:
- split: train
path: phr_hi/train-*
- config_name: phr_or
data_files:
- split: train
path: phr_or/train-*
- config_name: phr_ta
data_files:
- split: train
path: phr_ta/train-*
- config_name: raj_as
data_files:
- split: train
path: raj_as/train-*
- config_name: raj_bn
data_files:
- split: train
path: raj_bn/train-*
- config_name: raj_en
data_files:
- split: train
path: raj_en/train-*
- config_name: raj_en_mono
data_files:
- split: train
path: raj_en_mono/train-*
- config_name: raj_hi
data_files:
- split: train
path: raj_hi/train-*
- config_name: raj_hi_mono
data_files:
- split: train
path: raj_hi_mono/train-*
- config_name: raj_or
data_files:
- split: train
path: raj_or/train-*
- config_name: raj_ta
data_files:
- split: train
path: raj_ta/train-*
- config_name: sa_en
data_files:
- split: train
path: sa_en/train-*
- config_name: sa_hi
data_files:
- split: train
path: sa_hi/train-*
- config_name: sat_as
data_files:
- split: train
path: sat_as/train-*
- config_name: sat_bn
data_files:
- split: train
path: sat_bn/train-*
- config_name: sat_en
data_files:
- split: train
path: sat_en/train-*
- config_name: sat_hi
data_files:
- split: train
path: sat_hi/train-*
- config_name: sat_or
data_files:
- split: train
path: sat_or/train-*
- config_name: sat_ta
data_files:
- split: train
path: sat_ta/train-*
- config_name: sd_en
data_files:
- split: train
path: sd_en/train-*
- config_name: sd_hi
data_files:
- split: train
path: sd_hi/train-*
- config_name: sgj_as
data_files:
- split: train
path: sgj_as/train-*
- config_name: sgj_bn
data_files:
- split: train
path: sgj_bn/train-*
- config_name: sgj_en
data_files:
- split: train
path: sgj_en/train-*
- config_name: sgj_hi
data_files:
- split: train
path: sgj_hi/train-*
- config_name: sgj_or
data_files:
- split: train
path: sgj_or/train-*
- config_name: sgj_ta
data_files:
- split: train
path: sgj_ta/train-*
- config_name: sor_en
data_files:
- split: train
path: sor_en/train-*
- config_name: sor_hi
data_files:
- split: train
path: sor_hi/train-*
- config_name: spv_as
data_files:
- split: train
path: spv_as/train-*
- config_name: spv_bn
data_files:
- split: train
path: spv_bn/train-*
- config_name: spv_en
data_files:
- split: train
path: spv_en/train-*
- config_name: spv_hi
data_files:
- split: train
path: spv_hi/train-*
- config_name: spv_or
data_files:
- split: train
path: spv_or/train-*
- config_name: spv_ta
data_files:
- split: train
path: spv_ta/train-*
- config_name: ta_en
data_files:
- split: train
path: ta_en/train-*
- config_name: ta_hi
data_files:
- split: train
path: ta_hi/train-*
- config_name: tcy_as
data_files:
- split: train
path: tcy_as/train-*
- config_name: tcy_bn
data_files:
- split: train
path: tcy_bn/train-*
- config_name: tcy_en
data_files:
- split: train
path: tcy_en/train-*
- config_name: tcy_en_mono
data_files:
- split: train
path: tcy_en_mono/train-*
- config_name: tcy_hi
data_files:
- split: train
path: tcy_hi/train-*
- config_name: tcy_hi_mono
data_files:
- split: train
path: tcy_hi_mono/train-*
- config_name: tcy_or
data_files:
- split: train
path: tcy_or/train-*
- config_name: tcy_ta
data_files:
- split: train
path: tcy_ta/train-*
- config_name: te_en
data_files:
- split: train
path: te_en/train-*
- config_name: te_hi
data_files:
- split: train
path: te_hi/train-*
- config_name: ur_en
data_files:
- split: train
path: ur_en/train-*
- config_name: ur_hi
data_files:
- split: train
path: ur_hi/train-*
- config_name: wbr_as
data_files:
- split: train
path: wbr_as/train-*
- config_name: wbr_bn
data_files:
- split: train
path: wbr_bn/train-*
- config_name: wbr_en
data_files:
- split: train
path: wbr_en/train-*
- config_name: wbr_hi
data_files:
- split: train
path: wbr_hi/train-*
- config_name: wbr_or
data_files:
- split: train
path: wbr_or/train-*
- config_name: wbr_ta
data_files:
- split: train
path: wbr_ta/train-*
- config_name: xnr_as
data_files:
- split: train
path: xnr_as/train-*
- config_name: xnr_bn
data_files:
- split: train
path: xnr_bn/train-*
- config_name: xnr_en
data_files:
- split: train
path: xnr_en/train-*
- config_name: xnr_hi
data_files:
- split: train
path: xnr_hi/train-*
- config_name: xnr_or
data_files:
- split: train
path: xnr_or/train-*
- config_name: xnr_ta
data_files:
- split: train
path: xnr_ta/train-*
---
# Translation Dataset - Low Resource Indian Languages
Parallel translation data covering 50 Indian languages, generated with GPT-5-mini for fine-tuning NLLB-200.
## Dataset Details
- **Total configs:** 239
- **Examples per config:** ~4,000
- **Total examples:** ~956,000
- **Languages:** 50 Indian languages across Indo-Aryan, Dravidian, Austroasiatic, and Sino-Tibetan families
- **Hub languages:** English, Hindi, Bengali, Tamil, Odia, Assamese
## Usage
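A minimal sketch of loading one config with the Hugging Face `datasets` library. The config name `kok_en` is picked only as an example, and the helper names `load_pair` and `to_pair` are hypothetical; the column names are taken from the `dataset_info` metadata of this card.

```python
from typing import Dict

REPO_ID = "ayush-shunyalabs/translate-low-resource"


def load_pair(config_name: str = "kok_en"):
    """Load the train split of one config.

    Needs `pip install datasets` and network access.
    """
    # Imported lazily so that `to_pair` below can be used without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split="train")


def to_pair(row: Dict[str, str]) -> Dict[str, str]:
    """Keep only the fields a translation fine-tuning loop typically needs."""
    return {
        "src_lang": row["source_language"],
        "tgt_lang": row["target_language"],
        "src": row["source_text"].strip(),
        "tgt": row["target_text"].strip(),
    }


if __name__ == "__main__":
    ds = load_pair("kok_en")
    print(ds)              # features and row count
    print(to_pair(ds[0]))  # first example as a source/target pair
```

Every config listed in the metadata exposes a single `train` split, so any held-out set has to be carved out by the user, for example with `ds.train_test_split(test_size=0.05, seed=0)`.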
## Language Codes
| Code | Language | Code | Language |
|------|----------|------|----------|
| ahr | Ahirani | kfr | Kachchhi |
| as | Assamese | kfy | Kumauni |
| bfy | Bagheli | kht | Khortha |
| bfz | Mahasu Pahari | kok | Konkani |
| bgc | Haryanvi | kru | Kurukh |
| bgq | Bagri | ks_arab | Kashmiri (Arabic) |
| bhb | Bhili | ks_deva | Kashmiri (Devanagari) |
| bn | Bengali | lmn | Lambadi |
| bns | Bundeli | mai | Maithili |
| bra | Brajbhasha | ml | Malayalam |
| brj | Banjari | mni | Manipuri |
| brx | Bodo | mtr | Mewari |
| dcc | Dakini | mwr | Marwari |
| doi | Dogri | noe | Nimadi |
| en | English | or | Odia |
| gbm | Garhwali | phr | Pahari |
| gon | Gondi | raj | Rajasthani |
| grt | Garo | sat | Santali |
| gu | Gujarati | sgj | Surgujia |
| hi | Hindi | spv | Sambalpuri |
| hoj | Harauti | ta | Tamil |
| kfa | Kodava | tcy | Tulu |
| ne | Nepali | wbr | Wagdi |
| pa | Punjabi | xnr | Kangri |
| sa | Sanskrit | sd | Sindhi |
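Config names combine two codes from the table above, for example `ahr_as` or `ks_deva_ks_arab`, and some carry a `_mono` suffix. The sketch below splits a config name into its parts. It assumes `ks_arab` and `ks_deva` are the only codes that contain an underscore, and it only flags the `_mono` suffix, whose meaning this card does not document. `split_config` and `ConfigName` are hypothetical names, not part of the dataset.

```python
from typing import NamedTuple

# Codes that themselves contain an underscore (assumption based on the table above).
COMPOUND_CODES = ("ks_arab", "ks_deva")


class ConfigName(NamedTuple):
    first: str   # first language code in the config name
    second: str  # second language code in the config name
    mono: bool   # True if the name ends with "_mono"


def split_config(name: str) -> ConfigName:
    """Split a config name such as 'kok_en_mono' into its language codes."""
    mono = name.endswith("_mono")
    core = name[: -len("_mono")] if mono else name
    # Handle a leading compound code first, e.g. 'ks_deva_en'.
    for code in COMPOUND_CODES:
        if core.startswith(code + "_"):
            return ConfigName(code, core[len(code) + 1:], mono)
    # Otherwise the first underscore separates the two codes, e.g. 'mai_ks_arab'.
    first, _, second = core.partition("_")
    return ConfigName(first, second, mono)


if __name__ == "__main__":
    for name in ("ahr_as", "kok_en_mono", "ks_deva_ks_arab", "mai_ks_arab"):
        print(name, "->", split_config(name))
```

Whether the first code is always the source side is not stated in the card; the `source_language` and `target_language` columns of each row are the safer source for the direction.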
Provider: ayush-shunyalabs