
rayliuca/WikidataLabels

Source: Hugging Face. Updated 2024-01-11; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/rayliuca/WikidataLabels
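The resource description on this page is the dataset card metadata flattened onto a few long lines, which makes the per-language sizes hard to read. The sketch below is one possible way to recover them with a regular expression, assuming every record follows the `config_name ... num_bytes ... num_examples ... download_size ... dataset_size` order seen in the description. The helper names `parse_flat_card` and `load_labels` are hypothetical and not part of the dataset. `load_labels` assumes the Hugging Face `datasets` library is installed and the repository is reachable; it is defined here but not called.

```python
import re

# One flattened config record, as it appears in the resource description.
# Assumption: the four size fields always follow each other in this order.
CONFIG_RE = re.compile(
    r"config_name:\s*(?P<config>\S+).*?"
    r"num_bytes:\s*(?P<num_bytes>\d+)\s+"
    r"num_examples:\s*(?P<num_examples>\d+)\s+"
    r"download_size:\s*(?P<download_size>\d+)\s+"
    r"dataset_size:\s*(?P<dataset_size>\d+)",
    re.DOTALL,
)


def parse_flat_card(text):
    """Map each config name to its size statistics (hypothetical helper)."""
    stats = {}
    for match in CONFIG_RE.finditer(text):
        fields = match.groupdict()
        config = fields.pop("config")
        stats[config] = {key: int(value) for key, value in fields.items()}
    return stats


def load_labels(config="agq"):
    """Load one language config; needs the `datasets` package and network access."""
    from datasets import load_dataset  # imported lazily so the parser works without it

    # The description lists a single split, named "label", for every config.
    return load_dataset("rayliuca/WikidataLabels", config, split="label")


# Two records copied from the description (configs "agq" and "guw").
sample = (
    "- config_name: agq features: - name: wikidata_id dtype: string "
    "- name: lastrevid dtype: int64 - name: label dtype: string splits: "
    "- name: label num_bytes: 1317 num_examples: 33 download_size: 2906 "
    "dataset_size: 1317 - config_name: guw features: - name: wikidata_id "
    "dtype: string - name: lastrevid dtype: int64 - name: label dtype: string "
    "splits: - name: label num_bytes: 118 num_examples: 4 download_size: 1864 "
    "dataset_size: 118"
)
stats = parse_flat_card(sample)
print(stats["agq"]["num_examples"])  # prints 33
```

A record that is cut off before its size fields is skipped rather than guessed at, so the result only contains configs whose four numbers are all present.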
Resource description:
--- license: cc0-1.0 dataset_info: - config_name: aa features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13986211 num_examples: 436895 download_size: 9821312 dataset_size: 13986211 - config_name: ab features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 5012532 num_examples: 159908 download_size: 3013706 dataset_size: 5012532 - config_name: abs features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4252728 num_examples: 143986 download_size: 2567450 dataset_size: 4252728 - config_name: ace features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 19105673 num_examples: 574712 download_size: 13573374 dataset_size: 19105673 - config_name: ady features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4444259 num_examples: 148627 download_size: 2705754 dataset_size: 4444259 - config_name: ady-cyrl features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4412556 num_examples: 147884 download_size: 2682170 dataset_size: 4412556 - config_name: aeb features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4305734 num_examples: 145198 download_size: 2606368 dataset_size: 4305734 - config_name: aeb-arab features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4467930 num_examples: 148796 download_size: 2722169 dataset_size: 4467930 - config_name: aeb-latn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: 
string splits: - name: label num_bytes: 12770359 num_examples: 404946 download_size: 8886489 dataset_size: 12770359 - config_name: af features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 58561042 num_examples: 1643153 download_size: 42539052 dataset_size: 58561042 - config_name: agq features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 1317 num_examples: 33 download_size: 2906 dataset_size: 1317 - config_name: ak features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14198715 num_examples: 443037 download_size: 9991525 dataset_size: 14198715 - config_name: aln features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13811116 num_examples: 432089 download_size: 9673418 dataset_size: 13811116 - config_name: als features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 20691 num_examples: 543 download_size: 17540 dataset_size: 20691 - config_name: alt features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 108390 num_examples: 1814 download_size: 59046 dataset_size: 108390 - config_name: am features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 5231176 num_examples: 163038 download_size: 3187164 dataset_size: 5231176 - config_name: ami features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 21519 num_examples: 686 download_size: 16640 dataset_size: 21519 - config_name: an features: - name: wikidata_id dtype: string - name: lastrevid 
dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 240345072 num_examples: 5921087 download_size: 164895205 dataset_size: 240345072 - config_name: ang features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14275715 num_examples: 443461 download_size: 10063758 dataset_size: 14275715 - config_name: anp features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8558258 num_examples: 241612 download_size: 4381360 dataset_size: 8558258 - config_name: ar features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 291173732 num_examples: 5724064 download_size: 159369497 dataset_size: 291173732 - config_name: arc features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4473283 num_examples: 150006 download_size: 2722619 dataset_size: 4473283 - config_name: arn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13879729 num_examples: 433912 download_size: 9715431 dataset_size: 13879729 - config_name: arq features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4346991 num_examples: 146004 download_size: 2636972 dataset_size: 4346991 - config_name: ary features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 5358568 num_examples: 171568 download_size: 3313402 dataset_size: 5358568 - config_name: arz features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 81806333 num_examples: 1669699 download_size: 49423508 dataset_size: 
81806333 - config_name: as features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 21658610 num_examples: 450074 download_size: 9641626 dataset_size: 21658610 - config_name: ase features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4252943 num_examples: 143986 download_size: 2568106 dataset_size: 4252943 - config_name: ast features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 1385628786 num_examples: 20696237 download_size: 955908362 dataset_size: 1385628786 - config_name: atj features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 12996229 num_examples: 411639 download_size: 9057557 dataset_size: 12996229 - config_name: av features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4722934 num_examples: 153781 download_size: 2880103 dataset_size: 4722934 - config_name: avk features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13194485 num_examples: 414598 download_size: 9200917 dataset_size: 13194485 - config_name: awa features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8599312 num_examples: 242320 download_size: 4411751 dataset_size: 8599312 - config_name: ay features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14269432 num_examples: 443521 download_size: 10029939 dataset_size: 14269432 - config_name: az features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label 
num_bytes: 21049248 num_examples: 516732 download_size: 14117527 dataset_size: 21049248 - config_name: azb features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 30781587 num_examples: 607562 download_size: 16028687 dataset_size: 30781587 - config_name: ba features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 11525351 num_examples: 261509 download_size: 6733777 dataset_size: 11525351 - config_name: ban features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13674052 num_examples: 426706 download_size: 9513747 dataset_size: 13674052 - config_name: ban-bali features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 50961 num_examples: 748 download_size: 25817 dataset_size: 50961 - config_name: bar features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 54783034 num_examples: 1566120 download_size: 40389830 dataset_size: 54783034 - config_name: bbc features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 12820895 num_examples: 406960 download_size: 8917054 dataset_size: 12820895 - config_name: bcc features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8017228 num_examples: 241977 download_size: 4344579 dataset_size: 8017228 - config_name: be features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 30978832 num_examples: 564184 download_size: 17461174 dataset_size: 30978832 - config_name: be-tarask features: - name: wikidata_id dtype: string 
- name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 18931909 num_examples: 374396 download_size: 10871239 dataset_size: 18931909 - config_name: bg features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 200628708 num_examples: 4383953 download_size: 137745533 dataset_size: 200628708 - config_name: bgn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 7999280 num_examples: 241566 download_size: 4331249 dataset_size: 7999280 - config_name: bi features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14040026 num_examples: 438382 download_size: 9867032 dataset_size: 14040026 - config_name: bjn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8375348 num_examples: 254558 download_size: 5722334 dataset_size: 8375348 - config_name: bm features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 18145787 num_examples: 549694 download_size: 13129193 dataset_size: 18145787 - config_name: bn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 815803977 num_examples: 9767284 download_size: 261147329 dataset_size: 815803977 - config_name: bo features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 11671330 num_examples: 278307 download_size: 5669602 dataset_size: 11671330 - config_name: bpy features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 15497749 num_examples: 347458 download_size: 6991190 
dataset_size: 15497749 - config_name: bqi features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8017455 num_examples: 241984 download_size: 4345123 dataset_size: 8017455 - config_name: br features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 58304963 num_examples: 1653800 download_size: 42722031 dataset_size: 58304963 - config_name: brh features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 5328437 num_examples: 171504 download_size: 3376189 dataset_size: 5328437 - config_name: bs features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 30441466 num_examples: 858190 download_size: 21606575 dataset_size: 30441466 - config_name: btm features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4252525 num_examples: 143980 download_size: 2567218 dataset_size: 4252525 - config_name: bto features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 12841721 num_examples: 407470 download_size: 8934218 dataset_size: 12841721 - config_name: bug features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 7595464 num_examples: 235268 download_size: 5129941 dataset_size: 7595464 - config_name: bxr features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4713699 num_examples: 153707 download_size: 2869313 dataset_size: 4713699 - config_name: ca features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: 
label num_bytes: 408509932 num_examples: 9936886 download_size: 288474980 dataset_size: 408509932 - config_name: cbk-zam features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14108232 num_examples: 440345 download_size: 9920793 dataset_size: 14108232 - config_name: cdo features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 6503254 num_examples: 201362 download_size: 4137841 dataset_size: 6503254 - config_name: ce features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 28093148 num_examples: 607767 download_size: 16367596 dataset_size: 28093148 - config_name: ceb features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 332947091 num_examples: 7769402 download_size: 219525737 dataset_size: 332947091 - config_name: ch features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13983906 num_examples: 436785 download_size: 9817385 dataset_size: 13983906 - config_name: cho features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13950786 num_examples: 435869 download_size: 9791296 dataset_size: 13950786 - config_name: chr features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 5386793 num_examples: 172855 download_size: 3419676 dataset_size: 5386793 - config_name: chy features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13994916 num_examples: 437007 download_size: 9830465 dataset_size: 13994916 - config_name: ckb features: - name: wikidata_id 
dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 23343034 num_examples: 511183 download_size: 11459344 dataset_size: 23343034 - config_name: co features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 47080480 num_examples: 1346929 download_size: 34551346 dataset_size: 47080480 - config_name: cps features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 12849864 num_examples: 407695 download_size: 8941921 dataset_size: 12849864 - config_name: cr features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 5516556 num_examples: 176667 download_size: 3532952 dataset_size: 5516556 - config_name: crh features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 10864382 num_examples: 336709 download_size: 7542853 dataset_size: 10864382 - config_name: crh-cyrl features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4419064 num_examples: 148046 download_size: 2688683 dataset_size: 4419064 - config_name: crh-latn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14201429 num_examples: 442905 download_size: 9986290 dataset_size: 14201429 - config_name: cs features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 140189244 num_examples: 3384048 download_size: 97516751 dataset_size: 140189244 - config_name: csb features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 20177120 num_examples: 619275 
download_size: 14528772 dataset_size: 20177120 - config_name: cv features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8047221 num_examples: 215611 download_size: 4857718 dataset_size: 8047221 - config_name: cy features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 89241808 num_examples: 2244550 download_size: 62686006 dataset_size: 89241808 - config_name: da features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 130931077 num_examples: 3448894 download_size: 98202417 dataset_size: 130931077 - config_name: dag features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 2664957 num_examples: 78534 download_size: 2052615 dataset_size: 2664957 - config_name: de features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 765398522 num_examples: 17531361 download_size: 527642124 dataset_size: 765398522 - config_name: de-at features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 53043722 num_examples: 1515373 download_size: 38761571 dataset_size: 53043722 - config_name: de-ch features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 53480908 num_examples: 1528137 download_size: 39349412 dataset_size: 53480908 - config_name: de-formal features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4256391 num_examples: 144061 download_size: 2571862 dataset_size: 4256391 - config_name: din features: - name: wikidata_id dtype: string - name: lastrevid dtype: 
int64 - name: label dtype: string splits: - name: label num_bytes: 12819746 num_examples: 406591 download_size: 8922303 dataset_size: 12819746 - config_name: diq features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 7570161 num_examples: 232674 download_size: 5057742 dataset_size: 7570161 - config_name: dsb features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 16135830 num_examples: 491423 download_size: 11412316 dataset_size: 16135830 - config_name: dtp features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13867373 num_examples: 433733 download_size: 9720699 dataset_size: 13867373 - config_name: dty features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8839082 num_examples: 246026 download_size: 4551845 dataset_size: 8839082 - config_name: dua features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 2631 num_examples: 87 download_size: 3877 dataset_size: 2631 - config_name: dv features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 81396462 num_examples: 2103276 download_size: 45332104 dataset_size: 81396462 - config_name: dz features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8590239 num_examples: 242196 download_size: 4406353 dataset_size: 8590239 - config_name: ee features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14377017 num_examples: 447208 download_size: 10136064 dataset_size: 14377017 - config_name: egl features: 
- name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13068224 num_examples: 413551 download_size: 9121776 dataset_size: 13068224 - config_name: el features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 32978562 num_examples: 592016 download_size: 19577876 dataset_size: 32978562 - config_name: eml features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14768563 num_examples: 458847 download_size: 10453636 dataset_size: 14768563 - config_name: en features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 6327454281 num_examples: 81801560 download_size: 4224231068 dataset_size: 6327454281 - config_name: en-ca features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 73305274 num_examples: 1909970 download_size: 53060194 dataset_size: 73305274 - config_name: en-gb features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 115978412 num_examples: 2520405 download_size: 78924421 dataset_size: 115978412 - config_name: en-us features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14815 num_examples: 332 download_size: 9953 dataset_size: 14815 - config_name: eo features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 256196064 num_examples: 6285304 download_size: 177219679 dataset_size: 256196064 - config_name: es features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 730214298 
num_examples: 17233968 download_size: 514588069 dataset_size: 730214298 - config_name: es-419 features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4355180 num_examples: 146476 download_size: 2659218 dataset_size: 4355180 - config_name: es-formal features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4280933 num_examples: 144717 download_size: 2592085 dataset_size: 4280933 - config_name: et features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 65123623 num_examples: 1820762 download_size: 48197302 dataset_size: 65123623 - config_name: eu features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 290282374 num_examples: 7109758 download_size: 197889378 dataset_size: 290282374 - config_name: ext features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 223257222 num_examples: 5359047 download_size: 147078789 dataset_size: 223257222 - config_name: fa features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 123727757 num_examples: 2142642 download_size: 65952114 dataset_size: 123727757 - config_name: ff features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14116652 num_examples: 440614 download_size: 9920388 dataset_size: 14116652 - config_name: fi features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 286539944 num_examples: 6905698 download_size: 209916638 dataset_size: 286539944 - config_name: fit features: - name: wikidata_id dtype: 
string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 20217258 num_examples: 620391 download_size: 14566702 dataset_size: 20217258 - config_name: fj features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14159041 num_examples: 441745 download_size: 9956108 dataset_size: 14159041 - config_name: fkv features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4328482 num_examples: 145988 download_size: 2619845 dataset_size: 4328482 - config_name: fo features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 24474476 num_examples: 731732 download_size: 17876981 dataset_size: 24474476 - config_name: fr features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 774128723 num_examples: 17908351 download_size: 534489308 dataset_size: 774128723 - config_name: frc features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 17896106 num_examples: 547258 download_size: 12953740 dataset_size: 17896106 - config_name: frp features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 40902510 num_examples: 1191134 download_size: 29778105 dataset_size: 40902510 - config_name: frr features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 16979214 num_examples: 515350 download_size: 12069637 dataset_size: 16979214 - config_name: fur features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 42077410 num_examples: 1221071 download_size: 
30714082 dataset_size: 42077410 - config_name: ga features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 471527543 num_examples: 11524282 download_size: 320967189 dataset_size: 471527543 - config_name: gag features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14149375 num_examples: 440732 download_size: 9940551 dataset_size: 14149375 - config_name: gan features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 31572161 num_examples: 905186 download_size: 18909564 dataset_size: 31572161 - config_name: gan-hans features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 31004794 num_examples: 889875 download_size: 18566811 dataset_size: 31004794 - config_name: gan-hant features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4374444 num_examples: 147098 download_size: 2657182 dataset_size: 4374444 - config_name: gcr features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4311409 num_examples: 145829 download_size: 2618211 dataset_size: 4311409 - config_name: gd features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 49316935 num_examples: 1429457 download_size: 36220978 dataset_size: 49316935 - config_name: gl features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 289484839 num_examples: 7052226 download_size: 197315151 dataset_size: 289484839 - config_name: glk features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: 
label dtype: string splits: - name: label num_bytes: 8327018 num_examples: 249115 download_size: 4538325 dataset_size: 8327018 - config_name: gn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14212974 num_examples: 442765 download_size: 10004863 dataset_size: 14212974 - config_name: gom features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4584575 num_examples: 150273 download_size: 2780570 dataset_size: 4584575 - config_name: gom-deva features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8585678 num_examples: 242131 download_size: 4400578 dataset_size: 8585678 - config_name: gom-latn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 12783006 num_examples: 405302 download_size: 8897342 dataset_size: 12783006 - config_name: gor features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14667616 num_examples: 454512 download_size: 10319196 dataset_size: 14667616 - config_name: got features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 5432139 num_examples: 172951 download_size: 3435531 dataset_size: 5432139 - config_name: grc features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4494817 num_examples: 149631 download_size: 2746170 dataset_size: 4494817 - config_name: gu features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 23788894 num_examples: 486140 download_size: 10779200 dataset_size: 23788894 - config_name: guc 
features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 1419 num_examples: 38 download_size: 3054 dataset_size: 1419 - config_name: guw features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 118 num_examples: 4 download_size: 1864 dataset_size: 118 - config_name: gv features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 20683485 num_examples: 631005 download_size: 14894590 dataset_size: 20683485 - config_name: ha features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14716168 num_examples: 455836 download_size: 10421790 dataset_size: 14716168 - config_name: hak features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 6128644 num_examples: 193036 download_size: 3991729 dataset_size: 6128644 - config_name: haw features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14158084 num_examples: 441511 download_size: 9952975 dataset_size: 14158084 - config_name: he features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 43629050 num_examples: 884809 download_size: 27221301 dataset_size: 43629050 - config_name: hi features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 37237187 num_examples: 668964 download_size: 17804873 dataset_size: 37237187 - config_name: hif features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14457954 num_examples: 449009 download_size: 
10166264 dataset_size: 14457954 - config_name: hif-latn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14519845 num_examples: 454037 download_size: 10240704 dataset_size: 14519845 - config_name: hil features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 12928914 num_examples: 409962 download_size: 9009705 dataset_size: 12928914 - config_name: ho features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13950504 num_examples: 435857 download_size: 9790849 dataset_size: 13950504 - config_name: hr features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 61272623 num_examples: 1720527 download_size: 45307411 dataset_size: 61272623 - config_name: hrx features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 12869295 num_examples: 407823 download_size: 8964114 dataset_size: 12869295 - config_name: hsb features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 23720349 num_examples: 707100 download_size: 17145693 dataset_size: 23720349 - config_name: ht features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 16835529 num_examples: 509955 download_size: 11880404 dataset_size: 16835529 - config_name: hu features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 85054175 num_examples: 2200589 download_size: 64143342 dataset_size: 85054175 - config_name: hu-formal features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: 
label dtype: string splits: - name: label num_bytes: 4252810 num_examples: 143986 download_size: 2567582 dataset_size: 4252810 - config_name: hy features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 39339286 num_examples: 773925 download_size: 22108994 dataset_size: 39339286 - config_name: hyw features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 5443608 num_examples: 166902 download_size: 3238370 dataset_size: 5443608 - config_name: hz features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13948574 num_examples: 435804 download_size: 9788697 dataset_size: 13948574 - config_name: ia features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 229143237 num_examples: 5616433 download_size: 155877454 dataset_size: 229143237 - config_name: id features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 95220928 num_examples: 2512331 download_size: 69525046 dataset_size: 95220928 - config_name: ie features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 225725262 num_examples: 5533032 download_size: 153371930 dataset_size: 225725262 - config_name: ig features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 20109388 num_examples: 617044 download_size: 14475407 dataset_size: 20109388 - config_name: ii features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4310418 num_examples: 145332 download_size: 2609723 dataset_size: 4310418 - config_name: ik 
features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13989609 num_examples: 436958 download_size: 9823174 dataset_size: 13989609 - config_name: ike-cans features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4352278 num_examples: 146355 download_size: 2645174 dataset_size: 4352278 - config_name: ike-latn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13851135 num_examples: 432932 download_size: 9714057 dataset_size: 13851135 - config_name: ilo features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 15955483 num_examples: 480555 download_size: 11141942 dataset_size: 15955483 - config_name: inh features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4634360 num_examples: 152226 download_size: 2831580 dataset_size: 4634360 - config_name: io features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 233656822 num_examples: 5757440 download_size: 159720058 dataset_size: 233656822 - config_name: is features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 51679396 num_examples: 1483610 download_size: 37965494 dataset_size: 51679396 - config_name: it features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 536601426 num_examples: 12631487 download_size: 375025347 dataset_size: 536601426 - config_name: iu features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 
5360588 num_examples: 172215 download_size: 3402239 dataset_size: 5360588 - config_name: ja features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 140641579 num_examples: 2917962 download_size: 92145329 dataset_size: 140641579 - config_name: jam features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 18849751 num_examples: 571777 download_size: 13684422 dataset_size: 18849751 - config_name: jbo features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14301985 num_examples: 446512 download_size: 9994516 dataset_size: 14301985 - config_name: jv features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 27232302 num_examples: 794181 download_size: 19651565 dataset_size: 27232302 - config_name: ka features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 24073345 num_examples: 399546 download_size: 11679979 dataset_size: 24073345 - config_name: kaa features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14082184 num_examples: 439411 download_size: 9902820 dataset_size: 14082184 - config_name: kab features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 18459676 num_examples: 557857 download_size: 13384218 dataset_size: 18459676 - config_name: kbd features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4594409 num_examples: 149733 download_size: 2759503 dataset_size: 4594409 - config_name: kbd-cyrl features: - name: wikidata_id dtype: string - 
name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4417661 num_examples: 148017 download_size: 2687531 dataset_size: 4417661 - config_name: kbp features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 12873178 num_examples: 408039 download_size: 8965474 dataset_size: 12873178 - config_name: kea features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 12793700 num_examples: 405901 download_size: 8896866 dataset_size: 12793700 - config_name: kg features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 40949149 num_examples: 1193499 download_size: 29766747 dataset_size: 40949149 - config_name: khw features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4308653 num_examples: 145279 download_size: 2608581 dataset_size: 4308653 - config_name: ki features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14056900 num_examples: 439015 download_size: 9875534 dataset_size: 14056900 - config_name: kj features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13881723 num_examples: 433861 download_size: 9733715 dataset_size: 13881723 - config_name: kjp features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8504302 num_examples: 240339 download_size: 4341523 dataset_size: 8504302 - config_name: kk features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 19216115 num_examples: 428880 download_size: 11577682 dataset_size: 
19216115 - config_name: kk-arab features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 7241749 num_examples: 211731 download_size: 4487032 dataset_size: 7241749 - config_name: kk-kz features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4937945 num_examples: 160027 download_size: 3062906 dataset_size: 4937945 - config_name: kk-latn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 22197825 num_examples: 677162 download_size: 16072332 dataset_size: 22197825 - config_name: kk-tr features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 20060635 num_examples: 616521 download_size: 14438929 dataset_size: 20060635 - config_name: ko features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 60335212 num_examples: 1364440 download_size: 39186630 dataset_size: 60335212 - config_name: ko-kp features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4338717 num_examples: 146150 download_size: 2630925 dataset_size: 4338717 - config_name: koi features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4737590 num_examples: 155082 download_size: 2894674 dataset_size: 4737590 - config_name: kr features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13886057 num_examples: 433990 download_size: 9737602 dataset_size: 13886057 - config_name: krc features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - 
name: label num_bytes: 4646136 num_examples: 151026 download_size: 2785454 dataset_size: 4646136 - config_name: kri features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 12798530 num_examples: 406032 download_size: 8902330 dataset_size: 12798530 - config_name: krj features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13850324 num_examples: 433444 download_size: 9703460 dataset_size: 13850324 - config_name: krl features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 12788020 num_examples: 405729 download_size: 8893337 dataset_size: 12788020 - config_name: ks features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4390604 num_examples: 147033 download_size: 2671069 dataset_size: 4390604 - config_name: ks-deva features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8567518 num_examples: 241832 download_size: 4387687 dataset_size: 8567518 - config_name: ksh features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 20394712 num_examples: 624523 download_size: 14698860 dataset_size: 20394712 - config_name: ku features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8037777 num_examples: 239515 download_size: 5306097 dataset_size: 8037777 - config_name: ku-arab features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4577826 num_examples: 151290 download_size: 2796159 dataset_size: 4577826 - config_name: ku-latn features: - name: wikidata_id 
dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14683841 num_examples: 458802 download_size: 10371977 dataset_size: 14683841 - config_name: kum features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4252739 num_examples: 143985 download_size: 2567503 dataset_size: 4252739 - config_name: kv features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4946978 num_examples: 158888 download_size: 2997865 dataset_size: 4946978 - config_name: kw features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 20245535 num_examples: 621432 download_size: 14581378 dataset_size: 20245535 - config_name: ky features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8909613 num_examples: 235165 download_size: 5462115 dataset_size: 8909613 - config_name: la features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 299766395 num_examples: 7085082 download_size: 201477460 dataset_size: 299766395 - config_name: lad features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 20336417 num_examples: 622775 download_size: 14653199 dataset_size: 20336417 - config_name: lb features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 56473066 num_examples: 1601093 download_size: 41410732 dataset_size: 56473066 - config_name: lbe features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4501470 num_examples: 149898 download_size: 
2744786 dataset_size: 4501470 - config_name: lez features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4890798 num_examples: 155936 download_size: 2959653 dataset_size: 4890798 - config_name: lfn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14709210 num_examples: 456719 download_size: 10408539 dataset_size: 14709210 - config_name: lg features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13979286 num_examples: 436009 download_size: 9802779 dataset_size: 13979286 - config_name: li features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 43476868 num_examples: 1253970 download_size: 31750932 dataset_size: 43476868 - config_name: lij features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 42327066 num_examples: 1227346 download_size: 30898971 dataset_size: 42327066 - config_name: liv features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 12781331 num_examples: 405236 download_size: 8895889 dataset_size: 12781331 - config_name: lki features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8039166 num_examples: 242526 download_size: 4363703 dataset_size: 8039166 - config_name: lld features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 90305 num_examples: 2634 download_size: 69672 dataset_size: 90305 - config_name: lmo features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - 
name: label num_bytes: 18287638 num_examples: 545398 download_size: 13130119 dataset_size: 18287638 - config_name: ln features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14123637 num_examples: 439731 download_size: 9915851 dataset_size: 14123637 - config_name: lo features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 9905189 num_examples: 271710 download_size: 5313218 dataset_size: 9905189 - config_name: loz features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13695602 num_examples: 428723 download_size: 9581113 dataset_size: 13695602 - config_name: lt features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 39902419 num_examples: 1096727 download_size: 29185765 dataset_size: 39902419 - config_name: ltg features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13884707 num_examples: 433453 download_size: 9736637 dataset_size: 13884707 - config_name: lus features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13695197 num_examples: 428712 download_size: 9580538 dataset_size: 13695197 - config_name: luz features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8459036 num_examples: 253454 download_size: 4687414 dataset_size: 8459036 - config_name: lv features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 27242119 num_examples: 764753 download_size: 19676667 dataset_size: 27242119 - config_name: lzh features: - name: wikidata_id dtype: 
string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 25067538 num_examples: 685152 download_size: 14998856 dataset_size: 25067538 - config_name: mdf features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4634268 num_examples: 152141 download_size: 2820744 dataset_size: 4634268 - config_name: mg features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 43863002 num_examples: 1271074 download_size: 32016826 dataset_size: 43863002 - config_name: mh features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13775721 num_examples: 431162 download_size: 9644397 dataset_size: 13775721 - config_name: mi features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 20857040 num_examples: 637118 download_size: 15060301 dataset_size: 20857040 - config_name: min features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 53044258 num_examples: 1464128 download_size: 38587450 dataset_size: 53044258 - config_name: mk features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 24087229 num_examples: 449241 download_size: 12217912 dataset_size: 24087229 - config_name: ml features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 189266798 num_examples: 2664923 download_size: 71344031 dataset_size: 189266798 - config_name: mn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 9311543 num_examples: 219695 download_size: 
5272784 dataset_size: 9311543 - config_name: mni features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8696893 num_examples: 243616 download_size: 4470994 dataset_size: 8696893 - config_name: mnw features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8861861 num_examples: 244906 download_size: 4517726 dataset_size: 8861861 - config_name: mo features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 5377009 num_examples: 172144 download_size: 3405661 dataset_size: 5377009 - config_name: mr features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 26855182 num_examples: 526220 download_size: 12358679 dataset_size: 26855182 - config_name: mrh features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 68 num_examples: 2 download_size: 1820 dataset_size: 68 - config_name: mrj features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 5007903 num_examples: 160889 download_size: 3073431 dataset_size: 5007903 - config_name: ms features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 64674328 num_examples: 1803714 download_size: 47165217 dataset_size: 64674328 - config_name: ms-arab features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 136496 num_examples: 2961 download_size: 92316 dataset_size: 136496 - config_name: mt features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 
22632686 num_examples: 682867 download_size: 16352572 dataset_size: 22632686 - config_name: mus features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14013416 num_examples: 437688 download_size: 9835239 dataset_size: 14013416 - config_name: mwl features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14493299 num_examples: 448926 download_size: 10225888 dataset_size: 14493299 - config_name: my features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 16182182 num_examples: 345096 download_size: 7981905 dataset_size: 16182182 - config_name: mzn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 17973941 num_examples: 447870 download_size: 9174617 dataset_size: 17973941 - config_name: na features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13992666 num_examples: 436956 download_size: 9823328 dataset_size: 13992666 - config_name: nah features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14490294 num_examples: 449748 download_size: 10192501 dataset_size: 14490294 - config_name: nan-hani features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 191 num_examples: 6 download_size: 1925 dataset_size: 191 - config_name: nap features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 42362346 num_examples: 1229161 download_size: 30918265 dataset_size: 42362346 - config_name: nb features: - name: wikidata_id dtype: string - name: lastrevid 
dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 142554768 num_examples: 3688026 download_size: 105549981 dataset_size: 142554768 - config_name: nds features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 58766114 num_examples: 1666813 download_size: 43421948 dataset_size: 58766114 - config_name: nds-nl features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 44121756 num_examples: 1273149 download_size: 32201410 dataset_size: 44121756 - config_name: ne features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 11925386 num_examples: 295006 download_size: 6265232 dataset_size: 11925386 - config_name: new features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 16906308 num_examples: 350362 download_size: 7680329 dataset_size: 16906308 - config_name: ng features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13870754 num_examples: 433582 download_size: 9723795 dataset_size: 13870754 - config_name: nia features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 20649 num_examples: 515 download_size: 16535 dataset_size: 20649 - config_name: niu features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 12794247 num_examples: 405902 download_size: 8897260 dataset_size: 12794247 - config_name: nl features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 5016576732 num_examples: 61931959 download_size: 3380404239 dataset_size: 
5016576732 - config_name: nn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 99997815 num_examples: 2708994 download_size: 74736304 dataset_size: 99997815 - config_name: 'no' features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 2934 num_examples: 64 download_size: 4108 dataset_size: 2934 - config_name: nod features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4322068 num_examples: 145566 download_size: 2618106 dataset_size: 4322068 - config_name: nov features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14150434 num_examples: 440903 download_size: 9947798 dataset_size: 14150434 - config_name: nqo features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8094271 num_examples: 243184 download_size: 4398836 dataset_size: 8094271 - config_name: nrm features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 41330956 num_examples: 1203295 download_size: 30084065 dataset_size: 41330956 - config_name: nso features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14178321 num_examples: 443205 download_size: 9959708 dataset_size: 14178321 - config_name: nv features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 15351770 num_examples: 455188 download_size: 10472240 dataset_size: 15351770 - config_name: ny features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 
13989813 num_examples: 436764 download_size: 9821588 dataset_size: 13989813 - config_name: nys features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13092059 num_examples: 413241 download_size: 9153100 dataset_size: 13092059 - config_name: oc features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 266612548 num_examples: 6569770 download_size: 180156462 dataset_size: 266612548 - config_name: olo features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13200388 num_examples: 416935 download_size: 9214968 dataset_size: 13200388 - config_name: om features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 5476389 num_examples: 175314 download_size: 3496637 dataset_size: 5476389 - config_name: or features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 22798709 num_examples: 470237 download_size: 10322832 dataset_size: 22798709 - config_name: os features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 5946062 num_examples: 177054 download_size: 3583703 dataset_size: 5946062 - config_name: ota features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8015024 num_examples: 241903 download_size: 4343478 dataset_size: 8015024 - config_name: pa features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 20505754 num_examples: 481522 download_size: 10552147 dataset_size: 20505754 - config_name: pam features: - name: wikidata_id dtype: string - name: 
lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14527964 num_examples: 451253 download_size: 10242443 dataset_size: 14527964 - config_name: pap features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 54505401 num_examples: 1449881 download_size: 40415776 dataset_size: 54505401 - config_name: pcd features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 42132826 num_examples: 1221362 download_size: 30766812 dataset_size: 42132826 - config_name: pdc features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14435256 num_examples: 448055 download_size: 10178322 dataset_size: 14435256 - config_name: pdt features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13994892 num_examples: 437200 download_size: 9819388 dataset_size: 13994892 - config_name: pfl features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 15461023 num_examples: 474198 download_size: 10893651 dataset_size: 15461023 - config_name: pi features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8913354 num_examples: 250251 download_size: 4651392 dataset_size: 8913354 - config_name: pih features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13971081 num_examples: 436214 download_size: 9810653 dataset_size: 13971081 - config_name: pl features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 426030491 num_examples: 10025139 download_size: 295767506 
dataset_size: 426030491 - config_name: pms features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 51268512 num_examples: 1477043 download_size: 37698831 dataset_size: 51268512 - config_name: pnb features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 16192682 num_examples: 409037 download_size: 9196626 dataset_size: 16192682 - config_name: pnt features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4439173 num_examples: 148336 download_size: 2703117 dataset_size: 4439173 - config_name: prg features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 17940420 num_examples: 544030 download_size: 12958482 dataset_size: 17940420 - config_name: ps features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8860902 num_examples: 259186 download_size: 4916502 dataset_size: 8860902 - config_name: pt features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 491184040 num_examples: 11574568 download_size: 340831923 dataset_size: 491184040 - config_name: pt-br features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 318857431 num_examples: 7782980 download_size: 223442911 dataset_size: 318857431 - config_name: pwn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8500 num_examples: 269 download_size: 8738 dataset_size: 8500 - config_name: qu features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - 
name: label num_bytes: 15254702 num_examples: 468823 download_size: 10750388 dataset_size: 15254702 - config_name: quc features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 32 num_examples: 1 download_size: 1772 dataset_size: 32 - config_name: qug features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13798264 num_examples: 431733 download_size: 9661685 dataset_size: 13798264 - config_name: rgn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 17001688 num_examples: 519871 download_size: 12258201 dataset_size: 17001688 - config_name: rif features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13792951 num_examples: 431588 download_size: 9657698 dataset_size: 13792951 - config_name: rm features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 44450577 num_examples: 1284908 download_size: 32519630 dataset_size: 44450577 - config_name: rmc features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 159 num_examples: 4 download_size: 1963 dataset_size: 159 - config_name: rmy features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 5610156 num_examples: 179191 download_size: 3608283 dataset_size: 5610156 - config_name: rn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13935534 num_examples: 435271 download_size: 9779486 dataset_size: 13935534 - config_name: ro features: - name: wikidata_id dtype: string - name: lastrevid dtype: 
int64 - name: label dtype: string splits: - name: label num_bytes: 247469452 num_examples: 5878366 download_size: 177525205 dataset_size: 247469452 - config_name: roa-tara features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14425120 num_examples: 448972 download_size: 10152875 dataset_size: 14425120 - config_name: ru features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 405103215 num_examples: 7485811 download_size: 257215625 dataset_size: 405103215 - config_name: rue features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4953403 num_examples: 159530 download_size: 3037824 dataset_size: 4953403 - config_name: rup features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14459686 num_examples: 450345 download_size: 10198398 dataset_size: 14459686 - config_name: ruq-cyrl features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4434290 num_examples: 148404 download_size: 2700920 dataset_size: 4434290 - config_name: ruq-latn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13783683 num_examples: 430978 download_size: 9656941 dataset_size: 13783683 - config_name: rw features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14090196 num_examples: 439172 download_size: 9901257 dataset_size: 14090196 - config_name: rwr features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8568706 num_examples: 241841 download_size: 4388475 
dataset_size: 8568706 - config_name: ryu features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 2852 num_examples: 82 download_size: 4237 dataset_size: 2852 - config_name: sa features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 21404327 num_examples: 455674 download_size: 9692464 dataset_size: 21404327 - config_name: sat features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 10810040 num_examples: 284911 download_size: 5750917 dataset_size: 10810040 - config_name: sc features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 47195572 num_examples: 1348137 download_size: 34521764 dataset_size: 47195572 - config_name: scn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 43458983 num_examples: 1259067 download_size: 31775157 dataset_size: 43458983 - config_name: sco features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 56960413 num_examples: 1611092 download_size: 41724559 dataset_size: 56960413 - config_name: sd features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14257513 num_examples: 363318 download_size: 7844047 dataset_size: 14257513 - config_name: sdc features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13975497 num_examples: 436913 download_size: 9800517 dataset_size: 13975497 - config_name: se features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: 
label num_bytes: 23962268 num_examples: 711439 download_size: 17409387 dataset_size: 23962268 - config_name: sei features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13827581 num_examples: 432520 download_size: 9684192 dataset_size: 13827581 - config_name: sg features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13913524 num_examples: 434751 download_size: 9761739 dataset_size: 13913524 - config_name: sh features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 30173635 num_examples: 746207 download_size: 20133594 dataset_size: 30173635 - config_name: shi-latn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13783218 num_examples: 430968 download_size: 9656828 dataset_size: 13783218 - config_name: shi-tfng features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4308577 num_examples: 145279 download_size: 2608525 dataset_size: 4308577 - config_name: shn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 10139002 num_examples: 260808 download_size: 4952168 dataset_size: 10139002 - config_name: shy-latn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4255322 num_examples: 144058 download_size: 2570625 dataset_size: 4255322 - config_name: si features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 7405400 num_examples: 189718 download_size: 4270591 dataset_size: 7405400 - config_name: sjd features: - name: wikidata_id 
dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4300688 num_examples: 145047 download_size: 2604357 dataset_size: 4300688 - config_name: sje features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 20970223 num_examples: 637639 download_size: 15120381 dataset_size: 20970223 - config_name: sju features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4315103 num_examples: 145655 download_size: 2620763 dataset_size: 4315103 - config_name: sk features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 75586366 num_examples: 2050873 download_size: 54951330 dataset_size: 75586366 - config_name: skr features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4274062 num_examples: 144443 download_size: 2585286 dataset_size: 4274062 - config_name: sl features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 157883240 num_examples: 4112048 download_size: 118047353 dataset_size: 157883240 - config_name: sli features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13909208 num_examples: 434986 download_size: 9745964 dataset_size: 13909208 - config_name: sm features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13984823 num_examples: 436830 download_size: 9817472 dataset_size: 13984823 - config_name: sma features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 20653595 num_examples: 630437 download_size: 
14902319 dataset_size: 20653595 - config_name: smj features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 19640206 num_examples: 604326 download_size: 14133964 dataset_size: 19640206 - config_name: smn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 10902411 num_examples: 337543 download_size: 7576850 dataset_size: 10902411 - config_name: sms features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4462345 num_examples: 149355 download_size: 2741038 dataset_size: 4462345 - config_name: sn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 20116601 num_examples: 618231 download_size: 14463728 dataset_size: 20116601 - config_name: sq features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 304708913 num_examples: 7311820 download_size: 225592169 dataset_size: 304708913 - config_name: sr features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 52787253 num_examples: 1018361 download_size: 31364006 dataset_size: 52787253 - config_name: sr-ec features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 9237541 num_examples: 248556 download_size: 5875548 dataset_size: 9237541 - config_name: sr-el features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 48848162 num_examples: 1418824 download_size: 35859120 dataset_size: 48848162 - config_name: srq features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label 
dtype: string splits: - name: label num_bytes: 12796525 num_examples: 405957 download_size: 8899493 dataset_size: 12796525 - config_name: ss features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13823630 num_examples: 432423 download_size: 9682165 dataset_size: 13823630 - config_name: st features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13938937 num_examples: 435419 download_size: 9785161 dataset_size: 13938937 - config_name: stq features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14484394 num_examples: 449885 download_size: 10228446 dataset_size: 14484394 - config_name: su features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 20025826 num_examples: 583096 download_size: 14042822 dataset_size: 20025826 - config_name: sv features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 339074900 num_examples: 8115455 download_size: 236022796 dataset_size: 339074900 - config_name: sw features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 50612064 num_examples: 1465385 download_size: 37096369 dataset_size: 50612064 - config_name: szl features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 16772062 num_examples: 500107 download_size: 11868254 dataset_size: 16772062 - config_name: szy features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4332021 num_examples: 146136 download_size: 2633271 dataset_size: 4332021 - config_name: ta 
features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 31251824 num_examples: 546558 download_size: 15157673 dataset_size: 31251824 - config_name: tay features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4345269 num_examples: 146938 download_size: 2632535 dataset_size: 4345269 - config_name: tcy features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 8723594 num_examples: 244350 download_size: 4487471 dataset_size: 8723594 - config_name: te features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 27587665 num_examples: 569615 download_size: 13669398 dataset_size: 27587665 - config_name: tet features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 15092299 num_examples: 466244 download_size: 10702917 dataset_size: 15092299 - config_name: tg features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 12643125 num_examples: 304625 download_size: 7622522 dataset_size: 12643125 - config_name: tg-cyrl features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4504034 num_examples: 149533 download_size: 2755000 dataset_size: 4504034 - config_name: tg-latn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 19845835 num_examples: 610020 download_size: 14264492 dataset_size: 19845835 - config_name: th features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 32693750 
num_examples: 537447 download_size: 15849247 dataset_size: 32693750 - config_name: ti features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4366995 num_examples: 146479 download_size: 2648869 dataset_size: 4366995 - config_name: tk features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 5797050 num_examples: 184302 download_size: 3728802 dataset_size: 5797050 - config_name: tl features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13661554 num_examples: 387377 download_size: 9456413 dataset_size: 13661554 - config_name: tly features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4309748 num_examples: 145312 download_size: 2609307 dataset_size: 4309748 - config_name: tly-cyrl features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 35 num_examples: 1 download_size: 1793 dataset_size: 35 - config_name: tn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13936132 num_examples: 435219 download_size: 9780279 dataset_size: 13936132 - config_name: to features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13980327 num_examples: 436460 download_size: 9810650 dataset_size: 13980327 - config_name: tpi features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14169019 num_examples: 442133 download_size: 9961827 dataset_size: 14169019 - config_name: tr features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label 
dtype: string splits: - name: label num_bytes: 72134544 num_examples: 1770267 download_size: 51032484 dataset_size: 72134544 - config_name: tru features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 5322844 num_examples: 171327 download_size: 3371105 dataset_size: 5322844 - config_name: trv features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 94285 num_examples: 3109 download_size: 65138 dataset_size: 94285 - config_name: ts features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13943481 num_examples: 435408 download_size: 9783789 dataset_size: 13943481 - config_name: tt features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 24182976 num_examples: 548502 download_size: 14868166 dataset_size: 24182976 - config_name: tt-cyrl features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4943914 num_examples: 158198 download_size: 3048932 dataset_size: 4943914 - config_name: tt-latn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13842972 num_examples: 432513 download_size: 9702714 dataset_size: 13842972 - config_name: tum features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13924159 num_examples: 435110 download_size: 9770501 dataset_size: 13924159 - config_name: tw features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13830508 num_examples: 432669 download_size: 9688164 dataset_size: 13830508 - config_name: ty features: - 
name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 16816401 num_examples: 507332 download_size: 12098154 dataset_size: 16816401 - config_name: tyv features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4583082 num_examples: 149929 download_size: 2779632 dataset_size: 4583082 - config_name: tzm features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4253588 num_examples: 144002 download_size: 2569067 dataset_size: 4253588 - config_name: udm features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4854947 num_examples: 156300 download_size: 2958444 dataset_size: 4854947 - config_name: ug-arab features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4316690 num_examples: 145443 download_size: 2614962 dataset_size: 4316690 - config_name: ug-latn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13786474 num_examples: 431056 download_size: 9659723 dataset_size: 13786474 - config_name: uk features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 251058352 num_examples: 5108733 download_size: 168140976 dataset_size: 251058352 - config_name: ur features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 57063750 num_examples: 987011 download_size: 28328459 dataset_size: 57063750 - config_name: uz features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 11731793 num_examples: 
344615 download_size: 8102734 dataset_size: 11731793 - config_name: uz-cyrl features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4252574 num_examples: 143981 download_size: 2567325 dataset_size: 4252574 - config_name: ve features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 13932174 num_examples: 435216 download_size: 9777266 dataset_size: 13932174 - config_name: vec features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 52081230 num_examples: 1466867 download_size: 37307805 dataset_size: 52081230 - config_name: vep features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 6174898 num_examples: 192298 download_size: 3994582 dataset_size: 6174898 - config_name: vi features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 246835524 num_examples: 5743737 download_size: 172949263 dataset_size: 246835524 - config_name: vls features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 42789297 num_examples: 1239359 download_size: 31228294 dataset_size: 42789297 - config_name: vmf features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 18352990 num_examples: 555205 download_size: 13289296 dataset_size: 18352990 - config_name: vo features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 228352533 num_examples: 5610875 download_size: 155496988 dataset_size: 228352533 - config_name: vot features: - name: wikidata_id dtype: string - name: lastrevid 
dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 5406190 num_examples: 173486 download_size: 3439433 dataset_size: 5406190 - config_name: wa features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 49235347 num_examples: 1426584 download_size: 36167816 dataset_size: 49235347 - config_name: war features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 190306474 num_examples: 4449062 download_size: 133786270 dataset_size: 190306474 - config_name: wls features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4033 num_examples: 104 download_size: 5150 dataset_size: 4033 - config_name: wo features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 40961626 num_examples: 1193626 download_size: 29778666 dataset_size: 40961626 - config_name: wuu features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 40570130 num_examples: 1127741 download_size: 24209117 dataset_size: 40570130 - config_name: wya features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 28 num_examples: 1 download_size: 1740 dataset_size: 28 - config_name: xal features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4475344 num_examples: 149984 download_size: 2722459 dataset_size: 4475344 - config_name: xh features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 20036194 num_examples: 615514 download_size: 14405310 dataset_size: 20036194 - config_name: xmf features: - 
name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 5943645 num_examples: 169507 download_size: 3418593 dataset_size: 5943645 - config_name: xsy features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4262789 num_examples: 144305 download_size: 2573349 dataset_size: 4262789 - config_name: yav features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4070 num_examples: 102 download_size: 4718 dataset_size: 4070 - config_name: yi features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 5495313 num_examples: 170277 download_size: 3373820 dataset_size: 5495313 - config_name: yo features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 25424749 num_examples: 724345 download_size: 18086773 dataset_size: 25424749 - config_name: za features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 15159230 num_examples: 365892 download_size: 7774767 dataset_size: 15159230 - config_name: zea features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 14538518 num_examples: 451577 download_size: 10262897 dataset_size: 14538518 - config_name: zgh features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 4253917 num_examples: 144006 download_size: 2569373 dataset_size: 4253917 - config_name: zh features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 264353677 num_examples: 5424320 download_size: 
174420118 dataset_size: 264353677 - config_name: zh-cn features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 42868611 num_examples: 1158755 download_size: 27243799 dataset_size: 42868611 - config_name: zh-hans features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 57233156 num_examples: 1483225 download_size: 36583522 dataset_size: 57233156 - config_name: zh-hant features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 53502814 num_examples: 1356560 download_size: 36755083 dataset_size: 53502814 - config_name: zh-hk features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 15325323 num_examples: 408391 download_size: 10455809 dataset_size: 15325323 - config_name: zh-mo features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 6568267 num_examples: 180950 download_size: 3547260 dataset_size: 6568267 - config_name: zh-my features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 32637498 num_examples: 916876 download_size: 19289581 dataset_size: 32637498 - config_name: zh-sg features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 35325327 num_examples: 979652 download_size: 21150070 dataset_size: 35325327 - config_name: zh-tw features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 17500668 num_examples: 443057 download_size: 11121104 dataset_size: 17500668 - config_name: zh-yue features: - name: wikidata_id dtype: string - name: lastrevid 
dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 1352 num_examples: 30 download_size: 2963 dataset_size: 1352 - config_name: zu features: - name: wikidata_id dtype: string - name: lastrevid dtype: int64 - name: label dtype: string splits: - name: label num_bytes: 47349379 num_examples: 1380550 download_size: 34649660 dataset_size: 47349379 configs: - config_name: aa data_files: - split: label path: aa/label-* - config_name: ab data_files: - split: label path: ab/label-* - config_name: abs data_files: - split: label path: abs/label-* - config_name: ace data_files: - split: label path: ace/label-* - config_name: ady data_files: - split: label path: ady/label-* - config_name: ady-cyrl data_files: - split: label path: ady-cyrl/label-* - config_name: aeb data_files: - split: label path: aeb/label-* - config_name: aeb-arab data_files: - split: label path: aeb-arab/label-* - config_name: aeb-latn data_files: - split: label path: aeb-latn/label-* - config_name: af data_files: - split: label path: af/label-* - config_name: agq data_files: - split: label path: agq/label-* - config_name: ak data_files: - split: label path: ak/label-* - config_name: aln data_files: - split: label path: aln/label-* - config_name: als data_files: - split: label path: als/label-* - config_name: alt data_files: - split: label path: alt/label-* - config_name: am data_files: - split: label path: am/label-* - config_name: ami data_files: - split: label path: ami/label-* - config_name: an data_files: - split: label path: an/label-* - config_name: ang data_files: - split: label path: ang/label-* - config_name: anp data_files: - split: label path: anp/label-* - config_name: ar data_files: - split: label path: ar/label-* - config_name: arc data_files: - split: label path: arc/label-* - config_name: arn data_files: - split: label path: arn/label-* - config_name: arq data_files: - split: label path: arq/label-* - config_name: ary data_files: - split: label path: ary/label-* - 
config_name: arz data_files: - split: label path: arz/label-* - config_name: as data_files: - split: label path: as/label-* - config_name: ase data_files: - split: label path: ase/label-* - config_name: ast data_files: - split: label path: ast/label-* - config_name: atj data_files: - split: label path: atj/label-* - config_name: av data_files: - split: label path: av/label-* - config_name: avk data_files: - split: label path: avk/label-* - config_name: awa data_files: - split: label path: awa/label-* - config_name: ay data_files: - split: label path: ay/label-* - config_name: az data_files: - split: label path: az/label-* - config_name: azb data_files: - split: label path: azb/label-* - config_name: ba data_files: - split: label path: ba/label-* - config_name: ban data_files: - split: label path: ban/label-* - config_name: ban-bali data_files: - split: label path: ban-bali/label-* - config_name: bar data_files: - split: label path: bar/label-* - config_name: bbc data_files: - split: label path: bbc/label-* - config_name: bcc data_files: - split: label path: bcc/label-* - config_name: be data_files: - split: label path: be/label-* - config_name: be-tarask data_files: - split: label path: be-tarask/label-* - config_name: bg data_files: - split: label path: bg/label-* - config_name: bgn data_files: - split: label path: bgn/label-* - config_name: bi data_files: - split: label path: bi/label-* - config_name: bjn data_files: - split: label path: bjn/label-* - config_name: bm data_files: - split: label path: bm/label-* - config_name: bn data_files: - split: label path: bn/label-* - config_name: bo data_files: - split: label path: bo/label-* - config_name: bpy data_files: - split: label path: bpy/label-* - config_name: bqi data_files: - split: label path: bqi/label-* - config_name: br data_files: - split: label path: br/label-* - config_name: brh data_files: - split: label path: brh/label-* - config_name: bs data_files: - split: label path: bs/label-* - config_name: btm 
data_files: - split: label path: btm/label-* - config_name: bto data_files: - split: label path: bto/label-* - config_name: bug data_files: - split: label path: bug/label-* - config_name: bxr data_files: - split: label path: bxr/label-* - config_name: ca data_files: - split: label path: ca/label-* - config_name: cbk-zam data_files: - split: label path: cbk-zam/label-* - config_name: cdo data_files: - split: label path: cdo/label-* - config_name: ce data_files: - split: label path: ce/label-* - config_name: ceb data_files: - split: label path: ceb/label-* - config_name: ch data_files: - split: label path: ch/label-* - config_name: cho data_files: - split: label path: cho/label-* - config_name: chr data_files: - split: label path: chr/label-* - config_name: chy data_files: - split: label path: chy/label-* - config_name: ckb data_files: - split: label path: ckb/label-* - config_name: co data_files: - split: label path: co/label-* - config_name: cps data_files: - split: label path: cps/label-* - config_name: cr data_files: - split: label path: cr/label-* - config_name: crh data_files: - split: label path: crh/label-* - config_name: crh-cyrl data_files: - split: label path: crh-cyrl/label-* - config_name: crh-latn data_files: - split: label path: crh-latn/label-* - config_name: cs data_files: - split: label path: cs/label-* - config_name: csb data_files: - split: label path: csb/label-* - config_name: cv data_files: - split: label path: cv/label-* - config_name: cy data_files: - split: label path: cy/label-* - config_name: da data_files: - split: label path: da/label-* - config_name: dag data_files: - split: label path: dag/label-* - config_name: de data_files: - split: label path: de/label-* - config_name: de-at data_files: - split: label path: de-at/label-* - config_name: de-ch data_files: - split: label path: de-ch/label-* - config_name: de-formal data_files: - split: label path: de-formal/label-* - config_name: din data_files: - split: label path: din/label-* - 
config_name: diq data_files: - split: label path: diq/label-* - config_name: dsb data_files: - split: label path: dsb/label-* - config_name: dtp data_files: - split: label path: dtp/label-* - config_name: dty data_files: - split: label path: dty/label-* - config_name: dua data_files: - split: label path: dua/label-* - config_name: dv data_files: - split: label path: dv/label-* - config_name: dz data_files: - split: label path: dz/label-* - config_name: ee data_files: - split: label path: ee/label-* - config_name: egl data_files: - split: label path: egl/label-* - config_name: el data_files: - split: label path: el/label-* - config_name: eml data_files: - split: label path: eml/label-* - config_name: en data_files: - split: label path: en/label-* default: true - config_name: en-ca data_files: - split: label path: en-ca/label-* - config_name: en-gb data_files: - split: label path: en-gb/label-* - config_name: en-us data_files: - split: label path: en-us/label-* - config_name: eo data_files: - split: label path: eo/label-* - config_name: es data_files: - split: label path: es/label-* - config_name: es-419 data_files: - split: label path: es-419/label-* - config_name: es-formal data_files: - split: label path: es-formal/label-* - config_name: et data_files: - split: label path: et/label-* - config_name: eu data_files: - split: label path: eu/label-* - config_name: ext data_files: - split: label path: ext/label-* - config_name: fa data_files: - split: label path: fa/label-* - config_name: ff data_files: - split: label path: ff/label-* - config_name: fi data_files: - split: label path: fi/label-* - config_name: fit data_files: - split: label path: fit/label-* - config_name: fj data_files: - split: label path: fj/label-* - config_name: fkv data_files: - split: label path: fkv/label-* - config_name: fo data_files: - split: label path: fo/label-* - config_name: fr data_files: - split: label path: fr/label-* - config_name: frc data_files: - split: label path: frc/label-* - 
config_name: frp data_files: - split: label path: frp/label-* - config_name: frr data_files: - split: label path: frr/label-* - config_name: fur data_files: - split: label path: fur/label-* - config_name: ga data_files: - split: label path: ga/label-* - config_name: gag data_files: - split: label path: gag/label-* - config_name: gan data_files: - split: label path: gan/label-* - config_name: gan-hans data_files: - split: label path: gan-hans/label-* - config_name: gan-hant data_files: - split: label path: gan-hant/label-* - config_name: gcr data_files: - split: label path: gcr/label-* - config_name: gd data_files: - split: label path: gd/label-* - config_name: gl data_files: - split: label path: gl/label-* - config_name: glk data_files: - split: label path: glk/label-* - config_name: gn data_files: - split: label path: gn/label-* - config_name: gom data_files: - split: label path: gom/label-* - config_name: gom-deva data_files: - split: label path: gom-deva/label-* - config_name: gom-latn data_files: - split: label path: gom-latn/label-* - config_name: gor data_files: - split: label path: gor/label-* - config_name: got data_files: - split: label path: got/label-* - config_name: grc data_files: - split: label path: grc/label-* - config_name: gu data_files: - split: label path: gu/label-* - config_name: guc data_files: - split: label path: guc/label-* - config_name: guw data_files: - split: label path: guw/label-* - config_name: gv data_files: - split: label path: gv/label-* - config_name: ha data_files: - split: label path: ha/label-* - config_name: hak data_files: - split: label path: hak/label-* - config_name: haw data_files: - split: label path: haw/label-* - config_name: he data_files: - split: label path: he/label-* - config_name: hi data_files: - split: label path: hi/label-* - config_name: hif data_files: - split: label path: hif/label-* - config_name: hif-latn data_files: - split: label path: hif-latn/label-* - config_name: hil data_files: - split: label 
path: hil/label-* - config_name: ho data_files: - split: label path: ho/label-* - config_name: hr data_files: - split: label path: hr/label-* - config_name: hrx data_files: - split: label path: hrx/label-* - config_name: hsb data_files: - split: label path: hsb/label-* - config_name: ht data_files: - split: label path: ht/label-* - config_name: hu data_files: - split: label path: hu/label-* - config_name: hu-formal data_files: - split: label path: hu-formal/label-* - config_name: hy data_files: - split: label path: hy/label-* - config_name: hyw data_files: - split: label path: hyw/label-* - config_name: hz data_files: - split: label path: hz/label-* - config_name: ia data_files: - split: label path: ia/label-* - config_name: id data_files: - split: label path: id/label-* - config_name: ie data_files: - split: label path: ie/label-* - config_name: ig data_files: - split: label path: ig/label-* - config_name: ii data_files: - split: label path: ii/label-* - config_name: ik data_files: - split: label path: ik/label-* - config_name: ike-cans data_files: - split: label path: ike-cans/label-* - config_name: ike-latn data_files: - split: label path: ike-latn/label-* - config_name: ilo data_files: - split: label path: ilo/label-* - config_name: inh data_files: - split: label path: inh/label-* - config_name: io data_files: - split: label path: io/label-* - config_name: is data_files: - split: label path: is/label-* - config_name: it data_files: - split: label path: it/label-* - config_name: iu data_files: - split: label path: iu/label-* - config_name: ja data_files: - split: label path: ja/label-* - config_name: jam data_files: - split: label path: jam/label-* - config_name: jbo data_files: - split: label path: jbo/label-* - config_name: jv data_files: - split: label path: jv/label-* - config_name: ka data_files: - split: label path: ka/label-* - config_name: kaa data_files: - split: label path: kaa/label-* - config_name: kab data_files: - split: label path: kab/label-* - 
config_name: kbd data_files: - split: label path: kbd/label-* - config_name: kbd-cyrl data_files: - split: label path: kbd-cyrl/label-* - config_name: kbp data_files: - split: label path: kbp/label-* - config_name: kea data_files: - split: label path: kea/label-* - config_name: kg data_files: - split: label path: kg/label-* - config_name: khw data_files: - split: label path: khw/label-* - config_name: ki data_files: - split: label path: ki/label-* - config_name: kj data_files: - split: label path: kj/label-* - config_name: kjp data_files: - split: label path: kjp/label-* - config_name: kk data_files: - split: label path: kk/label-* - config_name: kk-arab data_files: - split: label path: kk-arab/label-* - config_name: kk-kz data_files: - split: label path: kk-kz/label-* - config_name: kk-latn data_files: - split: label path: kk-latn/label-* - config_name: kk-tr data_files: - split: label path: kk-tr/label-* - config_name: ko data_files: - split: label path: ko/label-* - config_name: ko-kp data_files: - split: label path: ko-kp/label-* - config_name: koi data_files: - split: label path: koi/label-* - config_name: kr data_files: - split: label path: kr/label-* - config_name: krc data_files: - split: label path: krc/label-* - config_name: kri data_files: - split: label path: kri/label-* - config_name: krj data_files: - split: label path: krj/label-* - config_name: krl data_files: - split: label path: krl/label-* - config_name: ks data_files: - split: label path: ks/label-* - config_name: ks-deva data_files: - split: label path: ks-deva/label-* - config_name: ksh data_files: - split: label path: ksh/label-* - config_name: ku data_files: - split: label path: ku/label-* - config_name: ku-arab data_files: - split: label path: ku-arab/label-* - config_name: ku-latn data_files: - split: label path: ku-latn/label-* - config_name: kum data_files: - split: label path: kum/label-* - config_name: kv data_files: - split: label path: kv/label-* - config_name: kw data_files: - 
split: label path: kw/label-* - config_name: ky data_files: - split: label path: ky/label-* - config_name: la data_files: - split: label path: la/label-* - config_name: lad data_files: - split: label path: lad/label-* - config_name: lb data_files: - split: label path: lb/label-* - config_name: lbe data_files: - split: label path: lbe/label-* - config_name: lez data_files: - split: label path: lez/label-* - config_name: lfn data_files: - split: label path: lfn/label-* - config_name: lg data_files: - split: label path: lg/label-* - config_name: li data_files: - split: label path: li/label-* - config_name: lij data_files: - split: label path: lij/label-* - config_name: liv data_files: - split: label path: liv/label-* - config_name: lki data_files: - split: label path: lki/label-* - config_name: lld data_files: - split: label path: lld/label-* - config_name: lmo data_files: - split: label path: lmo/label-* - config_name: ln data_files: - split: label path: ln/label-* - config_name: lo data_files: - split: label path: lo/label-* - config_name: loz data_files: - split: label path: loz/label-* - config_name: lt data_files: - split: label path: lt/label-* - config_name: ltg data_files: - split: label path: ltg/label-* - config_name: lus data_files: - split: label path: lus/label-* - config_name: luz data_files: - split: label path: luz/label-* - config_name: lv data_files: - split: label path: lv/label-* - config_name: lzh data_files: - split: label path: lzh/label-* - config_name: mdf data_files: - split: label path: mdf/label-* - config_name: mg data_files: - split: label path: mg/label-* - config_name: mh data_files: - split: label path: mh/label-* - config_name: mi data_files: - split: label path: mi/label-* - config_name: min data_files: - split: label path: min/label-* - config_name: mk data_files: - split: label path: mk/label-* - config_name: ml data_files: - split: label path: ml/label-* - config_name: mn data_files: - split: label path: mn/label-* - config_name: 
mni data_files: - split: label path: mni/label-* - config_name: mnw data_files: - split: label path: mnw/label-* - config_name: mo data_files: - split: label path: mo/label-* - config_name: mr data_files: - split: label path: mr/label-* - config_name: mrh data_files: - split: label path: mrh/label-* - config_name: mrj data_files: - split: label path: mrj/label-* - config_name: ms data_files: - split: label path: ms/label-* - config_name: ms-arab data_files: - split: label path: ms-arab/label-* - config_name: mt data_files: - split: label path: mt/label-* - config_name: mus data_files: - split: label path: mus/label-* - config_name: mwl data_files: - split: label path: mwl/label-* - config_name: my data_files: - split: label path: my/label-* - config_name: mzn data_files: - split: label path: mzn/label-* - config_name: na data_files: - split: label path: na/label-* - config_name: nah data_files: - split: label path: nah/label-* - config_name: nan-hani data_files: - split: label path: nan-hani/label-* - config_name: nap data_files: - split: label path: nap/label-* - config_name: nb data_files: - split: label path: nb/label-* - config_name: nds data_files: - split: label path: nds/label-* - config_name: nds-nl data_files: - split: label path: nds-nl/label-* - config_name: ne data_files: - split: label path: ne/label-* - config_name: new data_files: - split: label path: new/label-* - config_name: ng data_files: - split: label path: ng/label-* - config_name: nia data_files: - split: label path: nia/label-* - config_name: niu data_files: - split: label path: niu/label-* - config_name: nl data_files: - split: label path: nl/label-* - config_name: nn data_files: - split: label path: nn/label-* - config_name: 'no' data_files: - split: label path: no/label-* - config_name: nod data_files: - split: label path: nod/label-* - config_name: nov data_files: - split: label path: nov/label-* - config_name: nqo data_files: - split: label path: nqo/label-* - config_name: nrm 
data_files: - split: label path: nrm/label-* - config_name: nso data_files: - split: label path: nso/label-* - config_name: nv data_files: - split: label path: nv/label-* - config_name: ny data_files: - split: label path: ny/label-* - config_name: nys data_files: - split: label path: nys/label-* - config_name: oc data_files: - split: label path: oc/label-* - config_name: olo data_files: - split: label path: olo/label-* - config_name: om data_files: - split: label path: om/label-* - config_name: or data_files: - split: label path: or/label-* - config_name: os data_files: - split: label path: os/label-* - config_name: ota data_files: - split: label path: ota/label-* - config_name: pa data_files: - split: label path: pa/label-* - config_name: pam data_files: - split: label path: pam/label-* - config_name: pap data_files: - split: label path: pap/label-* - config_name: pcd data_files: - split: label path: pcd/label-* - config_name: pdc data_files: - split: label path: pdc/label-* - config_name: pdt data_files: - split: label path: pdt/label-* - config_name: pfl data_files: - split: label path: pfl/label-* - config_name: pi data_files: - split: label path: pi/label-* - config_name: pih data_files: - split: label path: pih/label-* - config_name: pl data_files: - split: label path: pl/label-* - config_name: pms data_files: - split: label path: pms/label-* - config_name: pnb data_files: - split: label path: pnb/label-* - config_name: pnt data_files: - split: label path: pnt/label-* - config_name: prg data_files: - split: label path: prg/label-* - config_name: ps data_files: - split: label path: ps/label-* - config_name: pt data_files: - split: label path: pt/label-* - config_name: pt-br data_files: - split: label path: pt-br/label-* - config_name: pwn data_files: - split: label path: pwn/label-* - config_name: qu data_files: - split: label path: qu/label-* - config_name: quc data_files: - split: label path: quc/label-* - config_name: qug data_files: - split: label path: 
qug/label-* - config_name: rgn data_files: - split: label path: rgn/label-* - config_name: rif data_files: - split: label path: rif/label-* - config_name: rm data_files: - split: label path: rm/label-* - config_name: rmc data_files: - split: label path: rmc/label-* - config_name: rmy data_files: - split: label path: rmy/label-* - config_name: rn data_files: - split: label path: rn/label-* - config_name: ro data_files: - split: label path: ro/label-* - config_name: roa-tara data_files: - split: label path: roa-tara/label-* - config_name: ru data_files: - split: label path: ru/label-* - config_name: rue data_files: - split: label path: rue/label-* - config_name: rup data_files: - split: label path: rup/label-* - config_name: ruq-cyrl data_files: - split: label path: ruq-cyrl/label-* - config_name: ruq-latn data_files: - split: label path: ruq-latn/label-* - config_name: rw data_files: - split: label path: rw/label-* - config_name: rwr data_files: - split: label path: rwr/label-* - config_name: ryu data_files: - split: label path: ryu/label-* - config_name: sa data_files: - split: label path: sa/label-* - config_name: sat data_files: - split: label path: sat/label-* - config_name: sc data_files: - split: label path: sc/label-* - config_name: scn data_files: - split: label path: scn/label-* - config_name: sco data_files: - split: label path: sco/label-* - config_name: sd data_files: - split: label path: sd/label-* - config_name: sdc data_files: - split: label path: sdc/label-* - config_name: se data_files: - split: label path: se/label-* - config_name: sei data_files: - split: label path: sei/label-* - config_name: sg data_files: - split: label path: sg/label-* - config_name: sh data_files: - split: label path: sh/label-* - config_name: shi-latn data_files: - split: label path: shi-latn/label-* - config_name: shi-tfng data_files: - split: label path: shi-tfng/label-* - config_name: shn data_files: - split: label path: shn/label-* - config_name: shy-latn data_files: - 
split: label path: shy-latn/label-* - config_name: si data_files: - split: label path: si/label-* - config_name: sjd data_files: - split: label path: sjd/label-* - config_name: sje data_files: - split: label path: sje/label-* - config_name: sju data_files: - split: label path: sju/label-* - config_name: sk data_files: - split: label path: sk/label-* - config_name: skr data_files: - split: label path: skr/label-* - config_name: sl data_files: - split: label path: sl/label-* - config_name: sli data_files: - split: label path: sli/label-* - config_name: sm data_files: - split: label path: sm/label-* - config_name: sma data_files: - split: label path: sma/label-* - config_name: smj data_files: - split: label path: smj/label-* - config_name: smn data_files: - split: label path: smn/label-* - config_name: sms data_files: - split: label path: sms/label-* - config_name: sn data_files: - split: label path: sn/label-* - config_name: sq data_files: - split: label path: sq/label-* - config_name: sr data_files: - split: label path: sr/label-* - config_name: sr-ec data_files: - split: label path: sr-ec/label-* - config_name: sr-el data_files: - split: label path: sr-el/label-* - config_name: srq data_files: - split: label path: srq/label-* - config_name: ss data_files: - split: label path: ss/label-* - config_name: st data_files: - split: label path: st/label-* - config_name: stq data_files: - split: label path: stq/label-* - config_name: su data_files: - split: label path: su/label-* - config_name: sv data_files: - split: label path: sv/label-* - config_name: sw data_files: - split: label path: sw/label-* - config_name: szl data_files: - split: label path: szl/label-* - config_name: szy data_files: - split: label path: szy/label-* - config_name: ta data_files: - split: label path: ta/label-* - config_name: tay data_files: - split: label path: tay/label-* - config_name: tcy data_files: - split: label path: tcy/label-* - config_name: te data_files: - split: label path: te/label-* 
- config_name: tet data_files: - split: label path: tet/label-* - config_name: tg data_files: - split: label path: tg/label-* - config_name: tg-cyrl data_files: - split: label path: tg-cyrl/label-* - config_name: tg-latn data_files: - split: label path: tg-latn/label-* - config_name: th data_files: - split: label path: th/label-* - config_name: ti data_files: - split: label path: ti/label-* - config_name: tk data_files: - split: label path: tk/label-* - config_name: tl data_files: - split: label path: tl/label-* - config_name: tly data_files: - split: label path: tly/label-* - config_name: tly-cyrl data_files: - split: label path: tly-cyrl/label-* - config_name: tn data_files: - split: label path: tn/label-* - config_name: to data_files: - split: label path: to/label-* - config_name: tpi data_files: - split: label path: tpi/label-* - config_name: tr data_files: - split: label path: tr/label-* - config_name: tru data_files: - split: label path: tru/label-* - config_name: trv data_files: - split: label path: trv/label-* - config_name: ts data_files: - split: label path: ts/label-* - config_name: tt data_files: - split: label path: tt/label-* - config_name: tt-cyrl data_files: - split: label path: tt-cyrl/label-* - config_name: tt-latn data_files: - split: label path: tt-latn/label-* - config_name: tum data_files: - split: label path: tum/label-* - config_name: tw data_files: - split: label path: tw/label-* - config_name: ty data_files: - split: label path: ty/label-* - config_name: tyv data_files: - split: label path: tyv/label-* - config_name: tzm data_files: - split: label path: tzm/label-* - config_name: udm data_files: - split: label path: udm/label-* - config_name: ug-arab data_files: - split: label path: ug-arab/label-* - config_name: ug-latn data_files: - split: label path: ug-latn/label-* - config_name: uk data_files: - split: label path: uk/label-* - config_name: ur data_files: - split: label path: ur/label-* - config_name: uz data_files: - split: label 
path: uz/label-* - config_name: uz-cyrl data_files: - split: label path: uz-cyrl/label-* - config_name: ve data_files: - split: label path: ve/label-* - config_name: vec data_files: - split: label path: vec/label-* - config_name: vep data_files: - split: label path: vep/label-* - config_name: vi data_files: - split: label path: vi/label-* - config_name: vls data_files: - split: label path: vls/label-* - config_name: vmf data_files: - split: label path: vmf/label-* - config_name: vo data_files: - split: label path: vo/label-* - config_name: vot data_files: - split: label path: vot/label-* - config_name: wa data_files: - split: label path: wa/label-* - config_name: war data_files: - split: label path: war/label-* - config_name: wls data_files: - split: label path: wls/label-* - config_name: wo data_files: - split: label path: wo/label-* - config_name: wuu data_files: - split: label path: wuu/label-* - config_name: wya data_files: - split: label path: wya/label-* - config_name: xal data_files: - split: label path: xal/label-* - config_name: xh data_files: - split: label path: xh/label-* - config_name: xmf data_files: - split: label path: xmf/label-* - config_name: xsy data_files: - split: label path: xsy/label-* - config_name: yav data_files: - split: label path: yav/label-* - config_name: yi data_files: - split: label path: yi/label-* - config_name: yo data_files: - split: label path: yo/label-* - config_name: za data_files: - split: label path: za/label-* - config_name: zea data_files: - split: label path: zea/label-* - config_name: zgh data_files: - split: label path: zgh/label-* - config_name: zh data_files: - split: label path: zh/label-* - config_name: zh-cn data_files: - split: label path: zh-cn/label-* - config_name: zh-hans data_files: - split: label path: zh-hans/label-* - config_name: zh-hant data_files: - split: label path: zh-hant/label-* - config_name: zh-hk data_files: - split: label path: zh-hk/label-* - config_name: zh-mo data_files: - split: label 
path: zh-mo/label-* - config_name: zh-my data_files: - split: label path: zh-my/label-* - config_name: zh-sg data_files: - split: label path: zh-sg/label-* - config_name: zh-tw data_files: - split: label path: zh-tw/label-* - config_name: zh-yue data_files: - split: label path: zh-yue/label-* - config_name: zu data_files: - split: label path: zu/label-* task_categories: - translation - text2text-generation language: - en - fr - de - ja - zh - hi - ar - bn - ru - es ---

# Wikidata Labels

Large parallel corpus for machine translation

- Entity label data extracted from Wikidata (2022-01-03), filtered for item entities only
- Only download the languages you need with `datasets>=2.14.0`
- Similar dataset: https://huggingface.co/datasets/wmt/wikititles (18 Wikipedia title pairs instead of all Wikidata entities)

## Dataset Details

### Dataset Sources

- Wikidata JSON dump (wikidata-20220103-all.json.gz) https://www.wikidata.org/wiki/Wikidata:Database_download

## Uses

You can generate parallel text examples from this dataset as shown below:

```python
from datasets import load_dataset
import pandas as pd


# The repo_id default is kept from the original card; this page lists the
# dataset as rayliuca/WikidataLabels.
def parallel_labels(lang_codes: list, how="inner", repo_id="rayliuca/wikidata_entity_label",
                    merge_config=None, datasets_config=None) -> pd.DataFrame:
    # Use None defaults instead of mutable {} defaults.
    merge_config = merge_config or {}
    datasets_config = datasets_config or {}
    out_df = None
    for lc in lang_codes:
        # Each language code is its own config, so only that language is downloaded.
        dataset = load_dataset(repo_id, lc, **datasets_config)
        # Name the label column after the language and drop the revision id.
        dataset_df = dataset['label'].to_pandas().rename(columns={"label": lc}).drop(columns=['lastrevid'])
        if out_df is None:
            out_df = dataset_df
        else:
            # Join on the shared Wikidata entity id.
            out_df = out_df.merge(dataset_df, on='wikidata_id', how=how, **merge_config)
    return out_df


# Note: the "en" subset is >4GB
parallel_labels(['en', 'fr', 'ja', 'zh']).head()
```

### Output

| | wikidata_id | en | fr | ja | zh |
|---:|:---|:---|:---|:---|:---|
| 0 | Q109739412 | SARS-CoV-2 Omicron variant | variant Omicron du SARS-CoV-2 | SARSコロナウイルス2-オミクロン株 | 嚴重急性呼吸道症候群冠狀病毒2型Omicron變異株 |
| 1 | Q108460606 | Ulughbegsaurus | Ulughbegsaurus | ウルグベグサウルス | 兀魯伯龍屬 |
| 2 | Q108556886 | AUKUS | AUKUS | AUKUS | AUKUS |
| 3 | Q106496152 | Claude Joseph | Claude Joseph | クロード・ジョゼフ | 克洛德·约瑟夫 |
| 4 | Q105519361 | The World's Finest Assassin Gets Reincarnated in a Different World as an Aristocrat | The World's Finest Assassin Gets Reincarnated in Another World as an Aristocrat | 世界最高の暗殺者、異世界貴族に転生する | 世界頂尖的暗殺者轉生為異世界貴族 |

Note: the example table above shows a quirk of the Wikidata labels. The French Wikipedia page [The World's Finest Assassin Gets Reincarnated in Another World as an Aristocrat](https://fr.wikipedia.org/wiki/The_World%27s_Finest_Assassin_Gets_Reincarnated_in_Another_World_as_an_Aristocrat) uses an English title, so the French label is in English. This can be a disadvantage for direct translation training, but it also shows what native speakers may actually call the entity, as opposed to a literal translation.

## Dataset Structure

Each language has its own subset (aka config), which means you only have to download the languages you need with `datasets>=2.14.0`.

Each subset has these fields:

- wikidata_id
- lastrevid
- label

## Dataset Creation

#### Data Collection and Processing

- Filtered for item entities only
- Ignored the descriptions, as those texts are not very parallel

## Bias, Risks, and Limitations

- Might be slightly outdated (2022)
- Popular languages have more entries
- Labels are not guaranteed to be literal translations (see the example above)
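The `how` argument in the `parallel_labels` example decides which entities survive the join on `wikidata_id`. The following is a minimal sketch of that behaviour using plain dictionaries instead of the real dataset, so nothing is downloaded. The labels are copied from the output table; the missing `ja` entry is a hypothetical gap introduced only for illustration, and `inner_join` and `outer_join` are illustrative helper names, not part of any library.

```python
# Toy rows copied from the output table; real configs hold far more entries.
en = {"Q108556886": "AUKUS", "Q106496152": "Claude Joseph"}
fr = {"Q108556886": "AUKUS", "Q106496152": "Claude Joseph"}
# Hypothetical gap: pretend the "ja" config lacks Q108556886 (illustration only).
ja = {"Q106496152": "クロード・ジョゼフ"}


def inner_join(tables: dict) -> dict:
    """Keep only the wikidata_ids that have a label in every language."""
    shared = set.intersection(*(set(t) for t in tables.values()))
    return {qid: {lc: t[qid] for lc, t in tables.items()} for qid in sorted(shared)}


def outer_join(tables: dict) -> dict:
    """Keep every wikidata_id; a missing label becomes None (NaN in pandas)."""
    every = set.union(*(set(t) for t in tables.values()))
    return {qid: {lc: t.get(qid) for lc, t in tables.items()} for qid in sorted(every)}


tables = {"en": en, "fr": fr, "ja": ja}
inner = inner_join(tables)
outer = outer_join(tables)
print(len(inner), "entity with all labels;", len(outer), "entities overall")
```

An inner join gives strictly parallel rows at the cost of dropping entities that lack a label in any requested language, which matters for the smaller configs.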
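Provider:
rayliuca
Summary of original information

Dataset overview

The dataset contains multiple configs, one per language or language variant. Each config has the following features and split information:

Features

  • wikidata_id: string
  • lastrevid: 64-bit integer (int64)
  • label: string

Split information

  • label: reported with its byte count and number of examples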
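The per-config statistics appear in the card metadata on this page as one flattened run of `config_name ... num_bytes ... num_examples ... download_size ... dataset_size` entries. The following is a minimal sketch for recovering them as records with a regular expression, assuming that flattened single-line form. The excerpt is copied from the metadata for the `yav` and `yi` configs; `PATTERN` and `parse_configs` are hypothetical names introduced here.

```python
import re

# Excerpt copied from the flattened metadata on this page (configs "yav" and "yi").
flat = (
    "- config_name: yav features: - name: wikidata_id dtype: string "
    "- name: lastrevid dtype: int64 - name: label dtype: string "
    "splits: - name: label num_bytes: 4070 num_examples: 102 "
    "download_size: 4718 dataset_size: 4070 "
    "- config_name: yi features: - name: wikidata_id dtype: string "
    "- name: lastrevid dtype: int64 - name: label dtype: string "
    "splits: - name: label num_bytes: 5495313 num_examples: 170277 "
    "download_size: 3373820 dataset_size: 5495313"
)

# Requiring "features:" right after the name skips the later
# "config_name: xx data_files: ..." entries, which carry no statistics.
PATTERN = re.compile(
    r"config_name: (\S+) features:.*?"
    r"num_bytes: (\d+) num_examples: (\d+) "
    r"download_size: (\d+) dataset_size: (\d+)"
)


def parse_configs(text: str) -> list:
    """Return one dict of integer statistics per config found in the text."""
    rows = []
    for name, num_bytes, num_examples, download, size in PATTERN.findall(text):
        rows.append({
            "config": name.strip("'"),  # some codes are quoted in YAML, e.g. 'no'
            "num_bytes": int(num_bytes),
            "num_examples": int(num_examples),
            "download_size": int(download),
            "dataset_size": int(size),
        })
    return rows


rows = parse_configs(flat)
for row in rows:
    print(row["config"], row["num_examples"])
```

If the original, unflattened YAML is available, a YAML parser is the more robust choice; the regular expression is only a workaround for the flattened text.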

Config details

Config aa

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 13986211
    • Examples: 436895
  • Download size: 9821312
  • Dataset size: 13986211
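The four numbers in each config block can be combined into two quick sanity figures. The following is a minimal sketch using the `aa` values listed above. It assumes that the download size refers to the stored files and the dataset size to the loaded data, which is the usual meaning of these fields in Hugging Face metadata but is not stated on this page.

```python
# Values copied from the "aa" config block above.
aa = {
    "num_bytes": 13986211,
    "num_examples": 436895,
    "download_size": 9821312,
    "dataset_size": 13986211,
}

# Average size of one row. This covers all three fields
# (wikidata_id, lastrevid, label), not the label text alone.
bytes_per_example = aa["dataset_size"] / aa["num_examples"]

# Share of the dataset size that has to be downloaded.
download_ratio = aa["download_size"] / aa["dataset_size"]

print(f"{bytes_per_example:.2f} bytes per example")
print(f"download is {download_ratio:.0%} of the dataset size")
```

The same arithmetic applies to any other config and can help estimate disk and bandwidth needs before loading a large one such as `en`.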

Config ab

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 5012532
    • Examples: 159908
  • Download size: 3013706
  • Dataset size: 5012532

Config abs

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 4252728
    • Examples: 143986
  • Download size: 2567450
  • Dataset size: 4252728

Config ace

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 19105673
    • Examples: 574712
  • Download size: 13573374
  • Dataset size: 19105673

Config ady

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 4444259
    • Examples: 148627
  • Download size: 2705754
  • Dataset size: 4444259

Config ady-cyrl

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 4412556
    • Examples: 147884
  • Download size: 2682170
  • Dataset size: 4412556

Config aeb

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 4305734
    • Examples: 145198
  • Download size: 2606368
  • Dataset size: 4305734

Config aeb-arab

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 4467930
    • Examples: 148796
  • Download size: 2722169
  • Dataset size: 4467930

Config aeb-latn

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 12770359
    • Examples: 404946
  • Download size: 8886489
  • Dataset size: 12770359

Config af

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 58561042
    • Examples: 1643153
  • Download size: 42539052 bytes
  • Dataset size: 58561042 bytes

Config agq

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 1317
    • Examples: 33
  • Download size: 2906 bytes
  • Dataset size: 1317 bytes

Config ak

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 14198715
    • Examples: 443037
  • Download size: 9991525 bytes
  • Dataset size: 14198715 bytes

Config aln

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 13811116
    • Examples: 432089
  • Download size: 9673418 bytes
  • Dataset size: 13811116 bytes

Config als

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 20691
    • Examples: 543
  • Download size: 17540 bytes
  • Dataset size: 20691 bytes

Config alt

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 108390
    • Examples: 1814
  • Download size: 59046 bytes
  • Dataset size: 108390 bytes

Config am

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 5231176
    • Examples: 163038
  • Download size: 3187164 bytes
  • Dataset size: 5231176 bytes

Config ami

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 21519
    • Examples: 686
  • Download size: 16640 bytes
  • Dataset size: 21519 bytes

Config an

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 240345072
    • Examples: 5921087
  • Download size: 164895205 bytes
  • Dataset size: 240345072 bytes

Config ang

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 14275715
    • Examples: 443461
  • Download size: 10063758 bytes
  • Dataset size: 14275715 bytes

Config anp

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 8558258
    • Examples: 241612
  • Download size: 4381360 bytes
  • Dataset size: 8558258 bytes

Config ar

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 291173732
    • Examples: 5724064
  • Download size: 159369497 bytes
  • Dataset size: 291173732 bytes

Config arc

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 4473283
    • Examples: 150006
  • Download size: 2722619 bytes
  • Dataset size: 4473283 bytes

Config arn

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 13879729
    • Examples: 433912
  • Download size: 9715431 bytes
  • Dataset size: 13879729 bytes

Config arq

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 4346991
    • Examples: 146004
  • Download size: 2636972 bytes
  • Dataset size: 4346991 bytes

Config ary

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 5358568
    • Examples: 171568
  • Download size: 3313402 bytes
  • Dataset size: 5358568 bytes

Config arz

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 81806333
    • Examples: 1669699
  • Download size: 49423508 bytes
  • Dataset size: 81806333 bytes

Config as

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 21658610
    • Examples: 450074
  • Download size: 9641626 bytes
  • Dataset size: 21658610 bytes

Config ase

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 4252943
    • Examples: 143986
  • Download size: 2568106 bytes
  • Dataset size: 4252943 bytes

Config ast

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 1385628786
    • Examples: 20696237
  • Download size: 955908362 bytes
  • Dataset size: 1385628786 bytes

Config atj

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 12996229
    • Examples: 411639
  • Download size: 9057557 bytes
  • Dataset size: 12996229 bytes

Config av

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 4722934
    • Examples: 153781
  • Download size: 2880103 bytes
  • Dataset size: 4722934 bytes

Config avk

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 13194485
    • Examples: 414598
  • Download size: 9200917 bytes
  • Dataset size: 13194485 bytes

Config awa

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 8599312
    • Examples: 242320
  • Download size: 4411751 bytes
  • Dataset size: 8599312 bytes

Config ay

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 14269432
    • Examples: 443521
  • Download size: 10029939 bytes
  • Dataset size: 14269432 bytes

Config az

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 21049248
    • Examples: 516732
  • Download size: 14117527 bytes
  • Dataset size: 21049248 bytes

Config azb

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 30781587
    • Examples: 607562
  • Download size: 16028687 bytes
  • Dataset size: 30781587 bytes

Config ba

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 11525351
    • Examples: 261509
  • Download size: 6733777 bytes
  • Dataset size: 11525351 bytes

Config ban

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 13674052
    • Examples: 426706
  • Download size: 9513747 bytes
  • Dataset size: 13674052 bytes

Config ban-bali

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 50961
    • Examples: 748
  • Download size: 25817 bytes
  • Dataset size: 50961 bytes

Config bar

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 54783034
    • Examples: 1566120
  • Download size: 40389830 bytes
  • Dataset size: 54783034 bytes

Config bbc

  • Features: wikidata_id, lastrevid, label
  • Split: label
    • Bytes: 12820895
    • Examples: 406960
  • Download size: 8917054 bytes
  • Dataset size: 12820895 bytes
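The per-config figures above lend themselves to a quick sanity check before downloading anything. The sketch below computes the average bytes per example and filters out very small configs. The helper names and the 1,000-example threshold are arbitrary assumptions, and only four configs from the listing are included.

```python
# (num_bytes, num_examples) copied from the listing for four configs.
CONFIGS = {
    "aa": (13986211, 436895),
    "agq": (1317, 33),
    "als": (20691, 543),
    "ast": (1385628786, 20696237),
}


def bytes_per_example(num_bytes: int, num_examples: int) -> float:
    """Average dataset bytes per row (ID, revision ID, and label together)."""
    return num_bytes / num_examples


def configs_with_at_least(configs: dict, min_examples: int) -> list:
    """Names of configs whose example count meets the threshold, sorted alphabetically."""
    return sorted(name for name, (_, n) in configs.items() if n >= min_examples)


for name, (size, count) in CONFIGS.items():
    print(f"{name}: {bytes_per_example(size, count):.1f} bytes/example")

print(configs_with_at_least(CONFIGS, 1000))
```

A threshold like this is only a rough filter: a config such as `agq`, with 33 examples, is unlikely to help translation training regardless of where the cutoff is placed.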
