
nltk-data-hub/crubadan

Hugging Face | Updated 2026-04-28 | Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/nltk-data-hub/crubadan
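Every entry in the configs listing on this page follows the same pattern: a config name (a language code, or `table`) maps to one Parquet file at `data/<config_name>/crubadan.parquet`, under a single split named `crubadan`. The sketch below builds the repo-relative path and a direct-download URL for one config. The helper names `parquet_path` and `parquet_url` are hypothetical, and the `/resolve/<revision>/` URL layout and the `main` revision are assumptions about how the host serves files, not something this page states.

```python
REPO_ID = "nltk-data-hub/crubadan"  # dataset id from the page title
MIRROR = "https://hf-mirror.com"    # host taken from the download link on this page


def parquet_path(config_name: str) -> str:
    """Repo-relative Parquet path for a config, following the configs listing."""
    return f"data/{config_name}/crubadan.parquet"


def parquet_url(config_name: str, revision: str = "main") -> str:
    """Direct-download URL for a config's Parquet file.

    Assumption: the host exposes files under /resolve/<revision>/ and the
    default branch is called "main".
    """
    return f"{MIRROR}/datasets/{REPO_ID}/resolve/{revision}/{parquet_path(config_name)}"


if __name__ == "__main__":
    # "en" and "table" are two of the config names listed on this page.
    for name in ("en", "table"):
        print(name, parquet_url(name))
```

If the Hugging Face `datasets` library is installed, `load_dataset("nltk-data-hub/crubadan", "en", split="crubadan")` should select the same file through the config name, though that call needs network access and is untested here.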
Resource description:
--- configs: - config_name: table data_files: - split: crubadan path: data/table/crubadan.parquet - config_name: ab data_files: - split: crubadan path: data/ab/crubadan.parquet - config_name: abn data_files: - split: crubadan path: data/abn/crubadan.parquet - config_name: ace data_files: - split: crubadan path: data/ace/crubadan.parquet - config_name: ach data_files: - split: crubadan path: data/ach/crubadan.parquet - config_name: acu data_files: - split: crubadan path: data/acu/crubadan.parquet - config_name: ada data_files: - split: crubadan path: data/ada/crubadan.parquet - config_name: af data_files: - split: crubadan path: data/af/crubadan.parquet - config_name: agr data_files: - split: crubadan path: data/agr/crubadan.parquet - config_name: aja data_files: - split: crubadan path: data/aja/crubadan.parquet - config_name: ak data_files: - split: crubadan path: data/ak/crubadan.parquet - config_name: ako data_files: - split: crubadan path: data/ako/crubadan.parquet - config_name: alt data_files: - split: crubadan path: data/alt/crubadan.parquet - config_name: amc data_files: - split: crubadan path: data/amc/crubadan.parquet - config_name: ame data_files: - split: crubadan path: data/ame/crubadan.parquet - config_name: am data_files: - split: crubadan path: data/am/crubadan.parquet - config_name: ami data_files: - split: crubadan path: data/ami/crubadan.parquet - config_name: amr data_files: - split: crubadan path: data/amr/crubadan.parquet - config_name: an data_files: - split: crubadan path: data/an/crubadan.parquet - config_name: ang data_files: - split: crubadan path: data/ang/crubadan.parquet - config_name: ar data_files: - split: crubadan path: data/ar/crubadan.parquet - config_name: arl data_files: - split: crubadan path: data/arl/crubadan.parquet - config_name: arn data_files: - split: crubadan path: data/arn/crubadan.parquet - config_name: as data_files: - split: crubadan path: data/as/crubadan.parquet - config_name: ast data_files: - split: crubadan 
path: data/ast/crubadan.parquet - config_name: av data_files: - split: crubadan path: data/av/crubadan.parquet - config_name: ay data_files: - split: crubadan path: data/ay/crubadan.parquet - config_name: az data_files: - split: crubadan path: data/az/crubadan.parquet - config_name: ba data_files: - split: crubadan path: data/ba/crubadan.parquet - config_name: bal data_files: - split: crubadan path: data/bal/crubadan.parquet - config_name: ban data_files: - split: crubadan path: data/ban/crubadan.parquet - config_name: bar data_files: - split: crubadan path: data/bar/crubadan.parquet - config_name: bas data_files: - split: crubadan path: data/bas/crubadan.parquet - config_name: bba data_files: - split: crubadan path: data/bba/crubadan.parquet - config_name: bci data_files: - split: crubadan path: data/bci/crubadan.parquet - config_name: be data_files: - split: crubadan path: data/be/crubadan.parquet - config_name: bem data_files: - split: crubadan path: data/bem/crubadan.parquet - config_name: bfa data_files: - split: crubadan path: data/bfa/crubadan.parquet - config_name: bg data_files: - split: crubadan path: data/bg/crubadan.parquet - config_name: bh data_files: - split: crubadan path: data/bh/crubadan.parquet - config_name: bi data_files: - split: crubadan path: data/bi/crubadan.parquet - config_name: bik data_files: - split: crubadan path: data/bik/crubadan.parquet - config_name: bin data_files: - split: crubadan path: data/bin/crubadan.parquet - config_name: bm data_files: - split: crubadan path: data/bm/crubadan.parquet - config_name: bn data_files: - split: crubadan path: data/bn/crubadan.parquet - config_name: boa data_files: - split: crubadan path: data/boa/crubadan.parquet - config_name: bo data_files: - split: crubadan path: data/bo/crubadan.parquet - config_name: br data_files: - split: crubadan path: data/br/crubadan.parquet - config_name: bs data_files: - split: crubadan path: data/bs/crubadan.parquet - config_name: btb data_files: - split: crubadan 
path: data/btb/crubadan.parquet - config_name: bua data_files: - split: crubadan path: data/bua/crubadan.parquet - config_name: buc data_files: - split: crubadan path: data/buc/crubadan.parquet - config_name: bug data_files: - split: crubadan path: data/bug/crubadan.parquet - config_name: bum data_files: - split: crubadan path: data/bum/crubadan.parquet - config_name: byv data_files: - split: crubadan path: data/byv/crubadan.parquet - config_name: cab data_files: - split: crubadan path: data/cab/crubadan.parquet - config_name: ca data_files: - split: crubadan path: data/ca/crubadan.parquet - config_name: cak data_files: - split: crubadan path: data/cak/crubadan.parquet - config_name: cbr data_files: - split: crubadan path: data/cbr/crubadan.parquet - config_name: cbs data_files: - split: crubadan path: data/cbs/crubadan.parquet - config_name: cbt data_files: - split: crubadan path: data/cbt/crubadan.parquet - config_name: cbu data_files: - split: crubadan path: data/cbu/crubadan.parquet - config_name: ceb data_files: - split: crubadan path: data/ceb/crubadan.parquet - config_name: ch data_files: - split: crubadan path: data/ch/crubadan.parquet - config_name: chj data_files: - split: crubadan path: data/chj/crubadan.parquet - config_name: chk data_files: - split: crubadan path: data/chk/crubadan.parquet - config_name: chw data_files: - split: crubadan path: data/chw/crubadan.parquet - config_name: cic data_files: - split: crubadan path: data/cic/crubadan.parquet - config_name: cjk data_files: - split: crubadan path: data/cjk/crubadan.parquet - config_name: cnh data_files: - split: crubadan path: data/cnh/crubadan.parquet - config_name: cni data_files: - split: crubadan path: data/cni/crubadan.parquet - config_name: co data_files: - split: crubadan path: data/co/crubadan.parquet - config_name: cop data_files: - split: crubadan path: data/cop/crubadan.parquet - config_name: cot data_files: - split: crubadan path: data/cot/crubadan.parquet - config_name: cpu 
data_files: - split: crubadan path: data/cpu/crubadan.parquet - config_name: cr data_files: - split: crubadan path: data/cr/crubadan.parquet - config_name: crs data_files: - split: crubadan path: data/crs/crubadan.parquet - config_name: csa data_files: - split: crubadan path: data/csa/crubadan.parquet - config_name: csb data_files: - split: crubadan path: data/csb/crubadan.parquet - config_name: cs data_files: - split: crubadan path: data/cs/crubadan.parquet - config_name: cu data_files: - split: crubadan path: data/cu/crubadan.parquet - config_name: cuk data_files: - split: crubadan path: data/cuk/crubadan.parquet - config_name: cv data_files: - split: crubadan path: data/cv/crubadan.parquet - config_name: cy data_files: - split: crubadan path: data/cy/crubadan.parquet - config_name: czt data_files: - split: crubadan path: data/czt/crubadan.parquet - config_name: da data_files: - split: crubadan path: data/da/crubadan.parquet - config_name: dag data_files: - split: crubadan path: data/dag/crubadan.parquet - config_name: dar data_files: - split: crubadan path: data/dar/crubadan.parquet - config_name: ddn data_files: - split: crubadan path: data/ddn/crubadan.parquet - config_name: de data_files: - split: crubadan path: data/de/crubadan.parquet - config_name: dga data_files: - split: crubadan path: data/dga/crubadan.parquet - config_name: dhv data_files: - split: crubadan path: data/dhv/crubadan.parquet - config_name: diq data_files: - split: crubadan path: data/diq/crubadan.parquet - config_name: dsb data_files: - split: crubadan path: data/dsb/crubadan.parquet - config_name: dua data_files: - split: crubadan path: data/dua/crubadan.parquet - config_name: dyo data_files: - split: crubadan path: data/dyo/crubadan.parquet - config_name: dyu data_files: - split: crubadan path: data/dyu/crubadan.parquet - config_name: dz data_files: - split: crubadan path: data/dz/crubadan.parquet - config_name: ee data_files: - split: crubadan path: data/ee/crubadan.parquet - 
config_name: efi data_files: - split: crubadan path: data/efi/crubadan.parquet - config_name: el data_files: - split: crubadan path: data/el/crubadan.parquet - config_name: emk data_files: - split: crubadan path: data/emk/crubadan.parquet - config_name: eml data_files: - split: crubadan path: data/eml/crubadan.parquet - config_name: en data_files: - split: crubadan path: data/en/crubadan.parquet - config_name: enz data_files: - split: crubadan path: data/enz/crubadan.parquet - config_name: eo data_files: - split: crubadan path: data/eo/crubadan.parquet - config_name: es data_files: - split: crubadan path: data/es/crubadan.parquet - config_name: et data_files: - split: crubadan path: data/et/crubadan.parquet - config_name: eu data_files: - split: crubadan path: data/eu/crubadan.parquet - config_name: fa data_files: - split: crubadan path: data/fa/crubadan.parquet - config_name: ff data_files: - split: crubadan path: data/ff/crubadan.parquet - config_name: fi data_files: - split: crubadan path: data/fi/crubadan.parquet - config_name: fj data_files: - split: crubadan path: data/fj/crubadan.parquet - config_name: fo data_files: - split: crubadan path: data/fo/crubadan.parquet - config_name: fon data_files: - split: crubadan path: data/fon/crubadan.parquet - config_name: fr data_files: - split: crubadan path: data/fr/crubadan.parquet - config_name: frf data_files: - split: crubadan path: data/frf/crubadan.parquet - config_name: frp data_files: - split: crubadan path: data/frp/crubadan.parquet - config_name: frr data_files: - split: crubadan path: data/frr/crubadan.parquet - config_name: fud data_files: - split: crubadan path: data/fud/crubadan.parquet - config_name: fuf data_files: - split: crubadan path: data/fuf/crubadan.parquet - config_name: fur data_files: - split: crubadan path: data/fur/crubadan.parquet - config_name: fy data_files: - split: crubadan path: data/fy/crubadan.parquet - config_name: gaa data_files: - split: crubadan path: data/gaa/crubadan.parquet - 
config_name: ga data_files: - split: crubadan path: data/ga/crubadan.parquet - config_name: gag data_files: - split: crubadan path: data/gag/crubadan.parquet - config_name: gba data_files: - split: crubadan path: data/gba/crubadan.parquet - config_name: gd data_files: - split: crubadan path: data/gd/crubadan.parquet - config_name: gil data_files: - split: crubadan path: data/gil/crubadan.parquet - config_name: gjn data_files: - split: crubadan path: data/gjn/crubadan.parquet - config_name: gkn data_files: - split: crubadan path: data/gkn/crubadan.parquet - config_name: gl data_files: - split: crubadan path: data/gl/crubadan.parquet - config_name: gn data_files: - split: crubadan path: data/gn/crubadan.parquet - config_name: got data_files: - split: crubadan path: data/got/crubadan.parquet - config_name: gsc data_files: - split: crubadan path: data/gsc/crubadan.parquet - config_name: gsw data_files: - split: crubadan path: data/gsw/crubadan.parquet - config_name: guc data_files: - split: crubadan path: data/guc/crubadan.parquet - config_name: gu data_files: - split: crubadan path: data/gu/crubadan.parquet - config_name: guw data_files: - split: crubadan path: data/guw/crubadan.parquet - config_name: gv data_files: - split: crubadan path: data/gv/crubadan.parquet - config_name: gym data_files: - split: crubadan path: data/gym/crubadan.parquet - config_name: ha data_files: - split: crubadan path: data/ha/crubadan.parquet - config_name: haw data_files: - split: crubadan path: data/haw/crubadan.parquet - config_name: he data_files: - split: crubadan path: data/he/crubadan.parquet - config_name: hi data_files: - split: crubadan path: data/hi/crubadan.parquet - config_name: hil data_files: - split: crubadan path: data/hil/crubadan.parquet - config_name: hna data_files: - split: crubadan path: data/hna/crubadan.parquet - config_name: hne data_files: - split: crubadan path: data/hne/crubadan.parquet - config_name: hni data_files: - split: crubadan path: 
data/hni/crubadan.parquet - config_name: ho data_files: - split: crubadan path: data/ho/crubadan.parquet - config_name: hr data_files: - split: crubadan path: data/hr/crubadan.parquet - config_name: hsb data_files: - split: crubadan path: data/hsb/crubadan.parquet - config_name: ht data_files: - split: crubadan path: data/ht/crubadan.parquet - config_name: hu data_files: - split: crubadan path: data/hu/crubadan.parquet - config_name: huu data_files: - split: crubadan path: data/huu/crubadan.parquet - config_name: hve data_files: - split: crubadan path: data/hve/crubadan.parquet - config_name: hy data_files: - split: crubadan path: data/hy/crubadan.parquet - config_name: hz data_files: - split: crubadan path: data/hz/crubadan.parquet - config_name: ia data_files: - split: crubadan path: data/ia/crubadan.parquet - config_name: iba data_files: - split: crubadan path: data/iba/crubadan.parquet - config_name: id data_files: - split: crubadan path: data/id/crubadan.parquet - config_name: ig data_files: - split: crubadan path: data/ig/crubadan.parquet - config_name: igl data_files: - split: crubadan path: data/igl/crubadan.parquet - config_name: ilo data_files: - split: crubadan path: data/ilo/crubadan.parquet - config_name: inh data_files: - split: crubadan path: data/inh/crubadan.parquet - config_name: is data_files: - split: crubadan path: data/is/crubadan.parquet - config_name: iso data_files: - split: crubadan path: data/iso/crubadan.parquet - config_name: it data_files: - split: crubadan path: data/it/crubadan.parquet - config_name: its data_files: - split: crubadan path: data/its/crubadan.parquet - config_name: iu data_files: - split: crubadan path: data/iu/crubadan.parquet - config_name: ivv data_files: - split: crubadan path: data/ivv/crubadan.parquet - config_name: jiv data_files: - split: crubadan path: data/jiv/crubadan.parquet - config_name: jv data_files: - split: crubadan path: data/jv/crubadan.parquet - config_name: kab data_files: - split: crubadan path: 
data/kab/crubadan.parquet - config_name: kac data_files: - split: crubadan path: data/kac/crubadan.parquet - config_name: ka data_files: - split: crubadan path: data/ka/crubadan.parquet - config_name: kam data_files: - split: crubadan path: data/kam/crubadan.parquet - config_name: kbd data_files: - split: crubadan path: data/kbd/crubadan.parquet - config_name: kbp data_files: - split: crubadan path: data/kbp/crubadan.parquet - config_name: kcc data_files: - split: crubadan path: data/kcc/crubadan.parquet - config_name: kck data_files: - split: crubadan path: data/kck/crubadan.parquet - config_name: kde data_files: - split: crubadan path: data/kde/crubadan.parquet - config_name: kek data_files: - split: crubadan path: data/kek/crubadan.parquet - config_name: kg data_files: - split: crubadan path: data/kg/crubadan.parquet - config_name: kha data_files: - split: crubadan path: data/kha/crubadan.parquet - config_name: ki data_files: - split: crubadan path: data/ki/crubadan.parquet - config_name: kj data_files: - split: crubadan path: data/kj/crubadan.parquet - config_name: kjh data_files: - split: crubadan path: data/kjh/crubadan.parquet - config_name: kk data_files: - split: crubadan path: data/kk/crubadan.parquet - config_name: kl data_files: - split: crubadan path: data/kl/crubadan.parquet - config_name: kmb data_files: - split: crubadan path: data/kmb/crubadan.parquet - config_name: km data_files: - split: crubadan path: data/km/crubadan.parquet - config_name: kn data_files: - split: crubadan path: data/kn/crubadan.parquet - config_name: kok data_files: - split: crubadan path: data/kok/crubadan.parquet - config_name: koo data_files: - split: crubadan path: data/koo/crubadan.parquet - config_name: kos data_files: - split: crubadan path: data/kos/crubadan.parquet - config_name: kpe data_files: - split: crubadan path: data/kpe/crubadan.parquet - config_name: kqn data_files: - split: crubadan path: data/kqn/crubadan.parquet - config_name: krc data_files: - split: 
crubadan path: data/krc/crubadan.parquet - config_name: kr data_files: - split: crubadan path: data/kr/crubadan.parquet - config_name: kri data_files: - split: crubadan path: data/kri/crubadan.parquet - config_name: ksh data_files: - split: crubadan path: data/ksh/crubadan.parquet - config_name: ktu data_files: - split: crubadan path: data/ktu/crubadan.parquet - config_name: ku data_files: - split: crubadan path: data/ku/crubadan.parquet - config_name: kum data_files: - split: crubadan path: data/kum/crubadan.parquet - config_name: kv data_files: - split: crubadan path: data/kv/crubadan.parquet - config_name: kwf data_files: - split: crubadan path: data/kwf/crubadan.parquet - config_name: kwk data_files: - split: crubadan path: data/kwk/crubadan.parquet - config_name: kwm data_files: - split: crubadan path: data/kwm/crubadan.parquet - config_name: kwn data_files: - split: crubadan path: data/kwn/crubadan.parquet - config_name: kwu data_files: - split: crubadan path: data/kwu/crubadan.parquet - config_name: ky data_files: - split: crubadan path: data/ky/crubadan.parquet - config_name: lad data_files: - split: crubadan path: data/lad/crubadan.parquet - config_name: la data_files: - split: crubadan path: data/la/crubadan.parquet - config_name: lbe data_files: - split: crubadan path: data/lbe/crubadan.parquet - config_name: lb data_files: - split: crubadan path: data/lb/crubadan.parquet - config_name: lch data_files: - split: crubadan path: data/lch/crubadan.parquet - config_name: lg data_files: - split: crubadan path: data/lg/crubadan.parquet - config_name: lgg data_files: - split: crubadan path: data/lgg/crubadan.parquet - config_name: lia data_files: - split: crubadan path: data/lia/crubadan.parquet - config_name: li data_files: - split: crubadan path: data/li/crubadan.parquet - config_name: lij data_files: - split: crubadan path: data/lij/crubadan.parquet - config_name: lld data_files: - split: crubadan path: data/lld/crubadan.parquet - config_name: llh data_files: 
- split: crubadan path: data/llh/crubadan.parquet - config_name: llj data_files: - split: crubadan path: data/llj/crubadan.parquet - config_name: llr data_files: - split: crubadan path: data/llr/crubadan.parquet - config_name: lmo data_files: - split: crubadan path: data/lmo/crubadan.parquet - config_name: lms data_files: - split: crubadan path: data/lms/crubadan.parquet - config_name: lnc data_files: - split: crubadan path: data/lnc/crubadan.parquet - config_name: ln data_files: - split: crubadan path: data/ln/crubadan.parquet - config_name: lns data_files: - split: crubadan path: data/lns/crubadan.parquet - config_name: lo data_files: - split: crubadan path: data/lo/crubadan.parquet - config_name: lol data_files: - split: crubadan path: data/lol/crubadan.parquet - config_name: loz data_files: - split: crubadan path: data/loz/crubadan.parquet - config_name: lt data_files: - split: crubadan path: data/lt/crubadan.parquet - config_name: lua data_files: - split: crubadan path: data/lua/crubadan.parquet - config_name: lue data_files: - split: crubadan path: data/lue/crubadan.parquet - config_name: lu data_files: - split: crubadan path: data/lu/crubadan.parquet - config_name: lun data_files: - split: crubadan path: data/lun/crubadan.parquet - config_name: luo data_files: - split: crubadan path: data/luo/crubadan.parquet - config_name: lus data_files: - split: crubadan path: data/lus/crubadan.parquet - config_name: lv data_files: - split: crubadan path: data/lv/crubadan.parquet - config_name: mad data_files: - split: crubadan path: data/mad/crubadan.parquet - config_name: mam data_files: - split: crubadan path: data/mam/crubadan.parquet - config_name: mau data_files: - split: crubadan path: data/mau/crubadan.parquet - config_name: maz data_files: - split: crubadan path: data/maz/crubadan.parquet - config_name: mcd data_files: - split: crubadan path: data/mcd/crubadan.parquet - config_name: mcf data_files: - split: crubadan path: data/mcf/crubadan.parquet - config_name: 
mdf data_files: - split: crubadan path: data/mdf/crubadan.parquet - config_name: men data_files: - split: crubadan path: data/men/crubadan.parquet - config_name: meu data_files: - split: crubadan path: data/meu/crubadan.parquet - config_name: mfe data_files: - split: crubadan path: data/mfe/crubadan.parquet - config_name: mg data_files: - split: crubadan path: data/mg/crubadan.parquet - config_name: mh data_files: - split: crubadan path: data/mh/crubadan.parquet - config_name: mhi data_files: - split: crubadan path: data/mhi/crubadan.parquet - config_name: mho data_files: - split: crubadan path: data/mho/crubadan.parquet - config_name: mic data_files: - split: crubadan path: data/mic/crubadan.parquet - config_name: mi data_files: - split: crubadan path: data/mi/crubadan.parquet - config_name: min data_files: - split: crubadan path: data/min/crubadan.parquet - config_name: miq data_files: - split: crubadan path: data/miq/crubadan.parquet - config_name: mir data_files: - split: crubadan path: data/mir/crubadan.parquet - config_name: mk data_files: - split: crubadan path: data/mk/crubadan.parquet - config_name: ml data_files: - split: crubadan path: data/ml/crubadan.parquet - config_name: mlu data_files: - split: crubadan path: data/mlu/crubadan.parquet - config_name: mn data_files: - split: crubadan path: data/mn/crubadan.parquet - config_name: mo data_files: - split: crubadan path: data/mo/crubadan.parquet - config_name: mos data_files: - split: crubadan path: data/mos/crubadan.parquet - config_name: mr data_files: - split: crubadan path: data/mr/crubadan.parquet - config_name: mrj data_files: - split: crubadan path: data/mrj/crubadan.parquet - config_name: ms data_files: - split: crubadan path: data/ms/crubadan.parquet - config_name: mt data_files: - split: crubadan path: data/mt/crubadan.parquet - config_name: mua data_files: - split: crubadan path: data/mua/crubadan.parquet - config_name: mus data_files: - split: crubadan path: data/mus/crubadan.parquet - 
config_name: mwv data_files: - split: crubadan path: data/mwv/crubadan.parquet - config_name: mxv data_files: - split: crubadan path: data/mxv/crubadan.parquet - config_name: my data_files: - split: crubadan path: data/my/crubadan.parquet - config_name: myv data_files: - split: crubadan path: data/myv/crubadan.parquet - config_name: mzn data_files: - split: crubadan path: data/mzn/crubadan.parquet - config_name: na data_files: - split: crubadan path: data/na/crubadan.parquet - config_name: nah data_files: - split: crubadan path: data/nah/crubadan.parquet - config_name: nap data_files: - split: crubadan path: data/nap/crubadan.parquet - config_name: naq data_files: - split: crubadan path: data/naq/crubadan.parquet - config_name: nba data_files: - split: crubadan path: data/nba/crubadan.parquet - config_name: nb data_files: - split: crubadan path: data/nb/crubadan.parquet - config_name: ndc data_files: - split: crubadan path: data/ndc/crubadan.parquet - config_name: nd data_files: - split: crubadan path: data/nd/crubadan.parquet - config_name: nds data_files: - split: crubadan path: data/nds/crubadan.parquet - config_name: ne data_files: - split: crubadan path: data/ne/crubadan.parquet - config_name: nen data_files: - split: crubadan path: data/nen/crubadan.parquet - config_name: ng data_files: - split: crubadan path: data/ng/crubadan.parquet - config_name: ngl data_files: - split: crubadan path: data/ngl/crubadan.parquet - config_name: nia data_files: - split: crubadan path: data/nia/crubadan.parquet - config_name: niu data_files: - split: crubadan path: data/niu/crubadan.parquet - config_name: nl data_files: - split: crubadan path: data/nl/crubadan.parquet - config_name: nmf data_files: - split: crubadan path: data/nmf/crubadan.parquet - config_name: nnb data_files: - split: crubadan path: data/nnb/crubadan.parquet - config_name: nn data_files: - split: crubadan path: data/nn/crubadan.parquet - config_name: not data_files: - split: crubadan path: 
data/not/crubadan.parquet - config_name: nr data_files: - split: crubadan path: data/nr/crubadan.parquet - config_name: nso data_files: - split: crubadan path: data/nso/crubadan.parquet - config_name: nv data_files: - split: crubadan path: data/nv/crubadan.parquet - config_name: ny data_files: - split: crubadan path: data/ny/crubadan.parquet - config_name: nyk data_files: - split: crubadan path: data/nyk/crubadan.parquet - config_name: nym data_files: - split: crubadan path: data/nym/crubadan.parquet - config_name: nyn data_files: - split: crubadan path: data/nyn/crubadan.parquet - config_name: nzi data_files: - split: crubadan path: data/nzi/crubadan.parquet - config_name: ogo data_files: - split: crubadan path: data/ogo/crubadan.parquet - config_name: oj data_files: - split: crubadan path: data/oj/crubadan.parquet - config_name: om data_files: - split: crubadan path: data/om/crubadan.parquet - config_name: ood data_files: - split: crubadan path: data/ood/crubadan.parquet - config_name: or data_files: - split: crubadan path: data/or/crubadan.parquet - config_name: os data_files: - split: crubadan path: data/os/crubadan.parquet - config_name: pa data_files: - split: crubadan path: data/pa/crubadan.parquet - config_name: pag data_files: - split: crubadan path: data/pag/crubadan.parquet - config_name: pam data_files: - split: crubadan path: data/pam/crubadan.parquet - config_name: pap data_files: - split: crubadan path: data/pap/crubadan.parquet - config_name: pau data_files: - split: crubadan path: data/pau/crubadan.parquet - config_name: pbb data_files: - split: crubadan path: data/pbb/crubadan.parquet - config_name: pcm data_files: - split: crubadan path: data/pcm/crubadan.parquet - config_name: pdc data_files: - split: crubadan path: data/pdc/crubadan.parquet - config_name: pem data_files: - split: crubadan path: data/pem/crubadan.parquet - config_name: pih data_files: - split: crubadan path: data/pih/crubadan.parquet - config_name: pis data_files: - split: 
crubadan path: data/pis/crubadan.parquet - config_name: pl data_files: - split: crubadan path: data/pl/crubadan.parquet - config_name: pms data_files: - split: crubadan path: data/pms/crubadan.parquet - config_name: pon data_files: - split: crubadan path: data/pon/crubadan.parquet - config_name: ppl data_files: - split: crubadan path: data/ppl/crubadan.parquet - config_name: prq data_files: - split: crubadan path: data/prq/crubadan.parquet - config_name: prs data_files: - split: crubadan path: data/prs/crubadan.parquet - config_name: prv data_files: - split: crubadan path: data/prv/crubadan.parquet - config_name: ps data_files: - split: crubadan path: data/ps/crubadan.parquet - config_name: ptb data_files: - split: crubadan path: data/ptb/crubadan.parquet - config_name: pt data_files: - split: crubadan path: data/pt/crubadan.parquet - config_name: qu data_files: - split: crubadan path: data/qu/crubadan.parquet - config_name: qug data_files: - split: crubadan path: data/qug/crubadan.parquet - config_name: rar data_files: - split: crubadan path: data/rar/crubadan.parquet - config_name: rcf data_files: - split: crubadan path: data/rcf/crubadan.parquet - config_name: rm data_files: - split: crubadan path: data/rm/crubadan.parquet - config_name: rnd data_files: - split: crubadan path: data/rnd/crubadan.parquet - config_name: rn data_files: - split: crubadan path: data/rn/crubadan.parquet - config_name: ro data_files: - split: crubadan path: data/ro/crubadan.parquet - config_name: rom data_files: - split: crubadan path: data/rom/crubadan.parquet - config_name: ru data_files: - split: crubadan path: data/ru/crubadan.parquet - config_name: rug data_files: - split: crubadan path: data/rug/crubadan.parquet - config_name: rup data_files: - split: crubadan path: data/rup/crubadan.parquet - config_name: rw data_files: - split: crubadan path: data/rw/crubadan.parquet - config_name: sba data_files: - split: crubadan path: data/sba/crubadan.parquet - config_name: sc data_files: - 
split: crubadan path: data/sc/crubadan.parquet - config_name: scn data_files: - split: crubadan path: data/scn/crubadan.parquet - config_name: sco data_files: - split: crubadan path: data/sco/crubadan.parquet - config_name: sd data_files: - split: crubadan path: data/sd/crubadan.parquet - config_name: se data_files: - split: crubadan path: data/se/crubadan.parquet - config_name: seh data_files: - split: crubadan path: data/seh/crubadan.parquet - config_name: sg data_files: - split: crubadan path: data/sg/crubadan.parquet - config_name: shp data_files: - split: crubadan path: data/shp/crubadan.parquet - config_name: shs data_files: - split: crubadan path: data/shs/crubadan.parquet - config_name: sid data_files: - split: crubadan path: data/sid/crubadan.parquet - config_name: sk data_files: - split: crubadan path: data/sk/crubadan.parquet - config_name: sl data_files: - split: crubadan path: data/sl/crubadan.parquet - config_name: sm data_files: - split: crubadan path: data/sm/crubadan.parquet - config_name: sn data_files: - split: crubadan path: data/sn/crubadan.parquet - config_name: snk data_files: - split: crubadan path: data/snk/crubadan.parquet - config_name: so data_files: - split: crubadan path: data/so/crubadan.parquet - config_name: son data_files: - split: crubadan path: data/son/crubadan.parquet - config_name: sop data_files: - split: crubadan path: data/sop/crubadan.parquet - config_name: sq data_files: - split: crubadan path: data/sq/crubadan.parquet - config_name: srd data_files: - split: crubadan path: data/srd/crubadan.parquet - config_name: sr data_files: - split: crubadan path: data/sr/crubadan.parquet - config_name: srm data_files: - split: crubadan path: data/srm/crubadan.parquet - config_name: srn data_files: - split: crubadan path: data/srn/crubadan.parquet - config_name: srr data_files: - split: crubadan path: data/srr/crubadan.parquet - config_name: ss data_files: - split: crubadan path: data/ss/crubadan.parquet - config_name: st data_files: 
- split: crubadan path: data/st/crubadan.parquet - config_name: su data_files: - split: crubadan path: data/su/crubadan.parquet - config_name: suk data_files: - split: crubadan path: data/suk/crubadan.parquet - config_name: sum data_files: - split: crubadan path: data/sum/crubadan.parquet - config_name: sus data_files: - split: crubadan path: data/sus/crubadan.parquet - config_name: sv data_files: - split: crubadan path: data/sv/crubadan.parquet - config_name: swb data_files: - split: crubadan path: data/swb/crubadan.parquet - config_name: sw data_files: - split: crubadan path: data/sw/crubadan.parquet - config_name: tab data_files: - split: crubadan path: data/tab/crubadan.parquet - config_name: ta data_files: - split: crubadan path: data/ta/crubadan.parquet - config_name: tbz data_files: - split: crubadan path: data/tbz/crubadan.parquet - config_name: te data_files: - split: crubadan path: data/te/crubadan.parquet - config_name: tem data_files: - split: crubadan path: data/tem/crubadan.parquet - config_name: teo data_files: - split: crubadan path: data/teo/crubadan.parquet - config_name: tet data_files: - split: crubadan path: data/tet/crubadan.parquet - config_name: tg data_files: - split: crubadan path: data/tg/crubadan.parquet - config_name: th data_files: - split: crubadan path: data/th/crubadan.parquet - config_name: ti data_files: - split: crubadan path: data/ti/crubadan.parquet - config_name: tig data_files: - split: crubadan path: data/tig/crubadan.parquet - config_name: tiv data_files: - split: crubadan path: data/tiv/crubadan.parquet - config_name: tk data_files: - split: crubadan path: data/tk/crubadan.parquet - config_name: tkl data_files: - split: crubadan path: data/tkl/crubadan.parquet - config_name: tl data_files: - split: crubadan path: data/tl/crubadan.parquet - config_name: tll data_files: - split: crubadan path: data/tll/crubadan.parquet - config_name: tn data_files: - split: crubadan path: data/tn/crubadan.parquet - config_name: tob 
data_files: - split: crubadan path: data/tob/crubadan.parquet - config_name: to data_files: - split: crubadan path: data/to/crubadan.parquet - config_name: toi data_files: - split: crubadan path: data/toi/crubadan.parquet - config_name: toj data_files: - split: crubadan path: data/toj/crubadan.parquet - config_name: tos data_files: - split: crubadan path: data/tos/crubadan.parquet - config_name: tpi data_files: - split: crubadan path: data/tpi/crubadan.parquet - config_name: tr data_files: - split: crubadan path: data/tr/crubadan.parquet - config_name: tsc data_files: - split: crubadan path: data/tsc/crubadan.parquet - config_name: ts data_files: - split: crubadan path: data/ts/crubadan.parquet - config_name: tt data_files: - split: crubadan path: data/tt/crubadan.parquet - config_name: ttj data_files: - split: crubadan path: data/ttj/crubadan.parquet - config_name: tum data_files: - split: crubadan path: data/tum/crubadan.parquet - config_name: tvl data_files: - split: crubadan path: data/tvl/crubadan.parquet - config_name: ty data_files: - split: crubadan path: data/ty/crubadan.parquet - config_name: tzc data_files: - split: crubadan path: data/tzc/crubadan.parquet - config_name: tzm data_files: - split: crubadan path: data/tzm/crubadan.parquet - config_name: udm data_files: - split: crubadan path: data/udm/crubadan.parquet - config_name: ug data_files: - split: crubadan path: data/ug/crubadan.parquet - config_name: uk data_files: - split: crubadan path: data/uk/crubadan.parquet - config_name: umb data_files: - split: crubadan path: data/umb/crubadan.parquet - config_name: ura data_files: - split: crubadan path: data/ura/crubadan.parquet - config_name: ur data_files: - split: crubadan path: data/ur/crubadan.parquet - config_name: urh data_files: - split: crubadan path: data/urh/crubadan.parquet - config_name: uz data_files: - split: crubadan path: data/uz/crubadan.parquet - config_name: val data_files: - split: crubadan path: data/val/crubadan.parquet - 
config_name: vec data_files: - split: crubadan path: data/vec/crubadan.parquet - config_name: ve data_files: - split: crubadan path: data/ve/crubadan.parquet - config_name: vi data_files: - split: crubadan path: data/vi/crubadan.parquet - config_name: vls data_files: - split: crubadan path: data/vls/crubadan.parquet - config_name: vmf data_files: - split: crubadan path: data/vmf/crubadan.parquet - config_name: vmw data_files: - split: crubadan path: data/vmw/crubadan.parquet - config_name: wa data_files: - split: crubadan path: data/wa/crubadan.parquet - config_name: wal data_files: - split: crubadan path: data/wal/crubadan.parquet - config_name: war data_files: - split: crubadan path: data/war/crubadan.parquet - config_name: wls data_files: - split: crubadan path: data/wls/crubadan.parquet - config_name: wo data_files: - split: crubadan path: data/wo/crubadan.parquet - config_name: xal data_files: - split: crubadan path: data/xal/crubadan.parquet - config_name: xh data_files: - split: crubadan path: data/xh/crubadan.parquet - config_name: xsm data_files: - split: crubadan path: data/xsm/crubadan.parquet - config_name: yad data_files: - split: crubadan path: data/yad/crubadan.parquet - config_name: yaf data_files: - split: crubadan path: data/yaf/crubadan.parquet - config_name: yao data_files: - split: crubadan path: data/yao/crubadan.parquet - config_name: yap data_files: - split: crubadan path: data/yap/crubadan.parquet - config_name: yi data_files: - split: crubadan path: data/yi/crubadan.parquet - config_name: yo data_files: - split: crubadan path: data/yo/crubadan.parquet - config_name: yua data_files: - split: crubadan path: data/yua/crubadan.parquet - config_name: za data_files: - split: crubadan path: data/za/crubadan.parquet - config_name: zap data_files: - split: crubadan path: data/zap/crubadan.parquet - config_name: zea data_files: - split: crubadan path: data/zea/crubadan.parquet - config_name: zh data_files: - split: crubadan path: 
data/zh/crubadan.parquet - config_name: znd data_files: - split: crubadan path: data/znd/crubadan.parquet - config_name: zpa data_files: - split: crubadan path: data/zpa/crubadan.parquet - config_name: zu data_files: - split: crubadan path: data/zu/crubadan.parquet license: gpl-3.0 task_categories: - text-classification - token-classification pretty_name: NLTK Crúbadán Language ID Corpus ---

# NLTK Crúbadán Language ID Corpus

Character 3-gram frequency tables for **449 writing systems**, collected by Kevin Scannell's [An Crúbadán](http://borel.slu.edu/crubadan/) web crawler (2010). Distributed via [NLTK](https://www.nltk.org/).

Trigrams use `<` (word start) and `>` (word end) as boundary markers.

## Configs

| Config | Description | Schema |
|---|---|---|
| `table` | Language metadata | `crubadan_code, iso639_3, language_name` |
| `{lang_code}` | Per-language trigrams | `count, trigram` |

All 449 language codes: `ab`, `abn`, `ace`, `ach`, `acu`, `ada`, `af`, `agr`, `aja`, `ak`, `ako`, `alt`, `amc`, `ame`, `am`, `ami`, `amr`, `an`, `ang`, `ar`, … (and 429 more)

## Schema

**`table`**

| Column | Type | Description |
|---|---|---|
| `crubadan_code` | string | Internal Crúbadán writing-system code |
| `iso639_3` | string | ISO 639-3 language code |
| `language_name` | string | English language name |

**`{lang_code}`** (one config per writing system)

| Column | Type | Description |
|---|---|---|
| `count` | int64 | Frequency of trigram in crawled text |
| `trigram` | string | 3-character sequence (`<`/`>` = word boundaries) |

Rows are sorted by descending count (most frequent first).
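To make the `count, trigram` layout concrete, here is a minimal sketch of how a table of this shape could be derived from raw text. It assumes plain whitespace tokenization and no case folding; the crawler's actual preprocessing is not described in this card, and `char_trigrams` and `as_rows` are hypothetical helper names, not part of the dataset or of NLTK.

```python
from collections import Counter


def char_trigrams(text: str) -> Counter:
    """Count character 3-grams per word, wrapping each word as <word>."""
    counts = Counter()
    for word in text.split():  # assumption: whitespace tokenization
        padded = f"<{word}>"   # '<' marks word start, '>' marks word end
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return counts


def as_rows(counts: Counter) -> list[tuple[int, str]]:
    """Return (count, trigram) rows, most frequent first, ties by trigram."""
    return sorted(((n, t) for t, n in counts.items()),
                  key=lambda row: (-row[0], row[1]))


rows = as_rows(char_trigrams("die kat en die hond"))
for count, trigram in rows[:3]:
    print(count, trigram)
```

The word `die` occurs twice in the sample sentence, so its three trigrams (`<di`, `die`, `ie>`) head the list with a count of 2, mirroring the descending-count ordering of the per-language configs.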
## Sample languages

| Code | ISO 639-3 | Language |
|---|---|---|
| `ab` | `abk` | Abkhaz |
| `abn` | `abn` | Abua |
| `ace` | `ace` | Aceh |
| `ach` | `ach` | Acholi |
| `acu` | `acu` | Achuar-Shiwiar |
| `ada` | `ada` | Dangme |
| `af` | `afr` | Afrikaans |
| `agr` | `agr` | Aguaruna |
| `aja` | `aja` | Aja |
| `ak` | `aka` | Akan |

## Usage

```python
from datasets import load_dataset

# Language metadata (crubadan_code, iso639_3, language_name)
meta = load_dataset("nltk-data-hub/crubadan", "table")
df = meta["crubadan"].to_pandas()

# Trigrams for a specific language
ds = load_dataset("nltk-data-hub/crubadan", "af")  # Afrikaans
trigrams = ds["crubadan"].to_pandas()  # count, trigram columns
```

## Via NLTK

```python
import nltk

nltk.download("crubadan")
reader = nltk.corpus.crubadan  # CrubadanCorpusReader

reader.langs()                 # ISO 639-3 codes of the supported languages
reader.lang_freq("afr")        # FreqDist of Afrikaans trigrams (takes the ISO 639-3 code)
reader.crubadan_to_iso("af")   # → 'afr'
reader.iso_to_crubadan("afr")  # → 'af'
```

## License

GPL v3, © 2010 Kevin P. Scannell. See [GNU GPL v3](https://www.gnu.org/licenses/gpl-3.0.html).

## Citation

```bibtex
@inproceedings{crubadan,
  author    = {Scannell, Kevin P.},
  title     = {The Crúbadán Project: Corpus building for under-resourced languages},
  booktitle = {Building and Exploring Web Corpora: Proceedings of the 3rd Web as Corpus Workshop},
  year      = {2007},
  pages     = {5--15},
  url       = {http://borel.slu.edu/crubadan/}
}
```
Provider: nltk-data-hub