nltk-data-hub/crubadan
Source: Hugging Face · Updated 2026-04-28 · Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/nltk-data-hub/crubadan
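
Each entry in this card's `configs` list pairs a `config_name` with a single split named `crubadan`, stored at `data/<config_name>/crubadan.parquet`. The sketch below shows one way that mapping could be used from Python. It is an illustration, not the maintainers' documented workflow: the helper names `parquet_path` and `load_config` are hypothetical, and `load_config` assumes the Hugging Face `datasets` library is installed and that the repository `nltk-data-hub/crubadan` is reachable (for the mirror above, this usually means pointing the `HF_ENDPOINT` environment variable at it, which is also an assumption here).

```python
REPO_ID = "nltk-data-hub/crubadan"  # repository name taken from the page title
SPLIT = "crubadan"                  # every config in the list declares this one split


def parquet_path(config_name: str) -> str:
    """Return the repo-relative parquet path that the config list uses for a config."""
    return f"data/{config_name}/{SPLIT}.parquet"


def load_config(config_name: str):
    """Load one config via the `datasets` library (hypothetical helper, needs network access)."""
    # Imported lazily so parquet_path() works without the dependency installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=SPLIT)


if __name__ == "__main__":
    print(parquet_path("af"))  # data/af/crubadan.parquet
    # ds = load_config("af")   # uncomment when the `datasets` library and network are available
```

Passing the config name explicitly matters because the card defines several hundred configs with no default, so a loader has to be told which language (or the `table` config) to fetch.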
Resource description:
---
configs:
- config_name: table
data_files:
- split: crubadan
path: data/table/crubadan.parquet
- config_name: ab
data_files:
- split: crubadan
path: data/ab/crubadan.parquet
- config_name: abn
data_files:
- split: crubadan
path: data/abn/crubadan.parquet
- config_name: ace
data_files:
- split: crubadan
path: data/ace/crubadan.parquet
- config_name: ach
data_files:
- split: crubadan
path: data/ach/crubadan.parquet
- config_name: acu
data_files:
- split: crubadan
path: data/acu/crubadan.parquet
- config_name: ada
data_files:
- split: crubadan
path: data/ada/crubadan.parquet
- config_name: af
data_files:
- split: crubadan
path: data/af/crubadan.parquet
- config_name: agr
data_files:
- split: crubadan
path: data/agr/crubadan.parquet
- config_name: aja
data_files:
- split: crubadan
path: data/aja/crubadan.parquet
- config_name: ak
data_files:
- split: crubadan
path: data/ak/crubadan.parquet
- config_name: ako
data_files:
- split: crubadan
path: data/ako/crubadan.parquet
- config_name: alt
data_files:
- split: crubadan
path: data/alt/crubadan.parquet
- config_name: amc
data_files:
- split: crubadan
path: data/amc/crubadan.parquet
- config_name: ame
data_files:
- split: crubadan
path: data/ame/crubadan.parquet
- config_name: am
data_files:
- split: crubadan
path: data/am/crubadan.parquet
- config_name: ami
data_files:
- split: crubadan
path: data/ami/crubadan.parquet
- config_name: amr
data_files:
- split: crubadan
path: data/amr/crubadan.parquet
- config_name: an
data_files:
- split: crubadan
path: data/an/crubadan.parquet
- config_name: ang
data_files:
- split: crubadan
path: data/ang/crubadan.parquet
- config_name: ar
data_files:
- split: crubadan
path: data/ar/crubadan.parquet
- config_name: arl
data_files:
- split: crubadan
path: data/arl/crubadan.parquet
- config_name: arn
data_files:
- split: crubadan
path: data/arn/crubadan.parquet
- config_name: as
data_files:
- split: crubadan
path: data/as/crubadan.parquet
- config_name: ast
data_files:
- split: crubadan
path: data/ast/crubadan.parquet
- config_name: av
data_files:
- split: crubadan
path: data/av/crubadan.parquet
- config_name: ay
data_files:
- split: crubadan
path: data/ay/crubadan.parquet
- config_name: az
data_files:
- split: crubadan
path: data/az/crubadan.parquet
- config_name: ba
data_files:
- split: crubadan
path: data/ba/crubadan.parquet
- config_name: bal
data_files:
- split: crubadan
path: data/bal/crubadan.parquet
- config_name: ban
data_files:
- split: crubadan
path: data/ban/crubadan.parquet
- config_name: bar
data_files:
- split: crubadan
path: data/bar/crubadan.parquet
- config_name: bas
data_files:
- split: crubadan
path: data/bas/crubadan.parquet
- config_name: bba
data_files:
- split: crubadan
path: data/bba/crubadan.parquet
- config_name: bci
data_files:
- split: crubadan
path: data/bci/crubadan.parquet
- config_name: be
data_files:
- split: crubadan
path: data/be/crubadan.parquet
- config_name: bem
data_files:
- split: crubadan
path: data/bem/crubadan.parquet
- config_name: bfa
data_files:
- split: crubadan
path: data/bfa/crubadan.parquet
- config_name: bg
data_files:
- split: crubadan
path: data/bg/crubadan.parquet
- config_name: bh
data_files:
- split: crubadan
path: data/bh/crubadan.parquet
- config_name: bi
data_files:
- split: crubadan
path: data/bi/crubadan.parquet
- config_name: bik
data_files:
- split: crubadan
path: data/bik/crubadan.parquet
- config_name: bin
data_files:
- split: crubadan
path: data/bin/crubadan.parquet
- config_name: bm
data_files:
- split: crubadan
path: data/bm/crubadan.parquet
- config_name: bn
data_files:
- split: crubadan
path: data/bn/crubadan.parquet
- config_name: boa
data_files:
- split: crubadan
path: data/boa/crubadan.parquet
- config_name: bo
data_files:
- split: crubadan
path: data/bo/crubadan.parquet
- config_name: br
data_files:
- split: crubadan
path: data/br/crubadan.parquet
- config_name: bs
data_files:
- split: crubadan
path: data/bs/crubadan.parquet
- config_name: btb
data_files:
- split: crubadan
path: data/btb/crubadan.parquet
- config_name: bua
data_files:
- split: crubadan
path: data/bua/crubadan.parquet
- config_name: buc
data_files:
- split: crubadan
path: data/buc/crubadan.parquet
- config_name: bug
data_files:
- split: crubadan
path: data/bug/crubadan.parquet
- config_name: bum
data_files:
- split: crubadan
path: data/bum/crubadan.parquet
- config_name: byv
data_files:
- split: crubadan
path: data/byv/crubadan.parquet
- config_name: cab
data_files:
- split: crubadan
path: data/cab/crubadan.parquet
- config_name: ca
data_files:
- split: crubadan
path: data/ca/crubadan.parquet
- config_name: cak
data_files:
- split: crubadan
path: data/cak/crubadan.parquet
- config_name: cbr
data_files:
- split: crubadan
path: data/cbr/crubadan.parquet
- config_name: cbs
data_files:
- split: crubadan
path: data/cbs/crubadan.parquet
- config_name: cbt
data_files:
- split: crubadan
path: data/cbt/crubadan.parquet
- config_name: cbu
data_files:
- split: crubadan
path: data/cbu/crubadan.parquet
- config_name: ceb
data_files:
- split: crubadan
path: data/ceb/crubadan.parquet
- config_name: ch
data_files:
- split: crubadan
path: data/ch/crubadan.parquet
- config_name: chj
data_files:
- split: crubadan
path: data/chj/crubadan.parquet
- config_name: chk
data_files:
- split: crubadan
path: data/chk/crubadan.parquet
- config_name: chw
data_files:
- split: crubadan
path: data/chw/crubadan.parquet
- config_name: cic
data_files:
- split: crubadan
path: data/cic/crubadan.parquet
- config_name: cjk
data_files:
- split: crubadan
path: data/cjk/crubadan.parquet
- config_name: cnh
data_files:
- split: crubadan
path: data/cnh/crubadan.parquet
- config_name: cni
data_files:
- split: crubadan
path: data/cni/crubadan.parquet
- config_name: co
data_files:
- split: crubadan
path: data/co/crubadan.parquet
- config_name: cop
data_files:
- split: crubadan
path: data/cop/crubadan.parquet
- config_name: cot
data_files:
- split: crubadan
path: data/cot/crubadan.parquet
- config_name: cpu
data_files:
- split: crubadan
path: data/cpu/crubadan.parquet
- config_name: cr
data_files:
- split: crubadan
path: data/cr/crubadan.parquet
- config_name: crs
data_files:
- split: crubadan
path: data/crs/crubadan.parquet
- config_name: csa
data_files:
- split: crubadan
path: data/csa/crubadan.parquet
- config_name: csb
data_files:
- split: crubadan
path: data/csb/crubadan.parquet
- config_name: cs
data_files:
- split: crubadan
path: data/cs/crubadan.parquet
- config_name: cu
data_files:
- split: crubadan
path: data/cu/crubadan.parquet
- config_name: cuk
data_files:
- split: crubadan
path: data/cuk/crubadan.parquet
- config_name: cv
data_files:
- split: crubadan
path: data/cv/crubadan.parquet
- config_name: cy
data_files:
- split: crubadan
path: data/cy/crubadan.parquet
- config_name: czt
data_files:
- split: crubadan
path: data/czt/crubadan.parquet
- config_name: da
data_files:
- split: crubadan
path: data/da/crubadan.parquet
- config_name: dag
data_files:
- split: crubadan
path: data/dag/crubadan.parquet
- config_name: dar
data_files:
- split: crubadan
path: data/dar/crubadan.parquet
- config_name: ddn
data_files:
- split: crubadan
path: data/ddn/crubadan.parquet
- config_name: de
data_files:
- split: crubadan
path: data/de/crubadan.parquet
- config_name: dga
data_files:
- split: crubadan
path: data/dga/crubadan.parquet
- config_name: dhv
data_files:
- split: crubadan
path: data/dhv/crubadan.parquet
- config_name: diq
data_files:
- split: crubadan
path: data/diq/crubadan.parquet
- config_name: dsb
data_files:
- split: crubadan
path: data/dsb/crubadan.parquet
- config_name: dua
data_files:
- split: crubadan
path: data/dua/crubadan.parquet
- config_name: dyo
data_files:
- split: crubadan
path: data/dyo/crubadan.parquet
- config_name: dyu
data_files:
- split: crubadan
path: data/dyu/crubadan.parquet
- config_name: dz
data_files:
- split: crubadan
path: data/dz/crubadan.parquet
- config_name: ee
data_files:
- split: crubadan
path: data/ee/crubadan.parquet
- config_name: efi
data_files:
- split: crubadan
path: data/efi/crubadan.parquet
- config_name: el
data_files:
- split: crubadan
path: data/el/crubadan.parquet
- config_name: emk
data_files:
- split: crubadan
path: data/emk/crubadan.parquet
- config_name: eml
data_files:
- split: crubadan
path: data/eml/crubadan.parquet
- config_name: en
data_files:
- split: crubadan
path: data/en/crubadan.parquet
- config_name: enz
data_files:
- split: crubadan
path: data/enz/crubadan.parquet
- config_name: eo
data_files:
- split: crubadan
path: data/eo/crubadan.parquet
- config_name: es
data_files:
- split: crubadan
path: data/es/crubadan.parquet
- config_name: et
data_files:
- split: crubadan
path: data/et/crubadan.parquet
- config_name: eu
data_files:
- split: crubadan
path: data/eu/crubadan.parquet
- config_name: fa
data_files:
- split: crubadan
path: data/fa/crubadan.parquet
- config_name: ff
data_files:
- split: crubadan
path: data/ff/crubadan.parquet
- config_name: fi
data_files:
- split: crubadan
path: data/fi/crubadan.parquet
- config_name: fj
data_files:
- split: crubadan
path: data/fj/crubadan.parquet
- config_name: fo
data_files:
- split: crubadan
path: data/fo/crubadan.parquet
- config_name: fon
data_files:
- split: crubadan
path: data/fon/crubadan.parquet
- config_name: fr
data_files:
- split: crubadan
path: data/fr/crubadan.parquet
- config_name: frf
data_files:
- split: crubadan
path: data/frf/crubadan.parquet
- config_name: frp
data_files:
- split: crubadan
path: data/frp/crubadan.parquet
- config_name: frr
data_files:
- split: crubadan
path: data/frr/crubadan.parquet
- config_name: fud
data_files:
- split: crubadan
path: data/fud/crubadan.parquet
- config_name: fuf
data_files:
- split: crubadan
path: data/fuf/crubadan.parquet
- config_name: fur
data_files:
- split: crubadan
path: data/fur/crubadan.parquet
- config_name: fy
data_files:
- split: crubadan
path: data/fy/crubadan.parquet
- config_name: gaa
data_files:
- split: crubadan
path: data/gaa/crubadan.parquet
- config_name: ga
data_files:
- split: crubadan
path: data/ga/crubadan.parquet
- config_name: gag
data_files:
- split: crubadan
path: data/gag/crubadan.parquet
- config_name: gba
data_files:
- split: crubadan
path: data/gba/crubadan.parquet
- config_name: gd
data_files:
- split: crubadan
path: data/gd/crubadan.parquet
- config_name: gil
data_files:
- split: crubadan
path: data/gil/crubadan.parquet
- config_name: gjn
data_files:
- split: crubadan
path: data/gjn/crubadan.parquet
- config_name: gkn
data_files:
- split: crubadan
path: data/gkn/crubadan.parquet
- config_name: gl
data_files:
- split: crubadan
path: data/gl/crubadan.parquet
- config_name: gn
data_files:
- split: crubadan
path: data/gn/crubadan.parquet
- config_name: got
data_files:
- split: crubadan
path: data/got/crubadan.parquet
- config_name: gsc
data_files:
- split: crubadan
path: data/gsc/crubadan.parquet
- config_name: gsw
data_files:
- split: crubadan
path: data/gsw/crubadan.parquet
- config_name: guc
data_files:
- split: crubadan
path: data/guc/crubadan.parquet
- config_name: gu
data_files:
- split: crubadan
path: data/gu/crubadan.parquet
- config_name: guw
data_files:
- split: crubadan
path: data/guw/crubadan.parquet
- config_name: gv
data_files:
- split: crubadan
path: data/gv/crubadan.parquet
- config_name: gym
data_files:
- split: crubadan
path: data/gym/crubadan.parquet
- config_name: ha
data_files:
- split: crubadan
path: data/ha/crubadan.parquet
- config_name: haw
data_files:
- split: crubadan
path: data/haw/crubadan.parquet
- config_name: he
data_files:
- split: crubadan
path: data/he/crubadan.parquet
- config_name: hi
data_files:
- split: crubadan
path: data/hi/crubadan.parquet
- config_name: hil
data_files:
- split: crubadan
path: data/hil/crubadan.parquet
- config_name: hna
data_files:
- split: crubadan
path: data/hna/crubadan.parquet
- config_name: hne
data_files:
- split: crubadan
path: data/hne/crubadan.parquet
- config_name: hni
data_files:
- split: crubadan
path: data/hni/crubadan.parquet
- config_name: ho
data_files:
- split: crubadan
path: data/ho/crubadan.parquet
- config_name: hr
data_files:
- split: crubadan
path: data/hr/crubadan.parquet
- config_name: hsb
data_files:
- split: crubadan
path: data/hsb/crubadan.parquet
- config_name: ht
data_files:
- split: crubadan
path: data/ht/crubadan.parquet
- config_name: hu
data_files:
- split: crubadan
path: data/hu/crubadan.parquet
- config_name: huu
data_files:
- split: crubadan
path: data/huu/crubadan.parquet
- config_name: hve
data_files:
- split: crubadan
path: data/hve/crubadan.parquet
- config_name: hy
data_files:
- split: crubadan
path: data/hy/crubadan.parquet
- config_name: hz
data_files:
- split: crubadan
path: data/hz/crubadan.parquet
- config_name: ia
data_files:
- split: crubadan
path: data/ia/crubadan.parquet
- config_name: iba
data_files:
- split: crubadan
path: data/iba/crubadan.parquet
- config_name: id
data_files:
- split: crubadan
path: data/id/crubadan.parquet
- config_name: ig
data_files:
- split: crubadan
path: data/ig/crubadan.parquet
- config_name: igl
data_files:
- split: crubadan
path: data/igl/crubadan.parquet
- config_name: ilo
data_files:
- split: crubadan
path: data/ilo/crubadan.parquet
- config_name: inh
data_files:
- split: crubadan
path: data/inh/crubadan.parquet
- config_name: is
data_files:
- split: crubadan
path: data/is/crubadan.parquet
- config_name: iso
data_files:
- split: crubadan
path: data/iso/crubadan.parquet
- config_name: it
data_files:
- split: crubadan
path: data/it/crubadan.parquet
- config_name: its
data_files:
- split: crubadan
path: data/its/crubadan.parquet
- config_name: iu
data_files:
- split: crubadan
path: data/iu/crubadan.parquet
- config_name: ivv
data_files:
- split: crubadan
path: data/ivv/crubadan.parquet
- config_name: jiv
data_files:
- split: crubadan
path: data/jiv/crubadan.parquet
- config_name: jv
data_files:
- split: crubadan
path: data/jv/crubadan.parquet
- config_name: kab
data_files:
- split: crubadan
path: data/kab/crubadan.parquet
- config_name: kac
data_files:
- split: crubadan
path: data/kac/crubadan.parquet
- config_name: ka
data_files:
- split: crubadan
path: data/ka/crubadan.parquet
- config_name: kam
data_files:
- split: crubadan
path: data/kam/crubadan.parquet
- config_name: kbd
data_files:
- split: crubadan
path: data/kbd/crubadan.parquet
- config_name: kbp
data_files:
- split: crubadan
path: data/kbp/crubadan.parquet
- config_name: kcc
data_files:
- split: crubadan
path: data/kcc/crubadan.parquet
- config_name: kck
data_files:
- split: crubadan
path: data/kck/crubadan.parquet
- config_name: kde
data_files:
- split: crubadan
path: data/kde/crubadan.parquet
- config_name: kek
data_files:
- split: crubadan
path: data/kek/crubadan.parquet
- config_name: kg
data_files:
- split: crubadan
path: data/kg/crubadan.parquet
- config_name: kha
data_files:
- split: crubadan
path: data/kha/crubadan.parquet
- config_name: ki
data_files:
- split: crubadan
path: data/ki/crubadan.parquet
- config_name: kj
data_files:
- split: crubadan
path: data/kj/crubadan.parquet
- config_name: kjh
data_files:
- split: crubadan
path: data/kjh/crubadan.parquet
- config_name: kk
data_files:
- split: crubadan
path: data/kk/crubadan.parquet
- config_name: kl
data_files:
- split: crubadan
path: data/kl/crubadan.parquet
- config_name: kmb
data_files:
- split: crubadan
path: data/kmb/crubadan.parquet
- config_name: km
data_files:
- split: crubadan
path: data/km/crubadan.parquet
- config_name: kn
data_files:
- split: crubadan
path: data/kn/crubadan.parquet
- config_name: kok
data_files:
- split: crubadan
path: data/kok/crubadan.parquet
- config_name: koo
data_files:
- split: crubadan
path: data/koo/crubadan.parquet
- config_name: kos
data_files:
- split: crubadan
path: data/kos/crubadan.parquet
- config_name: kpe
data_files:
- split: crubadan
path: data/kpe/crubadan.parquet
- config_name: kqn
data_files:
- split: crubadan
path: data/kqn/crubadan.parquet
- config_name: krc
data_files:
- split: crubadan
path: data/krc/crubadan.parquet
- config_name: kr
data_files:
- split: crubadan
path: data/kr/crubadan.parquet
- config_name: kri
data_files:
- split: crubadan
path: data/kri/crubadan.parquet
- config_name: ksh
data_files:
- split: crubadan
path: data/ksh/crubadan.parquet
- config_name: ktu
data_files:
- split: crubadan
path: data/ktu/crubadan.parquet
- config_name: ku
data_files:
- split: crubadan
path: data/ku/crubadan.parquet
- config_name: kum
data_files:
- split: crubadan
path: data/kum/crubadan.parquet
- config_name: kv
data_files:
- split: crubadan
path: data/kv/crubadan.parquet
- config_name: kwf
data_files:
- split: crubadan
path: data/kwf/crubadan.parquet
- config_name: kwk
data_files:
- split: crubadan
path: data/kwk/crubadan.parquet
- config_name: kwm
data_files:
- split: crubadan
path: data/kwm/crubadan.parquet
- config_name: kwn
data_files:
- split: crubadan
path: data/kwn/crubadan.parquet
- config_name: kwu
data_files:
- split: crubadan
path: data/kwu/crubadan.parquet
- config_name: ky
data_files:
- split: crubadan
path: data/ky/crubadan.parquet
- config_name: lad
data_files:
- split: crubadan
path: data/lad/crubadan.parquet
- config_name: la
data_files:
- split: crubadan
path: data/la/crubadan.parquet
- config_name: lbe
data_files:
- split: crubadan
path: data/lbe/crubadan.parquet
- config_name: lb
data_files:
- split: crubadan
path: data/lb/crubadan.parquet
- config_name: lch
data_files:
- split: crubadan
path: data/lch/crubadan.parquet
- config_name: lg
data_files:
- split: crubadan
path: data/lg/crubadan.parquet
- config_name: lgg
data_files:
- split: crubadan
path: data/lgg/crubadan.parquet
- config_name: lia
data_files:
- split: crubadan
path: data/lia/crubadan.parquet
- config_name: li
data_files:
- split: crubadan
path: data/li/crubadan.parquet
- config_name: lij
data_files:
- split: crubadan
path: data/lij/crubadan.parquet
- config_name: lld
data_files:
- split: crubadan
path: data/lld/crubadan.parquet
- config_name: llh
data_files:
- split: crubadan
path: data/llh/crubadan.parquet
- config_name: llj
data_files:
- split: crubadan
path: data/llj/crubadan.parquet
- config_name: llr
data_files:
- split: crubadan
path: data/llr/crubadan.parquet
- config_name: lmo
data_files:
- split: crubadan
path: data/lmo/crubadan.parquet
- config_name: lms
data_files:
- split: crubadan
path: data/lms/crubadan.parquet
- config_name: lnc
data_files:
- split: crubadan
path: data/lnc/crubadan.parquet
- config_name: ln
data_files:
- split: crubadan
path: data/ln/crubadan.parquet
- config_name: lns
data_files:
- split: crubadan
path: data/lns/crubadan.parquet
- config_name: lo
data_files:
- split: crubadan
path: data/lo/crubadan.parquet
- config_name: lol
data_files:
- split: crubadan
path: data/lol/crubadan.parquet
- config_name: loz
data_files:
- split: crubadan
path: data/loz/crubadan.parquet
- config_name: lt
data_files:
- split: crubadan
path: data/lt/crubadan.parquet
- config_name: lua
data_files:
- split: crubadan
path: data/lua/crubadan.parquet
- config_name: lue
data_files:
- split: crubadan
path: data/lue/crubadan.parquet
- config_name: lu
data_files:
- split: crubadan
path: data/lu/crubadan.parquet
- config_name: lun
data_files:
- split: crubadan
path: data/lun/crubadan.parquet
- config_name: luo
data_files:
- split: crubadan
path: data/luo/crubadan.parquet
- config_name: lus
data_files:
- split: crubadan
path: data/lus/crubadan.parquet
- config_name: lv
data_files:
- split: crubadan
path: data/lv/crubadan.parquet
- config_name: mad
data_files:
- split: crubadan
path: data/mad/crubadan.parquet
- config_name: mam
data_files:
- split: crubadan
path: data/mam/crubadan.parquet
- config_name: mau
data_files:
- split: crubadan
path: data/mau/crubadan.parquet
- config_name: maz
data_files:
- split: crubadan
path: data/maz/crubadan.parquet
- config_name: mcd
data_files:
- split: crubadan
path: data/mcd/crubadan.parquet
- config_name: mcf
data_files:
- split: crubadan
path: data/mcf/crubadan.parquet
- config_name: mdf
data_files:
- split: crubadan
path: data/mdf/crubadan.parquet
- config_name: men
data_files:
- split: crubadan
path: data/men/crubadan.parquet
- config_name: meu
data_files:
- split: crubadan
path: data/meu/crubadan.parquet
- config_name: mfe
data_files:
- split: crubadan
path: data/mfe/crubadan.parquet
- config_name: mg
data_files:
- split: crubadan
path: data/mg/crubadan.parquet
- config_name: mh
data_files:
- split: crubadan
path: data/mh/crubadan.parquet
- config_name: mhi
data_files:
- split: crubadan
path: data/mhi/crubadan.parquet
- config_name: mho
data_files:
- split: crubadan
path: data/mho/crubadan.parquet
- config_name: mic
data_files:
- split: crubadan
path: data/mic/crubadan.parquet
- config_name: mi
data_files:
- split: crubadan
path: data/mi/crubadan.parquet
- config_name: min
data_files:
- split: crubadan
path: data/min/crubadan.parquet
- config_name: miq
data_files:
- split: crubadan
path: data/miq/crubadan.parquet
- config_name: mir
data_files:
- split: crubadan
path: data/mir/crubadan.parquet
- config_name: mk
data_files:
- split: crubadan
path: data/mk/crubadan.parquet
- config_name: ml
data_files:
- split: crubadan
path: data/ml/crubadan.parquet
- config_name: mlu
data_files:
- split: crubadan
path: data/mlu/crubadan.parquet
- config_name: mn
data_files:
- split: crubadan
path: data/mn/crubadan.parquet
- config_name: mo
data_files:
- split: crubadan
path: data/mo/crubadan.parquet
- config_name: mos
data_files:
- split: crubadan
path: data/mos/crubadan.parquet
- config_name: mr
data_files:
- split: crubadan
path: data/mr/crubadan.parquet
- config_name: mrj
data_files:
- split: crubadan
path: data/mrj/crubadan.parquet
- config_name: ms
data_files:
- split: crubadan
path: data/ms/crubadan.parquet
- config_name: mt
data_files:
- split: crubadan
path: data/mt/crubadan.parquet
- config_name: mua
data_files:
- split: crubadan
path: data/mua/crubadan.parquet
- config_name: mus
data_files:
- split: crubadan
path: data/mus/crubadan.parquet
- config_name: mwv
data_files:
- split: crubadan
path: data/mwv/crubadan.parquet
- config_name: mxv
data_files:
- split: crubadan
path: data/mxv/crubadan.parquet
- config_name: my
data_files:
- split: crubadan
path: data/my/crubadan.parquet
- config_name: myv
data_files:
- split: crubadan
path: data/myv/crubadan.parquet
- config_name: mzn
data_files:
- split: crubadan
path: data/mzn/crubadan.parquet
- config_name: na
data_files:
- split: crubadan
path: data/na/crubadan.parquet
- config_name: nah
data_files:
- split: crubadan
path: data/nah/crubadan.parquet
- config_name: nap
data_files:
- split: crubadan
path: data/nap/crubadan.parquet
- config_name: naq
data_files:
- split: crubadan
path: data/naq/crubadan.parquet
- config_name: nba
data_files:
- split: crubadan
path: data/nba/crubadan.parquet
- config_name: nb
data_files:
- split: crubadan
path: data/nb/crubadan.parquet
- config_name: ndc
data_files:
- split: crubadan
path: data/ndc/crubadan.parquet
- config_name: nd
data_files:
- split: crubadan
path: data/nd/crubadan.parquet
- config_name: nds
data_files:
- split: crubadan
path: data/nds/crubadan.parquet
- config_name: ne
data_files:
- split: crubadan
path: data/ne/crubadan.parquet
- config_name: nen
data_files:
- split: crubadan
path: data/nen/crubadan.parquet
- config_name: ng
data_files:
- split: crubadan
path: data/ng/crubadan.parquet
- config_name: ngl
data_files:
- split: crubadan
path: data/ngl/crubadan.parquet
- config_name: nia
data_files:
- split: crubadan
path: data/nia/crubadan.parquet
- config_name: niu
data_files:
- split: crubadan
path: data/niu/crubadan.parquet
- config_name: nl
data_files:
- split: crubadan
path: data/nl/crubadan.parquet
- config_name: nmf
data_files:
- split: crubadan
path: data/nmf/crubadan.parquet
- config_name: nnb
data_files:
- split: crubadan
path: data/nnb/crubadan.parquet
- config_name: nn
data_files:
- split: crubadan
path: data/nn/crubadan.parquet
- config_name: not
data_files:
- split: crubadan
path: data/not/crubadan.parquet
- config_name: nr
data_files:
- split: crubadan
path: data/nr/crubadan.parquet
- config_name: nso
data_files:
- split: crubadan
path: data/nso/crubadan.parquet
- config_name: nv
data_files:
- split: crubadan
path: data/nv/crubadan.parquet
- config_name: ny
data_files:
- split: crubadan
path: data/ny/crubadan.parquet
- config_name: nyk
data_files:
- split: crubadan
path: data/nyk/crubadan.parquet
- config_name: nym
data_files:
- split: crubadan
path: data/nym/crubadan.parquet
- config_name: nyn
data_files:
- split: crubadan
path: data/nyn/crubadan.parquet
- config_name: nzi
data_files:
- split: crubadan
path: data/nzi/crubadan.parquet
- config_name: ogo
data_files:
- split: crubadan
path: data/ogo/crubadan.parquet
- config_name: oj
data_files:
- split: crubadan
path: data/oj/crubadan.parquet
- config_name: om
data_files:
- split: crubadan
path: data/om/crubadan.parquet
- config_name: ood
data_files:
- split: crubadan
path: data/ood/crubadan.parquet
- config_name: or
data_files:
- split: crubadan
path: data/or/crubadan.parquet
- config_name: os
data_files:
- split: crubadan
path: data/os/crubadan.parquet
- config_name: pa
data_files:
- split: crubadan
path: data/pa/crubadan.parquet
- config_name: pag
data_files:
- split: crubadan
path: data/pag/crubadan.parquet
- config_name: pam
data_files:
- split: crubadan
path: data/pam/crubadan.parquet
- config_name: pap
data_files:
- split: crubadan
path: data/pap/crubadan.parquet
- config_name: pau
data_files:
- split: crubadan
path: data/pau/crubadan.parquet
- config_name: pbb
data_files:
- split: crubadan
path: data/pbb/crubadan.parquet
- config_name: pcm
data_files:
- split: crubadan
path: data/pcm/crubadan.parquet
- config_name: pdc
data_files:
- split: crubadan
path: data/pdc/crubadan.parquet
- config_name: pem
data_files:
- split: crubadan
path: data/pem/crubadan.parquet
- config_name: pih
data_files:
- split: crubadan
path: data/pih/crubadan.parquet
- config_name: pis
data_files:
- split: crubadan
path: data/pis/crubadan.parquet
- config_name: pl
data_files:
- split: crubadan
path: data/pl/crubadan.parquet
- config_name: pms
data_files:
- split: crubadan
path: data/pms/crubadan.parquet
- config_name: pon
data_files:
- split: crubadan
path: data/pon/crubadan.parquet
- config_name: ppl
data_files:
- split: crubadan
path: data/ppl/crubadan.parquet
- config_name: prq
data_files:
- split: crubadan
path: data/prq/crubadan.parquet
- config_name: prs
data_files:
- split: crubadan
path: data/prs/crubadan.parquet
- config_name: prv
data_files:
- split: crubadan
path: data/prv/crubadan.parquet
- config_name: ps
data_files:
- split: crubadan
path: data/ps/crubadan.parquet
- config_name: ptb
data_files:
- split: crubadan
path: data/ptb/crubadan.parquet
- config_name: pt
data_files:
- split: crubadan
path: data/pt/crubadan.parquet
- config_name: qu
data_files:
- split: crubadan
path: data/qu/crubadan.parquet
- config_name: qug
data_files:
- split: crubadan
path: data/qug/crubadan.parquet
- config_name: rar
data_files:
- split: crubadan
path: data/rar/crubadan.parquet
- config_name: rcf
data_files:
- split: crubadan
path: data/rcf/crubadan.parquet
- config_name: rm
data_files:
- split: crubadan
path: data/rm/crubadan.parquet
- config_name: rnd
data_files:
- split: crubadan
path: data/rnd/crubadan.parquet
- config_name: rn
data_files:
- split: crubadan
path: data/rn/crubadan.parquet
- config_name: ro
data_files:
- split: crubadan
path: data/ro/crubadan.parquet
- config_name: rom
data_files:
- split: crubadan
path: data/rom/crubadan.parquet
- config_name: ru
data_files:
- split: crubadan
path: data/ru/crubadan.parquet
- config_name: rug
data_files:
- split: crubadan
path: data/rug/crubadan.parquet
- config_name: rup
data_files:
- split: crubadan
path: data/rup/crubadan.parquet
- config_name: rw
data_files:
- split: crubadan
path: data/rw/crubadan.parquet
- config_name: sba
data_files:
- split: crubadan
path: data/sba/crubadan.parquet
- config_name: sc
data_files:
- split: crubadan
path: data/sc/crubadan.parquet
- config_name: scn
data_files:
- split: crubadan
path: data/scn/crubadan.parquet
- config_name: sco
data_files:
- split: crubadan
path: data/sco/crubadan.parquet
- config_name: sd
data_files:
- split: crubadan
path: data/sd/crubadan.parquet
- config_name: se
data_files:
- split: crubadan
path: data/se/crubadan.parquet
- config_name: seh
data_files:
- split: crubadan
path: data/seh/crubadan.parquet
- config_name: sg
data_files:
- split: crubadan
path: data/sg/crubadan.parquet
- config_name: shp
data_files:
- split: crubadan
path: data/shp/crubadan.parquet
- config_name: shs
data_files:
- split: crubadan
path: data/shs/crubadan.parquet
- config_name: sid
data_files:
- split: crubadan
path: data/sid/crubadan.parquet
- config_name: sk
data_files:
- split: crubadan
path: data/sk/crubadan.parquet
- config_name: sl
data_files:
- split: crubadan
path: data/sl/crubadan.parquet
- config_name: sm
data_files:
- split: crubadan
path: data/sm/crubadan.parquet
- config_name: sn
data_files:
- split: crubadan
path: data/sn/crubadan.parquet
- config_name: snk
data_files:
- split: crubadan
path: data/snk/crubadan.parquet
- config_name: so
data_files:
- split: crubadan
path: data/so/crubadan.parquet
- config_name: son
data_files:
- split: crubadan
path: data/son/crubadan.parquet
- config_name: sop
data_files:
- split: crubadan
path: data/sop/crubadan.parquet
- config_name: sq
data_files:
- split: crubadan
path: data/sq/crubadan.parquet
- config_name: srd
data_files:
- split: crubadan
path: data/srd/crubadan.parquet
- config_name: sr
data_files:
- split: crubadan
path: data/sr/crubadan.parquet
- config_name: srm
data_files:
- split: crubadan
path: data/srm/crubadan.parquet
- config_name: srn
data_files:
- split: crubadan
path: data/srn/crubadan.parquet
- config_name: srr
data_files:
- split: crubadan
path: data/srr/crubadan.parquet
- config_name: ss
data_files:
- split: crubadan
path: data/ss/crubadan.parquet
- config_name: st
data_files:
- split: crubadan
path: data/st/crubadan.parquet
- config_name: su
data_files:
- split: crubadan
path: data/su/crubadan.parquet
- config_name: suk
data_files:
- split: crubadan
path: data/suk/crubadan.parquet
- config_name: sum
data_files:
- split: crubadan
path: data/sum/crubadan.parquet
- config_name: sus
data_files:
- split: crubadan
path: data/sus/crubadan.parquet
- config_name: sv
data_files:
- split: crubadan
path: data/sv/crubadan.parquet
- config_name: swb
data_files:
- split: crubadan
path: data/swb/crubadan.parquet
- config_name: sw
data_files:
- split: crubadan
path: data/sw/crubadan.parquet
- config_name: tab
data_files:
- split: crubadan
path: data/tab/crubadan.parquet
- config_name: ta
data_files:
- split: crubadan
path: data/ta/crubadan.parquet
- config_name: tbz
data_files:
- split: crubadan
path: data/tbz/crubadan.parquet
- config_name: te
data_files:
- split: crubadan
path: data/te/crubadan.parquet
- config_name: tem
data_files:
- split: crubadan
path: data/tem/crubadan.parquet
- config_name: teo
data_files:
- split: crubadan
path: data/teo/crubadan.parquet
- config_name: tet
data_files:
- split: crubadan
path: data/tet/crubadan.parquet
- config_name: tg
data_files:
- split: crubadan
path: data/tg/crubadan.parquet
- config_name: th
data_files:
- split: crubadan
path: data/th/crubadan.parquet
- config_name: ti
data_files:
- split: crubadan
path: data/ti/crubadan.parquet
- config_name: tig
data_files:
- split: crubadan
path: data/tig/crubadan.parquet
- config_name: tiv
data_files:
- split: crubadan
path: data/tiv/crubadan.parquet
- config_name: tk
data_files:
- split: crubadan
path: data/tk/crubadan.parquet
- config_name: tkl
data_files:
- split: crubadan
path: data/tkl/crubadan.parquet
- config_name: tl
data_files:
- split: crubadan
path: data/tl/crubadan.parquet
- config_name: tll
data_files:
- split: crubadan
path: data/tll/crubadan.parquet
- config_name: tn
data_files:
- split: crubadan
path: data/tn/crubadan.parquet
- config_name: tob
data_files:
- split: crubadan
path: data/tob/crubadan.parquet
- config_name: to
data_files:
- split: crubadan
path: data/to/crubadan.parquet
- config_name: toi
data_files:
- split: crubadan
path: data/toi/crubadan.parquet
- config_name: toj
data_files:
- split: crubadan
path: data/toj/crubadan.parquet
- config_name: tos
data_files:
- split: crubadan
path: data/tos/crubadan.parquet
- config_name: tpi
data_files:
- split: crubadan
path: data/tpi/crubadan.parquet
- config_name: tr
data_files:
- split: crubadan
path: data/tr/crubadan.parquet
- config_name: tsc
data_files:
- split: crubadan
path: data/tsc/crubadan.parquet
- config_name: ts
data_files:
- split: crubadan
path: data/ts/crubadan.parquet
- config_name: tt
data_files:
- split: crubadan
path: data/tt/crubadan.parquet
- config_name: ttj
data_files:
- split: crubadan
path: data/ttj/crubadan.parquet
- config_name: tum
data_files:
- split: crubadan
path: data/tum/crubadan.parquet
- config_name: tvl
data_files:
- split: crubadan
path: data/tvl/crubadan.parquet
- config_name: ty
data_files:
- split: crubadan
path: data/ty/crubadan.parquet
- config_name: tzc
data_files:
- split: crubadan
path: data/tzc/crubadan.parquet
- config_name: tzm
data_files:
- split: crubadan
path: data/tzm/crubadan.parquet
- config_name: udm
data_files:
- split: crubadan
path: data/udm/crubadan.parquet
- config_name: ug
data_files:
- split: crubadan
path: data/ug/crubadan.parquet
- config_name: uk
data_files:
- split: crubadan
path: data/uk/crubadan.parquet
- config_name: umb
data_files:
- split: crubadan
path: data/umb/crubadan.parquet
- config_name: ura
data_files:
- split: crubadan
path: data/ura/crubadan.parquet
- config_name: ur
data_files:
- split: crubadan
path: data/ur/crubadan.parquet
- config_name: urh
data_files:
- split: crubadan
path: data/urh/crubadan.parquet
- config_name: uz
data_files:
- split: crubadan
path: data/uz/crubadan.parquet
- config_name: val
data_files:
- split: crubadan
path: data/val/crubadan.parquet
- config_name: vec
data_files:
- split: crubadan
path: data/vec/crubadan.parquet
- config_name: ve
data_files:
- split: crubadan
path: data/ve/crubadan.parquet
- config_name: vi
data_files:
- split: crubadan
path: data/vi/crubadan.parquet
- config_name: vls
data_files:
- split: crubadan
path: data/vls/crubadan.parquet
- config_name: vmf
data_files:
- split: crubadan
path: data/vmf/crubadan.parquet
- config_name: vmw
data_files:
- split: crubadan
path: data/vmw/crubadan.parquet
- config_name: wa
data_files:
- split: crubadan
path: data/wa/crubadan.parquet
- config_name: wal
data_files:
- split: crubadan
path: data/wal/crubadan.parquet
- config_name: war
data_files:
- split: crubadan
path: data/war/crubadan.parquet
- config_name: wls
data_files:
- split: crubadan
path: data/wls/crubadan.parquet
- config_name: wo
data_files:
- split: crubadan
path: data/wo/crubadan.parquet
- config_name: xal
data_files:
- split: crubadan
path: data/xal/crubadan.parquet
- config_name: xh
data_files:
- split: crubadan
path: data/xh/crubadan.parquet
- config_name: xsm
data_files:
- split: crubadan
path: data/xsm/crubadan.parquet
- config_name: yad
data_files:
- split: crubadan
path: data/yad/crubadan.parquet
- config_name: yaf
data_files:
- split: crubadan
path: data/yaf/crubadan.parquet
- config_name: yao
data_files:
- split: crubadan
path: data/yao/crubadan.parquet
- config_name: yap
data_files:
- split: crubadan
path: data/yap/crubadan.parquet
- config_name: yi
data_files:
- split: crubadan
path: data/yi/crubadan.parquet
- config_name: yo
data_files:
- split: crubadan
path: data/yo/crubadan.parquet
- config_name: yua
data_files:
- split: crubadan
path: data/yua/crubadan.parquet
- config_name: za
data_files:
- split: crubadan
path: data/za/crubadan.parquet
- config_name: zap
data_files:
- split: crubadan
path: data/zap/crubadan.parquet
- config_name: zea
data_files:
- split: crubadan
path: data/zea/crubadan.parquet
- config_name: zh
data_files:
- split: crubadan
path: data/zh/crubadan.parquet
- config_name: znd
data_files:
- split: crubadan
path: data/znd/crubadan.parquet
- config_name: zpa
data_files:
- split: crubadan
path: data/zpa/crubadan.parquet
- config_name: zu
data_files:
- split: crubadan
path: data/zu/crubadan.parquet
license: gpl-3.0
task_categories:
- text-classification
- token-classification
pretty_name: NLTK Crúbadán Language ID Corpus
---
# NLTK Crúbadán Language ID Corpus
Character 3-gram frequency tables for **449 writing systems**, collected
by Kevin Scannell's [An Crúbadán](http://borel.slu.edu/crubadan/) web crawler (2010).
Distributed via [NLTK](https://www.nltk.org/).
Trigrams use `<` (word start) and `>` (word end) as boundary markers.
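To make the boundary markers concrete, here is a minimal sketch of how a text could be turned into trigrams of this shape. The function name `word_trigrams`, the lowercasing, and the whitespace tokenization are assumptions made for illustration; the card does not describe how the crawler tokenized its input.

```python
from collections import Counter

def word_trigrams(text: str) -> Counter:
    """Count character trigrams per word, wrapping each word in '<' and '>'.

    Lowercasing and whitespace splitting are illustrative assumptions,
    not a description of the original crawler's preprocessing.
    """
    counts = Counter()
    for word in text.lower().split():
        padded = f"<{word}>"
        # Slide a 3-character window over the padded word.
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return counts

counts = word_trigrams("die kat")
print(sorted(counts))  # ['<di', '<ka', 'at>', 'die', 'ie>', 'kat']
```

A one-letter word such as `a` yields the single trigram `<a>`, since both markers fit in the same window.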
## Configs
| Config | Description | Schema |
|---|---|---|
| `table` | Language metadata | `crubadan_code, iso639_3, language_name` |
| `{lang_code}` | Per-language trigrams | `count, trigram` |
All 449 language codes: `ab`, `abn`, `ace`, `ach`, `acu`, `ada`, `af`, `agr`, `aja`, `ak`, `ako`, `alt`, `amc`, `ame`, `am`, `ami`, `amr`, `an`, `ang`, `ar`, … (and 429 more)
## Schema
**`table`**
| Column | Type | Description |
|---|---|---|
| `crubadan_code` | string | Internal Crúbadán writing-system code |
| `iso639_3` | string | ISO 639-3 language code |
| `language_name` | string | English language name |
**`{lang_code}`** — one config per writing system
| Column | Type | Description |
|---|---|---|
| `count` | int64 | Frequency of trigram in crawled text |
| `trigram` | string | 3-character sequence (`<`/`>` = word boundaries) |
Rows are sorted by descending count (most frequent first).
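Raw counts are not directly comparable across languages because the crawled text differs in size. One possible way to normalize them is sketched below; `relative_frequencies` is a hypothetical helper, and the sample rows are made-up values that only mirror the `count, trigram` schema.

```python
def relative_frequencies(rows):
    """Turn rows shaped like {'count': int, 'trigram': str} into a
    trigram -> relative frequency mapping."""
    rows = list(rows)  # allow any iterable; it is traversed twice
    total = sum(row["count"] for row in rows)
    return {row["trigram"]: row["count"] / total for row in rows}

# Made-up rows for illustration only, not taken from the dataset.
sample_rows = [
    {"count": 6, "trigram": "<di"},
    {"count": 3, "trigram": "die"},
    {"count": 1, "trigram": "ie>"},
]
freqs = relative_frequencies(sample_rows)
```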
## Sample languages
| Code | ISO 639-3 | Language |
|---|---|---|
| `ab` | `abk` | Abkhaz |
| `abn` | `abn` | Abua |
| `ace` | `ace` | Aceh |
| `ach` | `ach` | Acholi |
| `acu` | `acu` | Achuar-Shiwiar |
| `ada` | `ada` | Dangme |
| `af` | `afr` | Afrikaans |
| `agr` | `agr` | Aguaruna |
| `aja` | `aja` | Aja |
| `ak` | `aka` | Akan |
## Usage
```python
from datasets import load_dataset
# Language metadata
meta = load_dataset("nltk-data-hub/crubadan", "table")
df = meta["crubadan"].to_pandas()
# Trigrams for a specific language
ds = load_dataset("nltk-data-hub/crubadan", "af") # Afrikaans
trigrams = ds["crubadan"].to_pandas() # count, trigram columns
```
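Since the corpus is meant for language identification, the following sketch shows one way the per-language tables could be used for that purpose. It uses a rank-based "out-of-place" distance in the style of Cavnar and Trenkle's n-gram text categorization; this is a swapped-in illustration, not a method the card prescribes. All function names are hypothetical, and the toy profiles are made-up values that stand in for the `trigram` column of real configs.

```python
from collections import Counter

def rank_map(trigrams_by_count):
    """Map each trigram to its rank (0 = most frequent)."""
    return {tri: rank for rank, tri in enumerate(trigrams_by_count)}

def out_of_place(text_ranks, lang_ranks):
    """Sum rank differences; a trigram absent from the language profile
    costs a fixed penalty equal to the profile length."""
    penalty = len(lang_ranks)
    return sum(
        abs(rank - lang_ranks[tri]) if tri in lang_ranks else penalty
        for tri, rank in text_ranks.items()
    )

def guess_language(text_counts, profiles):
    """Return the key of the profile closest to the text.

    text_counts: Counter of trigrams observed in the input text.
    profiles: dict mapping a language code to its trigram list,
              most frequent first (the row order of a per-language config).
    """
    text_ranks = rank_map([tri for tri, _ in text_counts.most_common()])
    return min(
        profiles,
        key=lambda code: out_of_place(text_ranks, rank_map(profiles[code])),
    )

# Toy profiles with made-up contents, only to show the call shape.
toy_profiles = {"xx": ["<th", "the", "he>"], "yy": ["<de", "der", "er>"]}
toy_text = Counter({"the": 3, "<th": 2, "he>": 1})
print(guess_language(toy_text, toy_profiles))  # prints "xx"
```

With real data, each profile would presumably come from the `trigram` column of one per-language config, which is already in descending-count order. Truncating every profile to the same length is a common choice so that the missing-trigram penalty is comparable across languages.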
## Via NLTK
```python
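import nltk
nltk.download("crubadan")
reader = nltk.corpus.crubadan
reader.langs()                  # ISO 639-3 codes of the available languages
reader.lang_freq("afr")         # FreqDist of Afrikaans trigrams (expects an ISO 639-3 code)
reader.crubadan_to_iso("af")    # → 'afr'
reader.iso_to_crubadan("afr")   # → 'af'
# English language names are listed in the `table` config described above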
```
## License
GPL v3 — © 2010 Kevin P. Scannell.
See [GNU GPL v3](https://www.gnu.org/licenses/gpl-3.0.html).
## Citation
```bibtex
@inproceedings{crubadan,
author = {Scannell, Kevin P.},
title = {The Crúbadán Project: Corpus building for under-resourced languages},
booktitle = {Building and Exploring Web Corpora: Proceedings of the 3rd Web as Corpus Workshop},
year = {2007},
pages = {5--15},
url = {http://borel.slu.edu/crubadan/}
}
```
Provider:
nltk-data-hub



