
iahlt/ar-universal_morphologies

Hugging Face. Updated: 2024-03-04. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/iahlt/ar-universal_morphologies
Resource description:
---
dataset_info:
  features:
  - name: lemma
    dtype: string
  - name: forms
    sequence:
    - name: word
      dtype: string
    - name: Aktionsart
      sequence:
        class_label:
          names: {'0': STAT, '1': DYN, '2': TEL, '3': ATEL, '4': PCT, '5': DUR, '6': ACH,
            '7': ACCMP, '8': SEMEL, '9': ACTY}
    - name: Animacy
      sequence:
        class_label:
          names: {'0': ANIM, '1': INAN, '2': HUM, '3': NHUM}
    - name: Argument_Marking
      sequence:
        class_label:
          names: {'0': ARGNO1S, '1': ARGNO2S, '2': ARGNO3S, '3': ARGNO1P, '4': ARGNO2P,
            '5': ARGNO3P, '6': ARGAC1S, '7': ARGAC2S, '8': ARGAC3S, '9': ARGAC1P,
            '10': ARGAC2P, '11': ARGAC3P, '12': ARGAB1S, '13': ARGAB2S, '14': ARGAB3S,
            '15': ARGAB1P, '16': ARGAB2P, '17': ARGAB3P, '18': ARGER1S, '19': ARGER2S,
            '20': ARGER3S, '21': ARGER1P, '22': ARGER2P, '23': ARGER3P, '24': ARGDA1S,
            '25': ARGDA2S, '26': ARGDA3S, '27': ARGDA1P, '28': ARGDA2P, '29': ARGDA3P,
            '30': ARGBE1S, '31': ARGBE2S, '32': ARGBE3S, '33': ARGBE1P, '34': ARGBE2P,
            '35': ARGBE3P}
    - name: Aspect
      sequence:
        class_label:
          names: {'0': IPFV, '1': PFV, '2': PRF, '3': PROG, '4': PROSP, '5': ITER, '6': HAB}
    - name: Case
      sequence:
        class_label:
          names: {'0': NOM, '1': ACC, '2': ERG, '3': ABS, '4': NOMS, '5': DAT, '6': BEN,
            '7': PRP, '8': GEN, '9': REL, '10': PRT, '11': INS, '12': COM, '13': VOC,
            '14': COMPV, '15': EQTV, '16': PRIV, '17': PROPR, '18': AVR, '19': FRML,
            '20': TRANS, '21': BYWAY, '22': INTER, '23': AT, '24': POST, '25': IN,
            '26': CIRC, '27': ANTE, '28': APUD, '29': 'ON', '30': ONHR, '31': ONVR,
            '32': SUB, '33': REM, '34': PROXM, '35': ESS, '36': ALL, '37': ABL,
            '38': APPRX, '39': TERM}
    - name: Comparison
      sequence:
        class_label:
          names: {'0': CMPR, '1': SPRL, '2': AB, '3': RL, '4': EQT}
    - name: Definiteness
      sequence:
        class_label:
          names: {'0': DEF, '1': INDF, '2': SPEC, '3': NSPEC}
    - name: Deixis
      sequence:
        class_label:
          names: {'0': PROX, '1': MED, '2': REMT, '3': REF1, '4': REF2, '5': NOREF,
            '6': PHOR, '7': VIS, '8': NVIS, '9': ABV, '10': EVEN, '11': BEL}
    - name: Evidentiality
      sequence:
        class_label:
          names: {'0': FH, '1': DRCT, '2': SEN, '3': VISU, '4': NVSEN, '5': AUD, '6': NFH,
            '7': QUOT, '8': RPRT, '9': HRSY, '10': INFER, '11': ASSUM}
    - name: Finiteness
      sequence:
        class_label:
          names: {'0': FIN, '1': NFIN}
    - name: Gender
      sequence:
        class_label:
          names: {'0': MASC, '1': FEM, '2': NEUT, '3': NAKH1, '4': NAKH2, '5': NAKH3,
            '6': NAKH4, '7': NAKH5, '8': NAKH6, '9': NAKH7, '10': NAKH8, '11': BANTU1,
            '12': BANTU2, '13': BANTU3, '14': BANTU4, '15': BANTU5, '16': BANTU6,
            '17': BANTU7, '18': BANTU8, '19': BANTU9, '20': BANTU10, '21': BANTU11,
            '22': BANTU12, '23': BANTU13, '24': BANTU14, '25': BANTU15, '26': BANTU16,
            '27': BANTU17, '28': BANTU18, '29': BANTU19, '30': BANTU20, '31': BANTU21,
            '32': BANTU22, '33': BANTU23}
    - name: Information_Structure
      sequence:
        class_label:
          names: {'0': TOP, '1': FOC}
    - name: Interrogativity
      sequence:
        class_label:
          names: {'0': DECL, '1': INT}
    - name: Language_Specific
      sequence:
        class_label:
          names: {'0': LGSPEC1, '1': LGSPEC2, '2': LGSPEC3, '3': LGSPEC4, '4': LGSPEC5,
            '5': LGSPEC6, '6': LGSPEC7, '7': LGSPEC8, '8': LGSPEC9, '9': LGSPEC10}
    - name: Mood
      sequence:
        class_label:
          names: {'0': IND, '1': SBJV, '2': REAL, '3': IRR, '4': AUPRP, '5': AUNPRP,
            '6': IMP, '7': COND, '8': PURP, '9': INTEN, '10': POT, '11': LKLY, '12': ADM,
            '13': OBLIG, '14': DEB, '15': PERM, '16': DED, '17': SIM, '18': OPT}
    - name: Number
      sequence:
        class_label:
          names: {'0': SG, '1': PL, '2': GRPL, '3': DU, '4': TRI, '5': PAUC, '6': GRPAUC,
            '7': INVN}
    - name: Part_Of_Speech
      sequence:
        class_label:
          names: {'0': 'N', '1': PROPN, '2': ADJ, '3': PRO, '4': CLF, '5': ART, '6': DET,
            '7': V, '8': ADV, '9': AUX, '10': V.PTCP, '11': V.MSDR, '12': V.CVB,
            '13': ADP, '14': COMP, '15': CONJ, '16': NUM, '17': PART, '18': INTJ}
    - name: Person
      sequence:
        class_label:
          names: {'0': '0', '1': '1', '2': '2', '3': '3', '4': '4', '5': INCL, '6': EXCL,
            '7': PRX, '8': OBV}
    - name: Polarity
      sequence:
        class_label:
          names: {'0': POS, '1': NEG}
    - name: Politeness
      sequence:
        class_label:
          names: {'0': INFM, '1': FORM, '2': ELEV, '3': HUMB, '4': POL, '5': AVOID,
            '6': LOW, '7': HIGH, '8': STELEV, '9': STSUPR, '10': LIT, '11': FOREG,
            '12': COL}
    - name: Possession
      sequence:
        class_label:
          names: {'0': ALN, '1': NALN, '2': PSS1S, '3': PSS2S, '4': PSS2SF, '5': PSS2SM,
            '6': PSS2SINFM, '7': PSS2SFORM, '8': PSS3S, '9': PSS3SF, '10': PSS3SM,
            '11': PSS1D, '12': PSS1DI, '13': PSS1DE, '14': PSS2D, '15': PSS2DM,
            '16': PSS2DF, '17': PSS3D, '18': PSS3DF, '19': PSS3DM, '20': PSS1P,
            '21': PSS1PI, '22': PSS1PE, '23': PSS2P, '24': PSS2PF, '25': PSS2PM,
            '26': PSS3PF, '27': PSS3PM}
    - name: Switch_Reference
      sequence:
        class_label:
          names: {'0': SS, '1': SSADV, '2': DS, '3': DSADV, '4': OR, '5': SIMMA,
            '6': SEQMA, '7': LOG}
    - name: Tense
      sequence:
        class_label:
          names: {'0': PRS, '1': PST, '2': FUT, '3': IMMED, '4': HOD, '5': 1DAY, '6': RCT,
            '7': RMT}
    - name: Valency
      sequence:
        class_label:
          names: {'0': IMPRS, '1': INTR, '2': TR, '3': DITR, '4': REFL, '5': RECP,
            '6': CAUS, '7': APPL}
    - name: Voice
      sequence:
        class_label:
          names: {'0': ACT, '1': MID, '2': PASS, '3': ANTIP, '4': DIR, '5': INV,
            '6': AGFOC, '7': PFOC, '8': LFOC, '9': BFOC, '10': ACFOC, '11': IFOC,
            '12': CFOC}
    - name: Other
      sequence: string
  splits:
  - name: train
    num_bytes: 24384743
    num_examples: 4134
  download_size: 1318849
  dataset_size: 24384743
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
language:
- he
---
Provider:
iahlt
Summary of the original information

Dataset overview

Data features

The dataset contains the following features:

  • lemma: string
  • forms: sequence with the following sub-features:
    • word: string
    • Aktionsart: sequence of class labels:
      • 0: STAT
      • 1: DYN
      • 2: TEL
      • 3: ATEL
      • 4: PCT
      • 5: DUR
      • 6: ACH
      • 7: ACCMP
      • 8: SEMEL
      • 9: ACTY
    • Animacy: sequence of class labels:
      • 0: ANIM
      • 1: INAN
      • 2: HUM
      • 3: NHUM
    • Argument_Marking: sequence of class labels:
      • 0: ARGNO1S
      • 1: ARGNO2S
      • 2: ARGNO3S
      • 3: ARGNO1P
      • 4: ARGNO2P
      • 5: ARGNO3P
      • 6: ARGAC1S
      • 7: ARGAC2S
      • 8: ARGAC3S
      • 9: ARGAC1P
      • 10: ARGAC2P
      • 11: ARGAC3P
      • 12: ARGAB1S
      • 13: ARGAB2S
      • 14: ARGAB3S
      • 15: ARGAB1P
      • 16: ARGAB2P
      • 17: ARGAB3P
      • 18: ARGER1S
      • 19: ARGER2S
      • 20: ARGER3S
      • 21: ARGER1P
      • 22: ARGER2P
      • 23: ARGER3P
      • 24: ARGDA1S
      • 25: ARGDA2S
      • 26: ARGDA3S
      • 27: ARGDA1P
      • 28: ARGDA2P
      • 29: ARGDA3P
      • 30: ARGBE1S
      • 31: ARGBE2S
      • 32: ARGBE3S
      • 33: ARGBE1P
      • 34: ARGBE2P
      • 35: ARGBE3P
    • Aspect: sequence of class labels:
      • 0: IPFV
      • 1: PFV
      • 2: PRF
      • 3: PROG
      • 4: PROSP
      • 5: ITER
      • 6: HAB
    • Case: sequence of class labels:
      • 0: NOM
      • 1: ACC
      • 2: ERG
      • 3: ABS
      • 4: NOMS
      • 5: DAT
      • 6: BEN
      • 7: PRP
      • 8: GEN
      • 9: REL
      • 10: PRT
      • 11: INS
      • 12: COM
      • 13: VOC
      • 14: COMPV
      • 15: EQTV
      • 16: PRIV
      • 17: PROPR
      • 18: AVR
      • 19: FRML
      • 20: TRANS
      • 21: BYWAY
      • 22: INTER
      • 23: AT
      • 24: POST
      • 25: IN
      • 26: CIRC
      • 27: ANTE
      • 28: APUD
      • 29: ON
      • 30: ONHR
      • 31: ONVR
      • 32: SUB
      • 33: REM
      • 34: PROXM
      • 35: ESS
      • 36: ALL
      • 37: ABL
      • 38: APPRX
      • 39: TERM
    • Comparison: sequence of class labels:
      • 0: CMPR
      • 1: SPRL
      • 2: AB
      • 3: RL
      • 4: EQT
    • Definiteness: sequence of class labels:
      • 0: DEF
      • 1: INDF
      • 2: SPEC
      • 3: NSPEC
    • Deixis: sequence of class labels:
      • 0: PROX
      • 1: MED
      • 2: REMT
      • 3: REF1
      • 4: REF2
      • 5: NOREF
      • 6: PHOR
      • 7: VIS
      • 8: NVIS
      • 9: ABV
      • 10: EVEN
      • 11: BEL
    • Evidentiality: sequence of class labels:
      • 0: FH
      • 1: DRCT
      • 2: SEN
      • 3: VISU
      • 4: NVSEN
      • 5: AUD
      • 6: NFH
      • 7: QUOT
      • 8: RPRT
      • 9: HRSY
      • 10: INFER
      • 11: ASSUM
    • Finiteness: sequence of class labels:
      • 0: FIN
      • 1: NFIN
    • Gender: sequence of class labels:
      • 0: MASC
      • 1: FEM
      • 2: NEUT
      • 3: NAKH1
      • 4: NAKH2
      • 5: NAKH3
      • 6: NAKH4
      • 7: NAKH5
      • 8: NAKH6
      • 9: NAKH7
      • 10: NAKH8
      • 11: BANTU1
      • 12: BANTU2
      • 13: BANTU3
      • 14: BANTU4
      • 15: BANTU5
      • 16: BANTU6
      • 17: BANTU7
      • 18: BANTU8
      • 19: BANTU9
      • 20: BANTU10
      • 21: BANTU11
      • 22: BANTU12
      • 23: BANTU13
      • 24: BANTU14
      • 25: BANTU15
      • 26: BANTU16
      • 27: BANTU17
      • 28: BANTU18
      • 29: BANTU19
      • 30: BANTU20
      • 31: BANTU21
      • 32: BANTU22
      • 33: BANTU23
    • Information_Structure: sequence of class labels:
      • 0: TOP
      • 1: FOC
    • Interrogativity: sequence of class labels:
      • 0: DECL
      • 1: INT
    • Language_Specific: sequence of class labels:
      • 0: LGSPEC1
      • 1: LGSPEC2
      • 2: LGSPEC3
      • 3: LGSPEC4
      • 4: LGSPEC5
      • 5: LGSPEC6
      • 6: LGSPEC7
      • 7: LGSPEC8
      • 8: LGSPEC9
      • 9: LGSPEC10
    • Mood: sequence of class labels:
      • 0: IND
      • 1: SBJV
      • 2: REAL
      • 3: IRR
      • 4: AUPRP
      • 5: AUNPRP
      • 6: IMP
      • 7: COND
      • 8: PURP
      • 9: INTEN
      • 10: POT
      • 11: LKLY
      • 12: ADM
      • 13: OBLIG
      • 14: DEB
      • 15: PERM
      • 16: DED
      • 17: SIM
      • 18: OPT
    • Number: sequence of class labels:
      • 0: SG
      • 1: PL
      • 2: GRPL
      • 3: DU
      • 4: TRI
      • 5: PAUC
      • 6: GRPAUC
      • 7: INVN
    • Part_Of_Speech: sequence of class labels:
      • 0: N
      • 1: PROPN
      • 2: ADJ
      • 3: PRO
      • 4: CLF
      • 5: ART
      • 6: DET
      • 7: V
      • 8: ADV
      • 9: AUX
      • 10: V.PTCP
      • 11: V.MSDR
      • 12: V.CVB
      • 13: ADP
      • 14: COMP
      • 15: CONJ
      • 16: NUM
      • 17: PART
      • 18: INTJ
    • Person: sequence of class labels:
      • 0: 0
      • 1: 1
      • 2: 2
      • 3: 3
      • 4: 4
      • 5: INCL
      • 6: EXCL
      • 7: PRX
      • 8: OBV
    • Polarity: sequence of class labels:
      • 0: POS
      • 1: NEG
    • Politeness: sequence of class labels:
      • 0: INFM
      • 1: FORM
      • 2: ELEV
      • 3: HUMB
      • 4: POL
      • 5: AVOID
      • 6: LOW
      • 7: HIGH
      • 8: STELEV
      • 9: STSUPR
      • 10: LIT
      • 11: FOREG
      • 12: COL
    • Possession: sequence of class labels:
      • 0: ALN
      • 1: NALN
      • 2: PSS1S
      • 3: PSS2S
      • 4: PSS2SF
      • 5: PSS2SM
      • 6: PSS2SINFM
      • 7: PSS2SFORM
      • 8: PSS3S
      • 9: PSS3SF
      • 10: PSS3SM
      • 11: PSS1D
      • 12: PSS1DI
      • 13: PSS1DE
      • 14: PSS2D
      • 15: PSS2DM
      • 16: PSS2DF
      • 17: PSS3D
      • 18: PSS3DF
      • 19: PSS3DM
      • 20: PSS1P
      • 21: PSS1PI
      • 22: PSS1PE
      • 23: PSS2P
      • 24: PSS2PF
      • 25: PSS2PM
      • 26: PSS3PF
      • 27: PSS3PM
    • Switch_Reference: sequence of class labels:
      • 0: SS
      • 1: SSADV
      • 2: DS
      • 3: DSADV
      • 4: OR
      • 5: SIMMA
      • 6: SEQMA
      • 7: LOG
    • Tense: sequence of class labels:
      • 0: PRS
      • 1: PST
      • 2: FUT
      • 3: IMMED
      • 4: HOD
      • 5: 1DAY
      • 6: RCT
      • 7: RMT
    • Valency: sequence of class labels:
      • 0: IMPRS
      • 1: INTR
      • 2: TR
      • 3: DITR
      • 4: REFL
      • 5: RECP
      • 6: CAUS
      • 7: APPL
    • Voice: sequence of class labels:
      • 0: ACT
      • 1: MID
      • 2: PASS
      • 3: ANTIP
      • 4: DIR
      • 5: INV
      • 6: AGFOC
      • 7: PFOC
      • 8: LFOC
      • 9: BFOC
      • 10: ACFOC
      • 11: IFOC
      • 12: CFOC
    • Other: sequence of strings
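
The class labels above are stored as integer indices, so a reader usually has to map them back to tag names. The following is a minimal sketch of that decoding, not the provider's own tooling. It assumes the Hugging Face `datasets` convention that a sequence of named sub-features is materialized as a dictionary of parallel lists (one entry per word form). The `LABELS` table copies only a subset of the lists above, and `decode_forms`, `example_forms`, and the placeholder words are hypothetical names introduced for illustration.

```python
from typing import Dict, List

# Label tables copied from the feature list above (only a subset is shown).
LABELS: Dict[str, List[str]] = {
    "Part_Of_Speech": ["N", "PROPN", "ADJ", "PRO", "CLF", "ART", "DET", "V", "ADV",
                       "AUX", "V.PTCP", "V.MSDR", "V.CVB", "ADP", "COMP", "CONJ",
                       "NUM", "PART", "INTJ"],
    "Tense": ["PRS", "PST", "FUT", "IMMED", "HOD", "1DAY", "RCT", "RMT"],
    "Person": ["0", "1", "2", "3", "4", "INCL", "EXCL", "PRX", "OBV"],
    "Number": ["SG", "PL", "GRPL", "DU", "TRI", "PAUC", "GRPAUC", "INVN"],
    "Gender": ["MASC", "FEM", "NEUT"],  # first three of the 34 Gender labels
}


def decode_forms(forms: Dict[str, list]) -> List[dict]:
    """Turn the dict-of-parallel-lists layout into one record per word form."""
    n_forms = len(forms["word"])
    records = []
    for i, word in enumerate(forms["word"]):
        tags = {}
        for feature, names in LABELS.items():
            # Each feature holds one list of label indices per word form.
            ids = forms.get(feature, [[]] * n_forms)[i]
            if ids:
                tags[feature] = [names[j] for j in ids]
        records.append({"word": word, "tags": tags})
    return records


# Hypothetical record: placeholder words, not real dataset content.
example_forms = {
    "word": ["form_a", "form_b"],
    "Part_Of_Speech": [[7], [7]],
    "Tense": [[1], [2]],
    "Person": [[3], [1]],
    "Number": [[0], [1]],
    "Gender": [[0], []],
}

for record in decode_forms(example_forms):
    print(record["word"], record["tags"])
```

Features with an empty index list are simply omitted from a form's tags, which keeps sparse annotations readable.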

Data splits

  • train: 4134 examples, 24384743 bytes

Dataset size

  • Download size: 1318849 bytes
  • Dataset size: 24384743 bytes
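
To put the byte counts in perspective, the short calculation below derives the average size per example and the ratio between the dataset size and the download size. It uses only the figures stated above. The interpretation that the download is a compressed form of the same data is an assumption, not something the listing states.

```python
# Figures taken from the split and size information above.
NUM_EXAMPLES = 4134
DATASET_BYTES = 24_384_743
DOWNLOAD_BYTES = 1_318_849

# Average in-memory size of one example (one lemma with all its forms).
avg_bytes_per_example = DATASET_BYTES / NUM_EXAMPLES

# How many times larger the dataset is than the downloaded files.
size_ratio = DATASET_BYTES / DOWNLOAD_BYTES

print(f"average size per example: {avg_bytes_per_example:,.0f} bytes")
print(f"dataset size / download size: {size_ratio:.1f}x")
```

The result is roughly 5.9 KB per example and a ratio of about 18.5 between the two sizes.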

Configuration

  • default: the train split is read from the data files matching data/train-*
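
The configuration above can be exercised as in the sketch below, which is offered under stated assumptions. `load_train_split` relies on the third-party Hugging Face `datasets` package and on network access, so it is defined but not called here. `belongs_to_train_split` uses the standard library's `fnmatch` as an approximation of how the `data/train-*` pattern selects files, and the shard file names are hypothetical.

```python
from fnmatch import fnmatchcase

REPO_ID = "iahlt/ar-universal_morphologies"
CONFIG_NAME = "default"
SPLIT = "train"
DATA_FILES_PATTERN = "data/train-*"  # pattern from the configuration above


def belongs_to_train_split(path: str) -> bool:
    """Approximate check of whether a repository file matches the train pattern."""
    return fnmatchcase(path, DATA_FILES_PATTERN)


def load_train_split():
    """Load the train split; requires the `datasets` package and network access."""
    from datasets import load_dataset  # third-party, imported lazily

    return load_dataset(REPO_ID, CONFIG_NAME, split=SPLIT)


# Hypothetical shard names, used only to illustrate the pattern.
print(belongs_to_train_split("data/train-00000-of-00001.parquet"))
print(belongs_to_train_split("data/test-00000-of-00001.parquet"))
```

Calling `load_train_split()` in an environment with `datasets` installed should return the single train split described above.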

Language

  • Dataset language: Hebrew (he)