iahlt/he-universal_morphologies
Source: Hugging Face. Updated 2024-03-04. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/iahlt/he-universal_morphologies
Resource description:
---
dataset_info:
  features:
  - name: lemma
    dtype: string
  - name: forms
    sequence:
    - name: word
      dtype: string
    - name: Aktionsart
      sequence:
        class_label:
          names:
            '0': STAT
            '1': DYN
            '2': TEL
            '3': ATEL
            '4': PCT
            '5': DUR
            '6': ACH
            '7': ACCMP
            '8': SEMEL
            '9': ACTY
    - name: Animacy
      sequence:
        class_label:
          names:
            '0': ANIM
            '1': INAN
            '2': HUM
            '3': NHUM
    - name: Argument_Marking
      sequence:
        class_label:
          names:
            '0': ARGNO1S
            '1': ARGNO2S
            '2': ARGNO3S
            '3': ARGNO1P
            '4': ARGNO2P
            '5': ARGNO3P
            '6': ARGAC1S
            '7': ARGAC2S
            '8': ARGAC3S
            '9': ARGAC1P
            '10': ARGAC2P
            '11': ARGAC3P
            '12': ARGAB1S
            '13': ARGAB2S
            '14': ARGAB3S
            '15': ARGAB1P
            '16': ARGAB2P
            '17': ARGAB3P
            '18': ARGER1S
            '19': ARGER2S
            '20': ARGER3S
            '21': ARGER1P
            '22': ARGER2P
            '23': ARGER3P
            '24': ARGDA1S
            '25': ARGDA2S
            '26': ARGDA3S
            '27': ARGDA1P
            '28': ARGDA2P
            '29': ARGDA3P
            '30': ARGBE1S
            '31': ARGBE2S
            '32': ARGBE3S
            '33': ARGBE1P
            '34': ARGBE2P
            '35': ARGBE3P
    - name: Aspect
      sequence:
        class_label:
          names:
            '0': IPFV
            '1': PFV
            '2': PRF
            '3': PROG
            '4': PROSP
            '5': ITER
            '6': HAB
    - name: Case
      sequence:
        class_label:
          names:
            '0': NOM
            '1': ACC
            '2': ERG
            '3': ABS
            '4': NOMS
            '5': DAT
            '6': BEN
            '7': PRP
            '8': GEN
            '9': REL
            '10': PRT
            '11': INS
            '12': COM
            '13': VOC
            '14': COMPV
            '15': EQTV
            '16': PRIV
            '17': PROPR
            '18': AVR
            '19': FRML
            '20': TRANS
            '21': BYWAY
            '22': INTER
            '23': AT
            '24': POST
            '25': IN
            '26': CIRC
            '27': ANTE
            '28': APUD
            '29': 'ON'
            '30': ONHR
            '31': ONVR
            '32': SUB
            '33': REM
            '34': PROXM
            '35': ESS
            '36': ALL
            '37': ABL
            '38': APPRX
            '39': TERM
    - name: Comparison
      sequence:
        class_label:
          names:
            '0': CMPR
            '1': SPRL
            '2': AB
            '3': RL
            '4': EQT
    - name: Definiteness
      sequence:
        class_label:
          names:
            '0': DEF
            '1': INDF
            '2': SPEC
            '3': NSPEC
    - name: Deixis
      sequence:
        class_label:
          names:
            '0': PROX
            '1': MED
            '2': REMT
            '3': REF1
            '4': REF2
            '5': NOREF
            '6': PHOR
            '7': VIS
            '8': NVIS
            '9': ABV
            '10': EVEN
            '11': BEL
    - name: Evidentiality
      sequence:
        class_label:
          names:
            '0': FH
            '1': DRCT
            '2': SEN
            '3': VISU
            '4': NVSEN
            '5': AUD
            '6': NFH
            '7': QUOT
            '8': RPRT
            '9': HRSY
            '10': INFER
            '11': ASSUM
    - name: Finiteness
      sequence:
        class_label:
          names:
            '0': FIN
            '1': NFIN
    - name: Gender
      sequence:
        class_label:
          names:
            '0': MASC
            '1': FEM
            '2': NEUT
            '3': NAKH1
            '4': NAKH2
            '5': NAKH3
            '6': NAKH4
            '7': NAKH5
            '8': NAKH6
            '9': NAKH7
            '10': NAKH8
            '11': BANTU1
            '12': BANTU2
            '13': BANTU3
            '14': BANTU4
            '15': BANTU5
            '16': BANTU6
            '17': BANTU7
            '18': BANTU8
            '19': BANTU9
            '20': BANTU10
            '21': BANTU11
            '22': BANTU12
            '23': BANTU13
            '24': BANTU14
            '25': BANTU15
            '26': BANTU16
            '27': BANTU17
            '28': BANTU18
            '29': BANTU19
            '30': BANTU20
            '31': BANTU21
            '32': BANTU22
            '33': BANTU23
    - name: Information_Structure
      sequence:
        class_label:
          names:
            '0': TOP
            '1': FOC
    - name: Interrogativity
      sequence:
        class_label:
          names:
            '0': DECL
            '1': INT
    - name: Language_Specific
      sequence:
        class_label:
          names:
            '0': LGSPEC1
            '1': LGSPEC2
            '2': LGSPEC3
            '3': LGSPEC4
            '4': LGSPEC5
            '5': LGSPEC6
            '6': LGSPEC7
            '7': LGSPEC8
            '8': LGSPEC9
            '9': LGSPEC10
    - name: Mood
      sequence:
        class_label:
          names:
            '0': IND
            '1': SBJV
            '2': REAL
            '3': IRR
            '4': AUPRP
            '5': AUNPRP
            '6': IMP
            '7': COND
            '8': PURP
            '9': INTEN
            '10': POT
            '11': LKLY
            '12': ADM
            '13': OBLIG
            '14': DEB
            '15': PERM
            '16': DED
            '17': SIM
            '18': OPT
    - name: Number
      sequence:
        class_label:
          names:
            '0': SG
            '1': PL
            '2': GRPL
            '3': DU
            '4': TRI
            '5': PAUC
            '6': GRPAUC
            '7': INVN
    - name: Part_Of_Speech
      sequence:
        class_label:
          names:
            '0': 'N'
            '1': PROPN
            '2': ADJ
            '3': PRO
            '4': CLF
            '5': ART
            '6': DET
            '7': V
            '8': ADV
            '9': AUX
            '10': V.PTCP
            '11': V.MSDR
            '12': V.CVB
            '13': ADP
            '14': COMP
            '15': CONJ
            '16': NUM
            '17': PART
            '18': INTJ
    - name: Person
      sequence:
        class_label:
          names:
            '0': '0'
            '1': '1'
            '2': '2'
            '3': '3'
            '4': '4'
            '5': INCL
            '6': EXCL
            '7': PRX
            '8': OBV
    - name: Polarity
      sequence:
        class_label:
          names:
            '0': POS
            '1': NEG
    - name: Politeness
      sequence:
        class_label:
          names:
            '0': INFM
            '1': FORM
            '2': ELEV
            '3': HUMB
            '4': POL
            '5': AVOID
            '6': LOW
            '7': HIGH
            '8': STELEV
            '9': STSUPR
            '10': LIT
            '11': FOREG
            '12': COL
    - name: Possession
      sequence:
        class_label:
          names:
            '0': ALN
            '1': NALN
            '2': PSS1S
            '3': PSS2S
            '4': PSS2SF
            '5': PSS2SM
            '6': PSS2SINFM
            '7': PSS2SFORM
            '8': PSS3S
            '9': PSS3SF
            '10': PSS3SM
            '11': PSS1D
            '12': PSS1DI
            '13': PSS1DE
            '14': PSS2D
            '15': PSS2DM
            '16': PSS2DF
            '17': PSS3D
            '18': PSS3DF
            '19': PSS3DM
            '20': PSS1P
            '21': PSS1PI
            '22': PSS1PE
            '23': PSS2P
            '24': PSS2PF
            '25': PSS2PM
            '26': PSS3PF
            '27': PSS3PM
    - name: Switch_Reference
      sequence:
        class_label:
          names:
            '0': SS
            '1': SSADV
            '2': DS
            '3': DSADV
            '4': OR
            '5': SIMMA
            '6': SEQMA
            '7': LOG
    - name: Tense
      sequence:
        class_label:
          names:
            '0': PRS
            '1': PST
            '2': FUT
            '3': IMMED
            '4': HOD
            '5': 1DAY
            '6': RCT
            '7': RMT
    - name: Valency
      sequence:
        class_label:
          names:
            '0': IMPRS
            '1': INTR
            '2': TR
            '3': DITR
            '4': REFL
            '5': RECP
            '6': CAUS
            '7': APPL
    - name: Voice
      sequence:
        class_label:
          names:
            '0': ACT
            '1': MID
            '2': PASS
            '3': ANTIP
            '4': DIR
            '5': INV
            '6': AGFOC
            '7': PFOC
            '8': LFOC
            '9': BFOC
            '10': ACFOC
            '11': IFOC
            '12': CFOC
    - name: Other
      sequence: string
  splits:
  - name: train
    num_bytes: 5258481
    num_examples: 1176
  download_size: 302856
  dataset_size: 5258481
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
language:
- he
---
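Provider:
iahlt
Summary of original information
Dataset overview
Data features
The dataset contains the following features:
- lemma: string.
- forms: a sequence containing the following fields:
- word: string.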
- Aktionsart: contains the following class labels:
- 0: STAT
- 1: DYN
- 2: TEL
- 3: ATEL
- 4: PCT
- 5: DUR
- 6: ACH
- 7: ACCMP
- 8: SEMEL
- 9: ACTY
- Animacy: contains the following class labels:
- 0: ANIM
- 1: INAN
- 2: HUM
- 3: NHUM
- Argument_Marking: contains the following class labels:
- 0: ARGNO1S
- 1: ARGNO2S
- 2: ARGNO3S
- 3: ARGNO1P
- 4: ARGNO2P
- 5: ARGNO3P
- 6: ARGAC1S
- 7: ARGAC2S
- 8: ARGAC3S
- 9: ARGAC1P
- 10: ARGAC2P
- 11: ARGAC3P
- 12: ARGAB1S
- 13: ARGAB2S
- 14: ARGAB3S
- 15: ARGAB1P
- 16: ARGAB2P
- 17: ARGAB3P
- 18: ARGER1S
- 19: ARGER2S
- 20: ARGER3S
- 21: ARGER1P
- 22: ARGER2P
- 23: ARGER3P
- 24: ARGDA1S
- 25: ARGDA2S
- 26: ARGDA3S
- 27: ARGDA1P
- 28: ARGDA2P
- 29: ARGDA3P
- 30: ARGBE1S
- 31: ARGBE2S
- 32: ARGBE3S
- 33: ARGBE1P
- 34: ARGBE2P
- 35: ARGBE3P
- Aspect: contains the following class labels:
- 0: IPFV
- 1: PFV
- 2: PRF
- 3: PROG
- 4: PROSP
- 5: ITER
- 6: HAB
- Case: contains the following class labels:
- 0: NOM
- 1: ACC
- 2: ERG
- 3: ABS
- 4: NOMS
- 5: DAT
- 6: BEN
- 7: PRP
- 8: GEN
- 9: REL
- 10: PRT
- 11: INS
- 12: COM
- 13: VOC
- 14: COMPV
- 15: EQTV
- 16: PRIV
- 17: PROPR
- 18: AVR
- 19: FRML
- 20: TRANS
- 21: BYWAY
- 22: INTER
- 23: AT
- 24: POST
- 25: IN
- 26: CIRC
- 27: ANTE
- 28: APUD
- 29: ON
- 30: ONHR
- 31: ONVR
- 32: SUB
- 33: REM
- 34: PROXM
- 35: ESS
- 36: ALL
- 37: ABL
- 38: APPRX
- 39: TERM
- Comparison: contains the following class labels:
- 0: CMPR
- 1: SPRL
- 2: AB
- 3: RL
- 4: EQT
- Definiteness: contains the following class labels:
- 0: DEF
- 1: INDF
- 2: SPEC
- 3: NSPEC
- Deixis: contains the following class labels:
- 0: PROX
- 1: MED
- 2: REMT
- 3: REF1
- 4: REF2
- 5: NOREF
- 6: PHOR
- 7: VIS
- 8: NVIS
- 9: ABV
- 10: EVEN
- 11: BEL
- Evidentiality: contains the following class labels:
- 0: FH
- 1: DRCT
- 2: SEN
- 3: VISU
- 4: NVSEN
- 5: AUD
- 6: NFH
- 7: QUOT
- 8: RPRT
- 9: HRSY
- 10: INFER
- 11: ASSUM
- Finiteness: contains the following class labels:
- 0: FIN
- 1: NFIN
- Gender: contains the following class labels:
- 0: MASC
- 1: FEM
- 2: NEUT
- 3: NAKH1
- 4: NAKH2
- 5: NAKH3
- 6: NAKH4
- 7: NAKH5
- 8: NAKH6
- 9: NAKH7
- 10: NAKH8
- 11: BANTU1
- 12: BANTU2
- 13: BANTU3
- 14: BANTU4
- 15: BANTU5
- 16: BANTU6
- 17: BANTU7
- 18: BANTU8
- 19: BANTU9
- 20: BANTU10
- 21: BANTU11
- 22: BANTU12
- 23: BANTU13
- 24: BANTU14
- 25: BANTU15
- 26: BANTU16
- 27: BANTU17
- 28: BANTU18
- 29: BANTU19
- 30: BANTU20
- 31: BANTU21
- 32: BANTU22
- 33: BANTU23
- Information_Structure: contains the following class labels:
- 0: TOP
- 1: FOC
- Interrogativity: contains the following class labels:
- 0: DECL
- 1: INT
- Language_Specific: contains the following class labels:
- 0: LGSPEC1
- 1: LGSPEC2
- 2: LGSPEC3
- 3: LGSPEC4
- 4: LGSPEC5
- 5: LGSPEC6
- 6: LGSPEC7
- 7: LGSPEC8
- 8: LGSPEC9
- 9: LGSPEC10
- Mood: contains the following class labels:
- 0: IND
- 1: SBJV
- 2: REAL
- 3: IRR
- 4: AUPRP
- 5: AUNPRP
- 6: IMP
- 7: COND
- 8: PURP
- 9: INTEN
- 10: POT
- 11: LKLY
- 12: ADM
- 13: OBLIG
- 14: DEB
- 15: PERM
- 16: DED
- 17: SIM
- 18: OPT
- Number: contains the following class labels:
- 0: SG
- 1: PL
- 2: GRPL
- 3: DU
- 4: TRI
- 5: PAUC
- 6: GRPAUC
- 7: INVN
- Part_Of_Speech: contains the following class labels:
- 0: N
- 1: PROPN
- 2: ADJ
- 3: PRO
- 4: CLF
- 5: ART
- 6: DET
- 7: V
- 8: ADV
- 9: AUX
- 10: V.PTCP
- 11: V.MSDR
- 12: V.CVB
- 13: ADP
- 14: COMP
- 15: CONJ
- 16: NUM
- 17: PART
- 18: INTJ
- Person: contains the following class labels:
- 0: 0
- 1: 1
- 2: 2
- 3: 3
- 4: 4
- 5: INCL
- 6: EXCL
- 7: PRX
- 8: OBV
- Polarity: contains the following class labels:
- 0: POS
- 1: NEG
- Politeness: contains the following class labels:
- 0: INFM
- 1: FORM
- 2: ELEV
- 3: HUMB
- 4: POL
- 5: AVOID
- 6: LOW
- 7: HIGH
- 8: STELEV
- 9: STSUPR
- 10: LIT
- 11: FOREG
- 12: COL
- Possession: contains the following class labels:
- 0: ALN
- 1: NALN
- 2: PSS1S
- 3: PSS2S
- 4: PSS2SF
- 5: PSS2SM
- 6: PSS2SINFM
- 7: PSS2SFORM
- 8: PSS3S
- 9: PSS3SF
- 10: PSS3SM
- 11: PSS1D
- 12: PSS1DI
- 13: PSS1DE
- 14: PSS2D
- 15: PSS2DM
- 16: PSS2DF
- 17: PSS3D
- 18: PSS3DF
- 19: PSS3DM
- 20: PSS1P
- 21: PSS1PI
- 22: PSS1PE
- 23: PSS2P
- 24: PSS2PF
- 25: PSS2PM
- 26: PSS3PF
- 27: PSS3PM
- Switch_Reference: contains the following class labels:
- 0: SS
- 1: SSADV
- 2: DS
- 3: DSADV
- 4: OR
- 5: SIMMA
- 6: SEQMA
- 7: LOG
- Tense: contains the following class labels:
- 0: PRS
- 1: PST
- 2: FUT
- 3: IMMED
- 4: HOD
- 5: 1DAY
- 6: RCT
- 7: RMT
- Valency: contains the following class labels:
- 0: IMPRS
- 1: INTR
- 2: TR
- 3: DITR
- 4: REFL
- 5: RECP
- 6: CAUS
- 7: APPL
- Voice: contains the following class labels:
- 0: ACT
- 1: MID
- 2: PASS
- 3: ANTIP
- 4: DIR
- 5: INV
- 6: AGFOC
- 7: PFOC
- 8: LFOC
- 9: BFOC
- 10: ACFOC
- 11: IFOC
- 12: CFOC
- Other: sequence of strings.
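The class-label features above store integer IDs, not tag strings, so a reader usually has to map IDs back to names. The following is a minimal sketch of that lookup, assuming each feature of a form arrives as a list of integer IDs. Only a subset of the tables is copied here; `decode_form` and the example form are hypothetical, made up for illustration and not taken from the dataset.

```python
# Subset of the label tables listed above; list index == integer class ID.
LABEL_NAMES = {
    "Part_Of_Speech": ["N", "PROPN", "ADJ", "PRO", "CLF", "ART", "DET", "V"],
    "Tense": ["PRS", "PST", "FUT", "IMMED", "HOD", "1DAY", "RCT", "RMT"],
    "Person": ["0", "1", "2", "3", "4", "INCL", "EXCL", "PRX", "OBV"],
    "Number": ["SG", "PL", "GRPL", "DU", "TRI", "PAUC", "GRPAUC", "INVN"],
    "Gender": ["MASC", "FEM", "NEUT"],
}


def decode_form(form):
    """Map each known feature's list of integer IDs to its tag names."""
    decoded = {}
    for feature, value in form.items():
        names = LABEL_NAMES.get(feature)
        if names is None:
            # Fields without a table here (e.g. "word") pass through unchanged.
            decoded[feature] = value
        else:
            decoded[feature] = [names[i] for i in value]
    return decoded


# Hypothetical form, invented for illustration (assumed layout, not real data).
example_form = {
    "word": "כתב",
    "Part_Of_Speech": [7],
    "Tense": [1],
    "Person": [3],
    "Number": [0],
    "Gender": [0],
}
decoded_example = decode_form(example_form)
print(decoded_example)
```

If the data is loaded with the Hugging Face `datasets` library, its `ClassLabel` feature type offers an `int2str` method for the same lookup, which avoids copying the tables by hand.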
Data splits
The dataset contains the following splits:
- train: 1176 examples, 5258481 bytes.
Dataset size
- Download size: 302856 bytes
- Dataset size: 5258481 bytes
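The three figures above allow a quick sanity check on how large each entry is and how much the download expands once loaded. This is a small arithmetic sketch using only the numbers reported in the metadata; the variable names are illustrative.

```python
# Figures reported in the dataset metadata.
NUM_EXAMPLES = 1176
DATASET_BYTES = 5_258_481
DOWNLOAD_BYTES = 302_856

# Average size of one example (one lemma with all of its forms).
avg_bytes_per_example = DATASET_BYTES / NUM_EXAMPLES

# How many times larger the dataset is than the downloaded files.
expansion_ratio = DATASET_BYTES / DOWNLOAD_BYTES

print(f"average size per example: {avg_bytes_per_example:.0f} bytes")
print(f"dataset size / download size: {expansion_ratio:.1f}x")
```

Each example therefore averages roughly 4.5 KB, and the dataset is about 17 times the size of the download.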
Configuration
- config_name: default
- data_files:
- split: train
- path: data/train-*
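With a single `default` configuration and a single `train` split, loading reduces to one call. The sketch below assumes the Hugging Face `datasets` library is installed and that the repository is reachable; `load_train_split` is a hypothetical helper name, and the import is deferred so that defining the function needs neither the library nor network access.

```python
REPO_ID = "iahlt/he-universal_morphologies"
CONFIG_NAME = "default"
SPLIT = "train"


def load_train_split():
    """Download (or reuse the local cache of) the train split."""
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, CONFIG_NAME, split=SPLIT)
```

Calling `load_train_split()` should return a `Dataset` object; if the metadata above is accurate, its `num_rows` would be 1176.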
Language
- Supported language: Hebrew (he)



