Verbal derivational suffixes in Hungarian: -(s)Odik and -(s)Ul

Indexed from NIAID Data Ecosystem on 2026-03-14
Download link:
https://zenodo.org/record/7607660
Description:
This is an open-source dataset containing more than 1.1 million corpus occurrences of the Hungarian verbal derivational suffixes -(s)Odik and -(s)Ul. Both suffixes are used to create intransitive verbs from nominal bases, and they both mean 'to become [adjective/noun]'. This dataset is suited for quantitative investigations into the subtle differences regarding how and when these suffixes are used.

It consists of the following columns:

1. id: ID
2. form: lowercase word form
3. lemma: word without inflectional suffixes; if the verb has a (separated) preverb, there is a + sign between the preverb and the verb stem
4. prev: preverb associated with the verb
5. prevtype: PFX if the preverb is prefixed to the verb, SEP if the preverb is separated
6. verb: verb lemma; in each case without preverb
7. root: adjective or noun serving as the base of verb formation
8. suffix: derivational suffix: -ul/ül/sul/sül endings are represented by -(s)Ul, -odik/edik/ödik/sodik/sedik/södik endings are represented by -(s)Odik
9. w2v_cluster: the cluster ID of the root, based on word2vec embedding
10. argframe_cases: arguments of the verb, represented by case-endings
11. argframe_long: arguments of the verb, represented by lemma + case-ending combinations
12. doc_year: the year of writing or the year of publication, 0 if unknown
13. doc_style: document style
14. doc_id: document identifier
15. left_context: text preceding the hit
16. kwic: the hit
17. right_context: text following the hit
18. freqsum: token frequency of the verb lemma; occurrences with and without preverbs are counted together
19. prev_vs_all: token frequency of the verb lemma with any preverb, divided by the 'freqsum' value
20. actprev_vs_allprev: token frequency of the specific preverb + verb lemma combination, divided by the 'prev_vs_all' value

The first row is the header. If a cell's value is unspecified, it is marked with underscore (_).
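
As a rough illustration of how the header row and the underscore convention might be handled when reading the data, here is a minimal Python sketch. Several things in it are assumptions rather than facts from the description: the file is assumed to be tab-separated (the description does not state the delimiter), the helper names `load_rows` and `suffix_counts` are hypothetical, and the three sample rows are invented for illustration and cover only the first eight columns.

```python
import csv
import io
from collections import Counter

MISSING = "_"  # per the description, unspecified cells hold an underscore


def load_rows(stream, delimiter="\t"):
    """Read rows as dicts keyed by the header row, mapping "_" to None.

    The tab delimiter is an assumption; pass another one if the file differs.
    QUOTE_NONE keeps quotation marks in corpus text from being treated
    as CSV quoting.
    """
    reader = csv.DictReader(stream, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    return [
        {key: (None if value == MISSING else value) for key, value in row.items()}
        for row in reader
    ]


def suffix_counts(rows):
    """Count occurrences per value of the 'suffix' column."""
    return Counter(row["suffix"] for row in rows)


# Invented sample rows (not taken from the dataset), first eight columns only.
SAMPLE = (
    "id\tform\tlemma\tprev\tprevtype\tverb\troot\tsuffix\n"
    "1\tszépül\tszépül\t_\t_\tszépül\tszép\t-(s)Ul\n"
    "2\tmegszépült\tmegszépül\tmeg\tPFX\tszépül\tszép\t-(s)Ul\n"
    "3\tgazdagodik\tgazdagodik\t_\t_\tgazdagodik\tgazdag\t-(s)Odik\n"
)

rows = load_rows(io.StringIO(SAMPLE))
counts = suffix_counts(rows)
print(counts)
```

To apply the same functions to the downloaded file, open it with `open(path, encoding="utf-8", newline="")` and pass the file object to `load_rows`; the UTF-8 encoding is likewise an assumption to verify against the actual file.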
Created:
2023-02-06