Verbal derivational suffixes in Hungarian: -(s)Odik and -(s)Ul
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/7607660
Description:
This open-source dataset contains more than 1.1 million corpus occurrences of the Hungarian verbal derivational suffixes -(s)Odik and -(s)Ul. Both suffixes create intransitive verbs from nominal bases, and both mean 'to become [adjective/noun]'. The dataset is suited to quantitative investigation of the subtle differences in how and when these suffixes are used. It consists of the following columns:
1 id: ID
2 form: lowercase word form
3 lemma: word without inflectional suffixes; if the verb has a (separated) preverb, there is a + sign between the preverb and the verb stem
4 prev: preverb associated with the verb
5 prevtype: PFX if the preverb is prefixed to the verb, SEP if the preverb is separated
6 verb: verb lemma; in each case without preverb
7 root: adjective or noun serving as the base of verb formation
8 suffix: derivational suffix: -ul/ül/sul/sül endings are represented by -(s)Ul, -odik/edik/ödik/sodik/sedik/södik endings are represented by -(s)Odik
9 w2v_cluster: the cluster ID of the root, based on word2vec embedding
10 argframe_cases: arguments of the verb, represented by case-endings
11 argframe_long: arguments of the verb, represented by lemma + case-ending combinations
12 doc_year: the year of writing or the year of publication, 0 if unknown
13 doc_style: document style
14 doc_id: document identifier
15 left_context: text preceding the hit
16 kwic: the hit
17 right_context: text following the hit
18 freqsum: token frequency of the verb lemma; occurrences with and without preverbs are counted together
19 prev_vs_all: token frequency of the verb lemma with any preverb, divided by the 'freqsum' value
20 actprev_vs_allprev: token frequency of the specific preverb + verb lemma combination, divided by the 'prev_vs_all' value
The first row is the header. Unspecified cell values are marked with an underscore (_).
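The conventions described above (a header row, an underscore for unspecified cells, 0 for an unknown year) can be handled at load time. The following is a minimal sketch only: the description does not state the file name or the delimiter, so tab separation is an assumption, the three sample rows are invented for illustration and cover only a subset of the columns, and `read_rows` is a hypothetical helper name. The exact strings stored in the suffix column are also assumed to match the labels used in the column list.

```python
import csv
import io
from collections import Counter

# Invented sample rows (NOT taken from the dataset), limited to a few columns.
# Assumption: the real file is tab-separated; adjust `delimiter` if it is not.
SAMPLE = (
    "id\tform\tprev\tprevtype\tverb\troot\tsuffix\tdoc_year\n"
    "1\tszépült\t_\t_\tszépül\tszép\t-(s)Ul\t2005\n"
    "2\tsötétedik\t_\t_\tsötétedik\tsötét\t-(s)Odik\t0\n"
    "3\tmegszépült\tmeg\tPFX\tszépül\tszép\t-(s)Ul\t2010\n"
)


def read_rows(handle, delimiter="\t"):
    """Yield one dict per data row, normalizing the missing-value markers.

    '_' (unspecified cell) becomes None, and a doc_year of 0 (unknown year)
    also becomes None, so both can be filtered the same way downstream.
    """
    for row in csv.DictReader(handle, delimiter=delimiter):
        clean = {key: (None if value == "_" else value) for key, value in row.items()}
        year = int(clean["doc_year"]) if clean.get("doc_year") else 0
        clean["doc_year"] = year or None
        yield clean


rows = list(read_rows(io.StringIO(SAMPLE)))

# Token counts per derivational suffix class.
suffix_counts = Counter(row["suffix"] for row in rows)

# Occurrences whose year of writing or publication is known.
dated_rows = [row for row in rows if row["doc_year"] is not None]
```

To read the actual file, pass an open file handle instead of the `io.StringIO` sample, for example `open(path, encoding="utf-8", newline="")`, where `path` is wherever the downloaded file was saved.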
Created:
2023-02-06



