Replication data for: Measure schematicity through information content: A quantitative approach to grammaticalization
Source: DataCite Commons. Updated: 2026-01-05. Indexed: 2026-04-25.
Download link:
https://dataverse.no/citation?persistentId=doi:10.18710/APTUHA
Description:
This study proposes a quantitative method for computing the schematicity of constructions, a key indicator of how far a morpheme has grammaticalized. To estimate the schematicity of a schema made up of two morphemes, i.e., X_ (where X is the target morpheme and _ represents an open slot), the method needs two inputs: the total token frequency of all types of X_, and the token frequencies of all elements that occur in the open slot. For example, to estimate the schematicity of “_ment”, we need the total token frequency of “_ment”, which is the sum of the frequencies of “shipment”, “equipment”, “employment”, “appointment”, and so on (all types of “_ment”). We also need the token frequencies of “ship”, “equip”, “employ”, “appoint”, and so on (all elements occurring in the open slot). The data are therefore morpheme bigrams (2-grams) generated from English and Chinese corpora, showing which morphemes each morpheme can combine with, together with the token frequency of each bigram and the token frequencies of its two components.
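The two inputs described above can be gathered from bigram records with a simple aggregation. The following is a minimal sketch only: the record layout (first morpheme, second morpheme, bigram frequency, and the two component frequencies), the function name `schema_inputs`, and all counts are hypothetical and do not reflect the actual file format or values of the dataset. The schematicity formula is not stated in this description, so the sketch stops at collecting its inputs.

```python
# Hypothetical records, one per morpheme bigram:
# (first morpheme, second morpheme, bigram token frequency,
#  token frequency of first morpheme, token frequency of second morpheme).
# All counts are invented for illustration; they are not from the dataset.
bigrams = [
    ("ship", "ment", 120, 900, 5000),
    ("equip", "ment", 300, 450, 5000),
    ("employ", "ment", 800, 1500, 5000),
    ("appoint", "ment", 250, 600, 5000),
    ("ship", "s", 400, 900, 20000),
]


def schema_inputs(records, target, slot="left"):
    """Collect the inputs for one schema.

    slot="left" treats the schema as "_target" (open slot before the target);
    slot="right" treats it as "target_" (open slot after the target).
    Returns (total token frequency of the schema,
             {slot filler: token frequency of that filler}).
    """
    total = 0
    fillers = {}
    for first, second, bigram_freq, first_freq, second_freq in records:
        if slot == "left" and second == target:
            total += bigram_freq          # sum over all types of "_target"
            fillers[first] = first_freq   # frequency of the slot filler itself
        elif slot == "right" and first == target:
            total += bigram_freq          # sum over all types of "target_"
            fillers[second] = second_freq
    return total, fillers


total, fillers = schema_inputs(bigrams, "ment", slot="left")
print(total)    # 1470
print(fillers)
```

With the invented counts above, the total for “_ment” is the sum of the four bigram frequencies, and `fillers` maps “ship”, “equip”, “employ”, and “appoint” to their own token frequencies.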
Provider:
DataverseNO
Created:
2023-01-27



