
Replication data for: Measure schematicity through information content: A quantitative approach to grammaticalization

Source: DataCite Commons. Updated: 2026-01-05. Indexed: 2026-04-25.
Download link:
https://dataverse.no/citation?persistentId=doi:10.18710/APTUHA
Description:
This study proposes a quantitative method for computing the schematicity of constructions, a key indicator of how far a morpheme has grammaticalized. To estimate the schematicity of a schema made up of two morphemes, i.e., X_ (where X is the target morpheme and _ represents an open slot), the method requires the total token frequency of all types of X_ and the token frequencies of all elements that occur in the open slot. For example, to estimate the schematicity of “_ment”, we need the total token frequency of “_ment”, which is the sum of the frequencies of “shipment”, “equipment”, “employment”, “appointment”, and all other types of “_ment”. We also need the token frequencies of “ship”, “equip”, “employ”, “appoint”, and all other elements that occur in the open slot. The data are therefore morpheme bigrams (2-grams) generated from English and Chinese corpora, showing which morphemes each morpheme can combine with, together with the token frequency of each bigram and the token frequencies of its two components.
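The two inputs described above can be collected from bigram rows roughly as follows. This is only a minimal sketch: the row layout (first morpheme, second morpheme, bigram frequency, frequency of each component), the function name `schema_inputs`, and every count are assumptions invented for illustration, not the dataset's actual column format or values. It gathers the inputs only and does not compute the study's schematicity measure, whose formula is not given in this description.

```python
# Hypothetical rows: (first, second, bigram_freq, first_freq, second_freq).
# The layout and all counts are invented for illustration only.
ROWS = [
    ("ship", "ment", 120, 900, 5000),
    ("equip", "ment", 300, 450, 5000),
    ("employ", "ment", 800, 1500, 5000),
    ("appoint", "ment", 400, 700, 5000),
    ("kind", "ness", 200, 2500, 3000),  # belongs to a different schema
]


def schema_inputs(rows, target):
    """Collect the two inputs for the schema "_<target>".

    Returns (total token frequency of all bigrams ending in `target`,
    dict mapping each open-slot element to its own token frequency).
    """
    total = 0
    filler_freqs = {}
    for first, second, bigram_freq, first_freq, _second_freq in rows:
        if second == target:
            total += bigram_freq              # sum over all types of "_<target>"
            filler_freqs[first] = first_freq  # frequency of the slot element
    return total, filler_freqs


total, fillers = schema_inputs(ROWS, "ment")
print(total)    # total token frequency of "_ment" in the toy rows
print(fillers)  # token frequencies of "ship", "equip", "employ", "appoint"
```

For a schema whose open slot follows the target morpheme (X_), the same loop would match on the first component and record the frequency of the second.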

Provider:
DataverseNO
Created:
2023-01-27