
Replication data for: "The crystallization of language over time"

Source: DataCite Commons. Updated 2025-10-01; indexed 2026-05-03.
Download link:
https://rdr.kuleuven.be/citation?persistentId=doi:10.48804/YAMJUS
Description:
This repository contains the data and R scripts accompanying the paper "The crystallization of language over time" (under review). The datasets are stored in txt (tab-delimited) and rds format in the /data folder. The files ngrams_lemma.txt and ngrams_pos.txt contain lemma and part-of-speech trigrams, with frequency information, extracted from the C-CLAMP corpus (1850-1999; Piersoul et al. 2021). These files are used to calculate the trigrams' association measures with the R code in 01_data_preparation.R. The file 02_analyses.R contains the R code used to model the trigrams' internal coherence through time with generalized linear and additive mixed models, and to assess their distribution by means of Shannon's entropy and Kullback-Leibler divergence.
Provider:
KU Leuven RDR
Created:
2025-07-09
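
The description mentions Shannon's entropy and Kullback-Leibler divergence as the measures used to assess trigram distributions. As a rough illustration of what those two quantities compute, here is a minimal Python sketch. It is not the code in 02_analyses.R (which is written in R and not shown on this page). The toy trigram counts, the function names, and the add-one smoothing in `kl_divergence` are all assumptions made for the example.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy in bits of a distribution given as raw frequency counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def kl_divergence(p_counts, q_counts, alpha=1.0):
    """KL divergence D(P || Q) in bits between two {item: count} dictionaries.

    Add-alpha smoothing over the union of items is an assumption of this
    sketch; it keeps Q non-zero wherever P is non-zero.
    """
    keys = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(keys)
    q_total = sum(q_counts.values()) + alpha * len(keys)
    divergence = 0.0
    for key in keys:
        p = (p_counts.get(key, 0) + alpha) / p_total
        q = (q_counts.get(key, 0) + alpha) / q_total
        divergence += p * math.log2(p / q)
    return divergence


# Hypothetical trigram frequencies for two time periods (not taken from the dataset).
early_period = {"in de loop": 40, "op het moment": 35, "aan de hand": 25}
late_period = {"in de loop": 70, "op het moment": 20, "aan de hand": 10}

entropy_early = shannon_entropy(early_period.values())
entropy_late = shannon_entropy(late_period.values())
shift = kl_divergence(late_period, early_period)
```

In this toy setup, a lower entropy in the later period would indicate that usage is concentrating on fewer trigrams, and the divergence gives a single number for how far the later distribution has moved from the earlier one.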