Data and code from: Statistical structure and the evolution of languages
Source: DataCite Commons. Updated: 2026-04-10. Indexed: 2026-04-25.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.pg4f4qs44
Description:
Human cultural development is marked by the emergence of new words and
ideas, reflecting societal changes. But how does this evolution proceed?
We use modern methods in natural language processing (namely, word
embeddings) to measure statistical traces of cultural development,
providing a testing ground to compare different models as to how this
process works. We show that real embeddings of English and 21 other
languages exhibit a series of previously unrecognized regularities,
specifically (a) frequency assortativity, where entities of high
popularity cluster near other high-popularity entities, (b) characteristic
clustering velocity profiles due to aggregation into hierarchical
structures, (c) persistent temporal dynamics, where newly created entities
appear disproportionately near other recent entries, and (d) Taylor’s law,
implying that over time and across empirical semantic space the variance
in new entity counts scales as a power of the mean, which helps
systematize and quantify large historical fluctuations of neologisms. To
explain these facts, we propose a class of generative models
(specifically, directed preferential placement) that construct synthetic
embeddings exhibiting similar regularities. We show that analogous
regularities also occur in other data sets, suggesting that such
generating models may shed light on new aspects of language and cultural
evolution.
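
Regularity (d), Taylor's law, states that the variance of counts scales as a power of the mean: variance ≈ a · mean^b. A minimal sketch of how such an exponent could be estimated is given below. It is not the authors' code: the function name `taylor_exponent`, the grouping of counts into lists, and the toy data are all assumptions made for illustration, and the fit is a plain least-squares regression in log-log space.

```python
import math

def taylor_exponent(groups):
    """Estimate (a, b) in variance = a * mean**b by least squares on logs.

    `groups` is a hypothetical input: a list of lists, each holding the
    new-entity counts observed in one region (or time window).
    """
    xs, ys = [], []
    for counts in groups:
        n = len(counts)
        mean = sum(counts) / n
        # Unbiased sample variance.
        var = sum((c - mean) ** 2 for c in counts) / (n - 1)
        # Logs are undefined for non-positive values, so skip such groups.
        if mean > 0 and var > 0:
            xs.append(math.log(mean))
            ys.append(math.log(var))
    k = len(xs)
    x_bar = sum(xs) / k
    y_bar = sum(ys) / k
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope = Taylor exponent
    a = math.exp(y_bar - b * x_bar)    # intercept back-transformed
    return a, b

# Toy data (not from the dataset): each group is [1, 2, 3] scaled by s,
# so the mean is 2s and the variance is s**2, i.e. variance = 0.25 * mean**2.
toy_groups = [[s * 1, s * 2, s * 3] for s in (1, 2, 4, 8)]
a, b = taylor_exponent(toy_groups)
print(f"a = {a:.3f}, b = {b:.3f}")
```

On the toy data the fitted exponent is 2 by construction. On real counts, how the semantic space and time axis are binned would affect the estimate, and that choice is not specified in this description.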
Provider:
Dryad
Created:
2026-01-21



