Data and code from: Statistical structure and the evolution of languages
Source: DataONE. Updated 2026-01-21; indexed 2026-02-07.
Download link:
https://search.dataone.org/view/sha256:e0994b2e70af23a49ff371b0611e6f92d6f27a321a1e623d443f3ef3ee2e3223
Description:
Human cultural development is marked by the emergence of new words and ideas, reflecting societal changes. But how does this evolution proceed? We use modern methods in natural language processing (namely, word embeddings) to measure statistical traces of cultural development, providing a testing ground to compare different models as to how this process works. We show that real embeddings of English and 21 other languages exhibit a series of previously unrecognized regularities, specifically (a) frequency assortativity, where entities of high popularity cluster near other high-popularity entities, (b) characteristic clustering velocity profiles due to aggregation into hierarchical structures, (c) persistent temporal dynamics, where newly-created entities appear disproportionately near other recent entries, and (d) Taylor's law, implying that over time and across empirical semantic space the variance in new entity counts scales as a power of the mean, which helps systematize and quantify...

# Embedding Analysis
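As a rough illustration of the Taylor's law regularity mentioned in the description (variance of counts scaling as a power of the mean), the sketch below fits the exponent with a log-log regression. The function name `fit_taylor_exponent` and the synthetic Poisson counts are assumptions made for this example only; they are not taken from the repository.

```python
import numpy as np


def fit_taylor_exponent(counts):
    """Fit var = a * mean**b across rows of `counts` (hypothetical helper).

    `counts` has shape (n_regions, n_observations): each row holds repeated
    counts for one region. Returns (b, a).
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)
    # Log-log regression is only defined where both quantities are positive.
    keep = (mean > 0) & (var > 0)
    slope, intercept = np.polyfit(np.log(mean[keep]), np.log(var[keep]), 1)
    return slope, np.exp(intercept)


# Synthetic check (an assumption, not data from the study): Poisson counts
# have variance equal to the mean, so the fitted exponent should be near 1.
rng = np.random.default_rng(0)
rates = np.logspace(0, 2, 50)
demo_counts = rng.poisson(rates[:, None], size=(50, 2000))
exponent, prefactor = fit_taylor_exponent(demo_counts)
print(f"fitted exponent: {exponent:.2f}, prefactor: {prefactor:.2f}")
```

On real data, each row would correspond to one region of semantic space and each column to one time window; how the study defines those regions is not stated in this excerpt.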
## Key Functions Overview
This repository contains key functions for generating and analyzing embedding models:
## To reproduce Figures/Tables
To reproduce all figures and tables, open `./src/reproduce-results.ipynb` and run all cells sequentially.
All figures and tables can be reproduced from the cached result files in `./data`.
### Core Model Generation (`src/gen_models.py`)
* **`gen_model_gaussian()`** - Standard Gaussian embedding generation
* **`gen_model_mixture_gaussian()`** - Mixture Gaussian model with clustering using make_blobs
* **`gen_model_uniform()`** - Uniform distribution embedding generation
* **`gen_model_uniform_directional()`** - Directional preferential placement with spherical coordinates
* **`gen_model_preferential_placement_v2()`** - Preferential placement model with exponential radius
* **`gen_parameterized_preferential_placement_v2()`** - Parameterized preferential placement with VMF sampling and multiple options
* **`gen_model_prefere...,
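The generator families listed above can be illustrated with a minimal sketch. The function names, signatures, and parameters below (`sketch_gaussian`, `sketch_uniform`, `sketch_preferential_placement`, `scale`) are hypothetical and are not the repository's actual API. In particular, the preferential placement rule shown here (pick an existing point uniformly at random, then step in a random direction by an exponentially distributed radius) is only one plausible reading of "preferential placement with exponential radius".

```python
import numpy as np


def sketch_gaussian(n, dim, rng):
    """Draw n points from a standard Gaussian in `dim` dimensions."""
    return rng.standard_normal((n, dim))


def sketch_uniform(n, dim, rng):
    """Draw n points uniformly from the cube [-1, 1)^dim."""
    return rng.uniform(-1.0, 1.0, size=(n, dim))


def sketch_preferential_placement(n, dim, rng, scale=0.1):
    """Grow a point cloud one point at a time (assumed rule, see lead-in).

    Each new point is placed near a uniformly chosen existing point, so
    dense regions attract more newcomers simply because they hold more
    candidate parents.
    """
    points = np.zeros((n, dim))  # the first point sits at the origin
    for i in range(1, n):
        parent = points[rng.integers(i)]  # index drawn from 0..i-1
        direction = rng.standard_normal(dim)
        direction /= np.linalg.norm(direction)  # uniform on the unit sphere
        radius = rng.exponential(scale)  # exponential step length
        points[i] = parent + radius * direction
    return points


rng = np.random.default_rng(0)
for name, pts in [
    ("gaussian", sketch_gaussian(500, 3, rng)),
    ("uniform", sketch_uniform(500, 3, rng)),
    ("preferential", sketch_preferential_placement(500, 3, rng)),
]:
    print(name, pts.shape, "mean norm:", np.linalg.norm(pts, axis=1).mean().round(3))
```

Passing an explicit `numpy.random.Generator` keeps each run reproducible, which matters when synthetic models are compared against real embeddings.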
Created:
2026-01-22



