WikiLingDiv
Source: DataCite Commons. Updated: 2026-05-07. Indexed: 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.18526765
Description:
This dataset is a resource developed to support research on digital linguistic diversity. Quantitative research in the field is limited by the lack of openly available spatiotemporal data to quantify digital language use. While there are multiple dimensions of digital linguistic diversity, such as production and presence, this dataset focuses specifically on knowledge retrieval proxied by page views across different language editions of Wikipedia. As such, the dataset provides a unique opportunity to study the digital consumption of language.
To construct the dataset, the Wikimedia API was queried for the number of page views of each Wikipedia language edition, per country and year, from 2015 to 2024.
The data was enriched with ISO 639-3 language codes, then structured and merged into a single CSV: country_year_lang_views.csv. Based on this, we precompute three commonly used measures of diversity (Richness, Exponent-Shannon, and Inverse-Simpson) for each country and year, yielding the time series diversity_measures.csv. In addition, data from Glottolog and rnaturalearthdata on the countries and languages included in the dataset are provided in country_data.csv and language_data.csv to facilitate analysis.
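For readers who want to recompute or extend the three measures, the following is a minimal sketch under stated assumptions. The description does not give the exact formulas used to build diversity_measures.csv, so the standard definitions are assumed here: Richness is the number of languages with at least one view, Exponent-Shannon is exp(-Σ p ln p), and Inverse-Simpson is 1 / Σ p², where p is each language's share of a country's page views in a given year. The function names and the (country, year, language, views) row layout are hypothetical and are not taken from the dataset's files.

```python
import math
from collections import defaultdict


def diversity_measures(views):
    """Compute three diversity measures from a list of page-view counts.

    Assumed (standard) definitions, not confirmed by the dataset description:
    richness = number of non-zero entries, exp_shannon = exp(Shannon entropy),
    inv_simpson = 1 / sum of squared shares.
    """
    counts = [v for v in views if v > 0]
    total = sum(counts)
    if total == 0:
        # No views at all: diversity is undefined, reported here as zero.
        return {"richness": 0, "exp_shannon": 0.0, "inv_simpson": 0.0}
    shares = [c / total for c in counts]
    shannon = -sum(p * math.log(p) for p in shares)
    simpson = sum(p * p for p in shares)
    return {
        "richness": len(counts),
        "exp_shannon": math.exp(shannon),
        "inv_simpson": 1.0 / simpson,
    }


def measures_by_country_year(rows):
    """Group hypothetical (country, year, language, views) rows and score each group."""
    grouped = defaultdict(list)
    for country, year, _language, views in rows:
        grouped[(country, year)].append(views)
    return {key: diversity_measures(vals) for key, vals in grouped.items()}


if __name__ == "__main__":
    # Illustrative numbers only, not values from the dataset.
    example = [
        ("AT", 2020, "deu", 800),
        ("AT", 2020, "eng", 200),
    ]
    print(measures_by_country_year(example))
```

Under these definitions all three measures are in units of "effective number of languages": they coincide when views are spread evenly across languages and diverge as views concentrate in a few editions.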
The data can be explored interactively at: https://f39e09-hannes-essfors.shinyapps.io/wikilingdiv_dashboard/
The paper presenting the data can be found at: https://aclanthology.org/2026.latechclfl-1.19/
This research was funded by WWTF (grant number ICT23-012). It is part of the DIGILINGDIV project.
Provider:
Zenodo
Created:
2026-02-08



