Domestic and International Common Language (DICL) Database
Source: DataCite Commons. Updated 2024-12-05; added 2025-04-15.
Download link:
https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/8WGJTL
Resource description:
The database contains 11 indexes of linguistic similarity covering 242 countries, measured both domestically and internationally. The domestic indexes capture linguistic similarity among populations within a single country, while the international indexes capture linguistic similarity between two different countries. The indexes, which are based on 6,674 languages, reflect three dimensions of language: common official languages, common native and acquired spoken languages, and linguistic proximity across different languages. The database has many uses, such as the study of bilateral flows (including FDI, migration, and international trade) and regional or country-level analyses.
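Because a domestic index describes one country while an international index describes a pair of countries, the two kinds of measures attach to bilateral flow data differently. The sketch below illustrates this under stated assumptions: it uses hypothetical country codes (`AAA`, `BBB`, `CCC`), placeholder values, and a hypothetical function name `annotate_flows`; the actual file layout and column names of the Dataverse download are not described here and should be checked before adapting it. It also does not assume that an international index is symmetric, so pairs are looked up in the order given.

```python
from typing import Dict, List, Tuple

# Placeholder values for ONE domestic and ONE international index.
# Country codes and numbers are hypothetical, not taken from the DICL files.
DOMESTIC: Dict[str, float] = {"AAA": 0.9, "BBB": 0.4}
INTERNATIONAL: Dict[Tuple[str, str], float] = {("AAA", "BBB"): 0.3}


def annotate_flows(flows: List[Tuple[str, str, float]]) -> List[Dict[str, object]]:
    """Attach language measures to (origin, destination, flow) records.

    Domestic measures are looked up per country (origin and destination
    separately); the international measure is looked up per ordered pair.
    Missing entries become None instead of being silently dropped.
    """
    annotated = []
    for origin, dest, flow in flows:
        annotated.append(
            {
                "origin": origin,
                "dest": dest,
                "flow": flow,
                "domestic_origin": DOMESTIC.get(origin),
                "domestic_dest": DOMESTIC.get(dest),
                "international": INTERNATIONAL.get((origin, dest)),
            }
        )
    return annotated


if __name__ == "__main__":
    for row in annotate_flows([("AAA", "BBB", 100.0), ("AAA", "CCC", 50.0)]):
        print(row)
```

Keeping missing pairs as `None` makes coverage gaps visible before any estimation step, rather than shrinking the sample without notice.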
<br><br>
<b>Version history:</b>
<br><br>
<b>Version 2 (Dec. 2024):</b> Version 2 of the dataset added three indices (BPN, BPA, and BPS). It also corrected an issue with the calculation of the linguistic proximity indices in version 1: a small number of languages that terminated at the same point on a linguistic tree were unintentionally treated as the same language and were omitted from the LPN, LPA, and LPS calculations. This omission affected relatively few indices overall and very few significantly; however, a small number of linguistic proximity indices are substantially larger after the correction. Finally, version 2 reintroduced one language that had been inadvertently omitted in version 1, resulting in small changes to the index values for several countries, primarily in East and Southeast Asia.
<br><br>
<b>Version 1 (Mar. 2024):</b> Initial release on Dataverse.
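The version 2 correction to the linguistic proximity indices can be illustrated with a toy example. This sketch assumes each language is recorded with an identifier and its classification path on a language tree; the language names, paths, and function names (`dedupe_by_path`, `dedupe_by_id`) are hypothetical, and the code does not reproduce the actual LPN/LPA/LPS formulas. It only shows how deduplicating on the tree endpoint collapses two distinct languages into one, while keying on the language itself keeps both.

```python
from typing import Dict, List, Tuple

# Hypothetical records: (language id, classification path on a language tree).
# "lang_x" and "lang_y" are distinct languages whose paths end at the same node.
LANGUAGES: List[Tuple[str, Tuple[str, ...]]] = [
    ("lang_x", ("Family1", "BranchA", "SubB")),
    ("lang_y", ("Family1", "BranchA", "SubB")),
    ("lang_z", ("Family1", "BranchC")),
]


def dedupe_by_path(records: List[Tuple[str, Tuple[str, ...]]]) -> List[str]:
    """Error-prone approach: languages with identical tree paths become one."""
    seen: Dict[Tuple[str, ...], str] = {}
    for lang_id, path in records:
        seen.setdefault(path, lang_id)  # a second language on the same path is lost
    return sorted(seen.values())


def dedupe_by_id(records: List[Tuple[str, Tuple[str, ...]]]) -> List[str]:
    """Keyed on the language identifier, so distinct languages survive."""
    return sorted({lang_id for lang_id, _ in records})


if __name__ == "__main__":
    print(dedupe_by_path(LANGUAGES))  # ['lang_x', 'lang_z']
    print(dedupe_by_id(LANGUAGES))    # ['lang_x', 'lang_y', 'lang_z']
```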
Provider:
Harvard Dataverse
Created:
2024-03-15
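Compiled summary
Dataset introduction

Background and challenges
Background overview
The dataset provides 11 linguistic similarity indexes for 242 countries, both between and within countries. It is built from 6,674 languages, covers three dimensions (official languages, native and acquired languages, and linguistic proximity), and is suited to research on bilateral flows such as international trade and migration.
The above content was collected and summarized by 遇见数据集.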



