Data Comparing phonetic and orthographic networks: A multiplex analysis
Source: DataCite Commons | Updated: 2020-11-24 | Indexed: 2024-07-28
Download link:
https://figshare.com/articles/dataset/Table_Comparing_phonetic_and_orthographic_networks_a_multiplex_analysis_pdf/12735380/4
Description:
The complexity of natural language can be explored through multiplex analyses at different scales, from single words to groups of words or sentences. Here, we plan to investigate a multiplex word-level network comprising an orthographic network and a phonological network, each defined in terms of distance-based similarity. We systematically compare basic structural properties of the two networks to determine their similarities and differences, and we examine their combination in a multiplex configuration. As a natural extension of this work, we plan to evaluate how well the structural network properties and information-based quantities are preserved from three perspectives: (i) similarities across 12 natural languages from 4 linguistic families (Romance, Germanic, Slavic and Uralic), (ii) growth of the corpus size (number of words) from 10⁴ to 50×10³, and (iii) robustness of the networks. Our preliminary findings reinforce the idea that natural languages share common organizational properties. Once concluded, this work will contribute to characterizing the similarities and differences between the orthographic and phonological perspectives of language networks at the word level.
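To make the two-layer construction described above more concrete, the following minimal sketch builds a toy multiplex word network in Python. It rests on several assumptions that are not taken from the dataset: two words are linked in a layer when the edit (Levenshtein) distance between their forms is at most 1, the five-word lexicon and its rough phonemic transcriptions are hypothetical, and the Jaccard index of the two edge sets stands in for whatever overlap measure the authors actually use. All function and variable names are illustrative only.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two sequences (insertion, deletion, substitution)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute x -> y
        prev = curr
    return prev[-1]


def build_layer(forms, max_dist=1):
    """Link every pair of words whose forms lie within max_dist of each other.

    The threshold of 1 is an assumption; note that identical forms
    (distance 0) would also be linked under this rule.
    """
    edges = set()
    for (w1, f1), (w2, f2) in combinations(sorted(forms.items()), 2):
        if levenshtein(f1, f2) <= max_dist:
            edges.add((w1, w2))
    return edges


def degrees(words, edges):
    """Number of neighbours of each word within one layer."""
    deg = {w: 0 for w in words}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


# Hypothetical toy lexicon: spelling as a string, pronunciation as a
# tuple of phoneme symbols so that multi-character phonemes count once.
orthography = {"cat": "cat", "cut": "cut", "cot": "cot",
               "cough": "cough", "coat": "coat"}
phonology = {"cat": ("k", "æ", "t"), "cut": ("k", "ʌ", "t"),
             "cot": ("k", "ɒ", "t"), "cough": ("k", "ɒ", "f"),
             "coat": ("k", "oʊ", "t")}

# The multiplex network: same node set, one edge set per layer.
layers = {
    "orthographic": build_layer(orthography),
    "phonological": build_layer(phonology),
}

shared = layers["orthographic"] & layers["phonological"]
either = layers["orthographic"] | layers["phonological"]
jaccard = len(shared) / len(either)

for name, edges in layers.items():
    print(name, len(edges), "edges, degrees:", degrees(orthography, edges))
print("edge overlap (Jaccard):", round(jaccard, 3))
```

In this toy lexicon, "cough" is connected to "cot" only in the phonological layer, which illustrates how the two layers can diverge for the same set of words. For corpora of the size mentioned in the description, an all-pairs comparison like this one scales quadratically, so a real analysis would likely need a faster neighbour search.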
Provider:
figshare
Created:
2020-07-31



