Cross-Linguistic Polysemies (Data from: Using network approaches to enhance the analysis of cross-linguistic polysemies)
OpenDataLab | Updated: 2026-05-31 | Indexed: 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/Cross-Linguistic_Polysemies
Resource description:
It has long been observed that cross-linguistically recurring polysemies can serve as indicators of conceptual relations, and many methods for modeling and analyzing such data have been proposed in recent years. Although network techniques seem a natural way to model and analyze such data, given its nature, only a few approaches make explicit use of them. In this paper, we show how the rigorous application of weighted network models extracts more from cross-linguistic polysemy data than approaches based solely on item-by-item comparison. Our study uses a large dataset of 1,252 semantic items translated into 195 languages from 44 language families. By analyzing the community structure of the network reconstructed from these data, we find that most concepts (68%) can be grouped into 104 large communities of five or more nodes each. These large communities almost exclusively form meaningful groupings of concepts into conceptual domains, and they provide an effective starting point for in-depth analyses of topics in historical semantics such as cognate detection, etymological analysis, and semantic reconstruction.
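The network analysis summarized above can be illustrated with a short Python sketch. This is a minimal, hypothetical sketch and not the authors' implementation. It assumes that concepts are nodes and that an edge's weight counts the languages in which a single word form expresses both concepts. For community detection, it uses weighted label propagation as a stand-in, since the study's own algorithm is not named on this page. The toy rows, the function names `build_polysemy_network`, `label_propagation`, and `large_communities`, and all language and word identifiers are invented for illustration. Only the five-node threshold for "large" communities comes from the abstract.

```python
from collections import defaultdict
from itertools import combinations

# HYPOTHETICAL toy data: (language, word_form, concepts the form expresses).
# Invented for illustration only; this is not the dataset's real format or content.
TOY_ROWS = [
    ("lang1", "w01", ("TREE", "WOOD")),
    ("lang2", "w02", ("TREE", "WOOD")),
    ("lang3", "w03", ("TREE", "WOOD")),
    ("lang1", "w04", ("TREE", "FOREST")),
    ("lang4", "w05", ("TREE", "FOREST")),
    ("lang2", "w06", ("WOOD", "FIREWOOD")),
    ("lang4", "w07", ("WOOD", "FIREWOOD")),
    ("lang3", "w08", ("TREE", "BRANCH")),
    ("lang4", "w09", ("TREE", "BRANCH")),
    ("lang2", "w10", ("WOOD", "FOREST")),
    ("lang1", "w11", ("HAND", "ARM")),
    ("lang3", "w12", ("HAND", "ARM")),
    ("lang2", "w13", ("HAND", "FINGER")),
    ("lang4", "w14", ("HAND", "FINGER")),
    ("lang1", "w15", ("BRANCH", "ARM")),
]


def build_polysemy_network(rows):
    """Build a weighted, undirected concept graph.

    Assumption: the weight of edge (a, b) is the number of distinct languages
    in which at least one word form expresses both concept a and concept b.
    Concepts that never share a word form with another concept are left out.
    """
    concepts_by_form = defaultdict(set)
    for language, form, concepts in rows:
        concepts_by_form[(language, form)].update(concepts)

    languages_by_edge = defaultdict(set)
    for (language, _form), concepts in concepts_by_form.items():
        for a, b in combinations(sorted(concepts), 2):
            languages_by_edge[(a, b)].add(language)

    graph = defaultdict(dict)  # concept -> {neighbour concept: weight}
    for (a, b), languages in languages_by_edge.items():
        graph[a][b] = graph[b][a] = len(languages)
    return graph


def label_propagation(graph, max_iter=100):
    """Weighted label propagation (a stand-in community-detection method).

    Nodes are visited in sorted order, and each adopts the label with the
    highest total edge weight among its neighbours. Ties go to the smallest
    label, so the result is deterministic.
    """
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            scores = defaultdict(int)
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            best = max(scores.values())
            new_label = min(lab for lab, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    return list(communities.values())


def large_communities(communities, min_size=5):
    """Keep communities with at least `min_size` nodes (five, as in the abstract)."""
    return [c for c in communities if len(c) >= min_size]


if __name__ == "__main__":
    network = build_polysemy_network(TOY_ROWS)
    for community in large_communities(label_propagation(network)):
        print(sorted(community))  # ['BRANCH', 'FIREWOOD', 'FOREST', 'TREE', 'WOOD']
```

On the toy network, this yields one large community, {BRANCH, FIREWOOD, FOREST, TREE, WOOD}. The weakly attached {ARM, FINGER, HAND} group falls below the five-node threshold. Label propagation was chosen here only because it is short and dependency-free. On real data with over a thousand concepts, a dedicated graph library would be more practical.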
Provider:
OpenDataLab
Created:
2022-05-23
Compiled Summary
Dataset Introduction

Background and Challenges
Background Overview
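This dataset is based on a 2013 study that used network approaches to analyze cross-linguistic polysemies. It contains 1,252 semantic items covering 195 languages from 44 language families. Using a weighted network model, the study found that 68% of the concepts can be partitioned into 104 large communities, which effectively group concepts into conceptual domains and provide a foundation for analysis in historical semantics.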
The above content was collected and summarized by 遇见数据集.