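Cross-Lingual Colexification Dataset

Name: Cross-Lingual Colexification Dataset
Creator: Department of Computer Science, Aalborg University Copenhagen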
Published: 2023-06-05 15:32:21
License: Not specified

Source: arXiv. Updated 2023-06-05; indexed 2024-06-21.

Download link:

https://github.com/siebeniris/ColexPhon

Description:

The Cross-Lingual Colexification Dataset was created by Yiyi Chen and Johannes Bjerva of the Department of Computer Science, Aalborg University Copenhagen. It covers 142 languages across 21 language families and includes concreteness and affective ratings aligned with phonemes and phonological features. The dataset was compiled through a careful curation procedure and is intended to support interdisciplinary research in psychology, cognitive science, and multilingual natural language processing (NLP). Its intended use is the study of colexification patterns across languages (cases where a single word form expresses several concepts) and their relation to cognition and affective expression.

Provided by:

Department of Computer Science, Aalborg University Copenhagen

Created:

2023-06-05
