
NorDiaChange

Source: arXiv. Updated 2022-04-28. Indexed 2024-06-21.
Download link:
https://github.com/ltgoslo/nor_dia_change
Description:
NorDiaChange is the first diachronic semantic change dataset for Norwegian, created at the University of Oslo. It comprises two new subsets covering about 80 Norwegian nouns, each manually annotated for its degree of semantic change over time. The subsets span time periods related to pre- and post-war events, Norway's oil and gas discoveries, and technological developments. Annotation followed the DURel framework and drew on two large historical Norwegian corpora. The dataset is publicly available in full, including the raw annotation data and the inferred diachronic word usage graphs (DWUGs). It is suited to evaluating lexical semantic change detection systems or contextualized embeddings in general, and it can also be used by historical linguists.
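
One common way to evaluate a change detection system against graded gold annotations of this kind is to rank-correlate the system's scores with the gold scores using Spearman's rho. The sketch below is illustrative only: the function names, the word keys, and the score dictionaries are hypothetical, and it does not read the dataset's actual files, whose layout is not described here.

```python
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(xs: List[float], ys: List[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (var_x * var_y) ** 0.5


def evaluate(gold: Dict[str, float], predicted: Dict[str, float]) -> float:
    """Correlate system scores with gold scores over the words both cover."""
    shared = sorted(set(gold) & set(predicted))
    return spearman([gold[w] for w in shared], [predicted[w] for w in shared])


if __name__ == "__main__":
    # Placeholder keys and made-up scores, NOT values from NorDiaChange.
    gold_scores = {"word_a": 0.1, "word_b": 0.5, "word_c": 0.9}
    system_scores = {"word_a": 0.2, "word_b": 0.4, "word_c": 0.7, "word_d": 0.3}
    print(evaluate(gold_scores, system_scores))
```

Words that appear in only one of the two dictionaries are ignored, so a system that scores a larger vocabulary can still be compared on the annotated target words.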
Provider:
University of Oslo
Created:
2022-01-14