NorDial
Source: arXiv | Updated: 2021-04-11 | Indexed: 2024-06-21
Download link:
https://github.com/jerbarnes/norwegian_dialect
Description:
NorDial is a preliminary corpus of written Norwegian dialect use, created by the Department of Informatics at the University of Oslo. It contains 1,073 tweets, each manually annotated as Bokmål, Nynorsk, dialect, or a mix of these. Building the dataset involved gathering frequency bigram lists from the Nordic Dialect Corpus, and two native speakers carried out the annotation. NorDial is intended to support research on how Norwegian dialects are used and how they vary in informal domains such as social media, and to provide evidence for the vitality of dialectal features.
Provider:
Department of Informatics, University of Oslo
Created:
2021-04-11



