NoCoLA

Name: NoCoLA
Creator: Language Technology Group, University of Oslo
Published: 2023-06-13 22:11:19
License: No description provided

Source: arXiv, 2023-06-13; catalog entry updated 2024-06-21

Download link:

https://github.com/ltgoslo/nocola

Description:

NoCoLA is a Norwegian linguistic acceptability dataset created by the Language Technology Group at the University of Oslo. It contains 144,867 sentences, drawn primarily from essays written by learners of Norwegian as a second language. The dataset was built through manual correction and error annotation of these essays, and is designed to evaluate how well large language models understand Norwegian grammar. NoCoLA defines two tasks: NoCoLAclass, a binary classification task for evaluating models after fine-tuning, and NoCoLAzero, a zero-shot task that tests a model's inherent grammatical judgments without any fine-tuning. The dataset is intended mainly for evaluating and improving Norwegian language models.
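The two evaluation modes can be illustrated with a minimal Python sketch. This is not the official NoCoLA evaluation code; it rests on two assumptions. First, NoCoLAzero is treated as a minimal-pair task: each learner sentence is paired with its manually corrected version, and a model counts as correct when it gives the corrected sentence a higher score (for example, a sentence log-probability). Second, NoCoLAclass is scored with the Matthews correlation coefficient, the metric commonly used for CoLA-style binary acceptability classification. The scoring function `toy_logprob`, the Norwegian example sentences, and their log-probabilities are all invented for illustration.

```python
import math
from typing import Callable, Sequence, Tuple


def pairwise_accuracy(
    pairs: Sequence[Tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Fraction of (ungrammatical, corrected) pairs in which the corrected
    sentence receives a strictly higher score.

    `score` is a hypothetical stand-in for a language model's sentence
    log-probability; it is not part of the NoCoLA release.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    wins = sum(1 for bad, good in pairs if score(good) > score(bad))
    return wins / len(pairs)


def matthews_corrcoef(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (1 = acceptable)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal count is zero (denominator is 0).
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical log-probabilities standing in for a real language model.
toy_logprob = {
    "Jeg har bodd i Oslo i to år.": -20.0,
    "Jeg har bodde i Oslo i to år.": -24.5,
    "Hun liker å lese bøker.": -15.0,
    "Hun liker å leser bøker.": -14.0,
}

# Each pair is (learner's original sentence, corrected sentence).
pairs = [
    ("Jeg har bodde i Oslo i to år.", "Jeg har bodd i Oslo i to år."),
    ("Hun liker å leser bøker.", "Hun liker å lese bøker."),
]

print(pairwise_accuracy(pairs, toy_logprob.__getitem__))  # 0.5
print(round(matthews_corrcoef([1, 1, 0, 0], [1, 0, 0, 0]), 3))  # 0.577
```

Here the toy scorer prefers the corrected sentence in the first pair but not in the second, giving a zero-shot accuracy of 0.5. The pairwise comparison needs no fine-tuning or decision threshold, which fits the zero-shot setting, whereas the classification metric assumes a model that outputs an explicit acceptable/unacceptable label.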

Provided by:

Language Technology Group, University of Oslo

Created:

2023-06-13