NoCoLA
Source: arXiv | Updated: 2023-06-13 | Indexed: 2024-06-21
Download link:
https://github.com/ltgoslo/nocola
Description:
NoCoLA is a Norwegian linguistic acceptability dataset created by the Language Technology Group at the University of Oslo. It contains 144,867 sentences, drawn mainly from essays written by learners of Norwegian as a second language. The dataset was built from manual corrections and error annotations of those essays and is intended for evaluating how well large language models handle Norwegian grammar. It comes in two variants: NoCoLA_class, a binary classification task used to evaluate fine-tuned models, and NoCoLA_zero, a zero-shot evaluation that tests a model's inherent grammatical judgment. Its main use is the evaluation and improvement of Norwegian language models.
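As a rough illustration of the zero-shot setting, the sketch below assumes that evaluation compares a model's score for an acceptable sentence against its score for an unacceptable counterpart, and counts the pair as correct when the acceptable sentence scores higher. The pair format, the function names `pairwise_accuracy` and `toy_score`, and the example sentences are all hypothetical and are not taken from the dataset or its repository; `toy_score` is a stand-in for a real language model's log-probability.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical toy vocabulary; a real evaluation would use a language model.
TOY_VOCAB = {"jeg", "bor", "i", "oslo", "hun", "leser", "en", "bok"}


def toy_score(sentence: str) -> float:
    """Stand-in for a model log-probability: minus the count of unknown tokens."""
    tokens = [tok.strip(".,!?").lower() for tok in sentence.split()]
    return -float(sum(1 for tok in tokens if tok not in TOY_VOCAB))


def pairwise_accuracy(
    pairs: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Fraction of (acceptable, unacceptable) pairs where the acceptable one scores higher."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one sentence pair is required")
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)


# Invented example pairs (acceptable, unacceptable), for illustration only.
PAIRS = [
    ("Jeg bor i Oslo.", "Jeg bo i Oslo."),        # verb inflection error
    ("Hun leser en bok.", "Hun lese en bok."),    # verb inflection error
    ("Hun leser en bok.", "Hun en leser bok."),   # word order error, invisible to toy_score
]

if __name__ == "__main__":
    print(f"pairwise accuracy: {pairwise_accuracy(PAIRS, toy_score):.3f}")
```

The third pair is tied under the toy scorer, so it counts as a miss; a tie is treated as incorrect here because the model has not shown a preference for the acceptable sentence.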
Provided by:
Language Technology Group, University of Oslo
Created:
2023-06-13



