
NoCoLA

Source: arXiv. Updated 2023-06-13. Indexed 2024-06-21.
Download link:
https://github.com/ltgoslo/nocola
Resource overview:
NoCoLA is a Norwegian linguistic acceptability dataset created by the Language Technology Group at the University of Oslo. It contains 144,867 sentences, drawn mainly from essays written by learners of Norwegian as a second language. The sentences were manually corrected and annotated for errors, and the dataset is intended for evaluating how well large language models handle Norwegian grammar. NoCoLA comprises two tasks: NoCoLA_class, a binary classification task for evaluating fine-tuned models, and NoCoLA_zero, a zero-shot evaluation that tests a model's inherent grammatical judgment. Its main use is evaluating and improving Norwegian language models.
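As a rough illustration of what a zero-shot acceptability evaluation can look like, the sketch below scores pairs of sentences and counts how often the acceptable one gets the higher score. It rests on assumptions not stated in this listing: that the zero-shot data can be arranged as (acceptable, unacceptable) pairs, and that some scoring function (for example a language model's log-probability) is available. The names `pairwise_accuracy`, `TOY_PAIRS`, `TOY_SCORES`, and `toy_score` are hypothetical, and the example sentences and scores are invented for illustration, not taken from NoCoLA.

```python
from typing import Callable, Iterable, Tuple


def pairwise_accuracy(
    pairs: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Return the fraction of (acceptable, unacceptable) pairs in which
    the acceptable sentence receives a strictly higher score."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no sentence pairs given")
    wins = sum(1 for good, bad in pairs if score(good) > score(bad))
    return wins / len(pairs)


# Invented example pairs (NOT from the dataset): (acceptable, unacceptable).
TOY_PAIRS = [
    ("Jeg liker å lese bøker.", "Jeg liker lese å bøker."),
    ("Hun bor i Oslo.", "Hun bo i Oslo."),
]

# Invented scores standing in for a real model's sentence log-probabilities.
TOY_SCORES = {
    "Jeg liker å lese bøker.": -10.0,
    "Jeg liker lese å bøker.": -14.5,
    "Hun bor i Oslo.": -12.0,
    "Hun bo i Oslo.": -11.0,  # the toy scorer gets this pair wrong
}


def toy_score(sentence: str) -> float:
    """Hypothetical scorer: look up a hard-coded score for the sentence."""
    return TOY_SCORES[sentence]


if __name__ == "__main__":
    print(pairwise_accuracy(TOY_PAIRS, toy_score))
```

In practice `toy_score` would be replaced by a function that queries the model under evaluation; no fine-tuning is involved, which is what makes the setup zero-shot.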
Provided by:
Language Technology Group, University of Oslo
Created:
2023-06-13