RuCoLA

Source: arXiv, 2022-10-24. Updated: 2024-07-24.
Download link:
https://rucola-benchmark.com/
Description:
RuCoLA (Russian Corpus of Linguistic Acceptability) is the first large-scale corpus for Russian linguistic acceptability classification, created by the Higher School of Economics (Russia) together with other institutions. It contains 13,400 sentences in two parts: 9,800 sentences taken from linguistic publications and 3,600 sentences produced by generative models. By providing detailed acceptability judgments and error categories, RuCoLA is meant to help evaluate and improve the grammatical knowledge of language models, particularly their handling of morphological and semantic errors. The dataset also comes with a public leaderboard for evaluating language models on Russian.
Provided by:
Higher School of Economics (Russia)
Created:
2022-10-24