
CatCoLA - Catalan Corpus of Linguistic Acceptability

DataCite Commons · Updated 2025-09-29 · Indexed 2025-04-09
Download link:
https://dataverse.csuc.cat/citation?persistentId=doi:10.34810/data1393
Description:
We introduce CatCoLA, the Catalan Corpus of Linguistic Acceptability, which will contribute to the Catalan Language Understanding Benchmark (CLUB) for assessing and comparing the capabilities of language models (LMs) trained on Catalan text. CatCoLA follows the design of the English CoLA and supports the task of classifying sentences as acceptable or unacceptable. It consists of 10,443 sentences together with their acceptability judgements as given in well-known Catalan reference grammars. Following previous practice, every sentence is also annotated with the class of linguistic phenomenon it exemplifies. CatCoLA is released under a CC BY-NC-SA 4.0 licence and is freely available, except for the test data, which is withheld to avoid contamination. If you are interested in obtaining the test data, please write to nuria.bel@upf.edu.
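The description implies a simple record shape (a sentence, a binary acceptability judgement, and a phenomenon class) and a binary classification task. The sketch below is illustrative only: the field names, the two example sentences, and the phenomenon label are hypothetical and are not CatCoLA's actual schema or data, since the file format is not described here. It also assumes the Matthews correlation coefficient (MCC) as the metric, because that is how English CoLA is conventionally scored in GLUE; whether CLUB scores CatCoLA the same way is not stated on this page.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Example:
    # Hypothetical record layout; field names are illustrative, not CatCoLA's schema.
    sentence: str
    acceptable: bool   # acceptability judgement taken from a reference grammar
    phenomenon: str    # class of linguistic phenomenon the sentence exemplifies


def matthews_corrcoef(gold: list[bool], pred: list[bool]) -> float:
    """Matthews correlation coefficient for binary labels (0.0 when undefined)."""
    tp = sum(g and p for g, p in zip(gold, pred))
    tn = sum((not g) and (not p) for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # A zero denominator means one label was never predicted (or never occurs).
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Invented examples, not drawn from CatCoLA.
examples = [
    Example("El gat dorm.", True, "word order"),
    Example("Dorm gat el.", False, "word order"),
]
gold = [ex.acceptable for ex in examples]
always_acceptable = [True for _ in examples]

print(matthews_corrcoef(gold, always_acceptable))  # 0.0
print(matthews_corrcoef(gold, gold))               # 1.0
```

MCC is a reasonable choice for this kind of task because a trivial baseline that labels every sentence acceptable scores 0.0, whereas its plain accuracy would simply mirror the share of acceptable sentences in the data.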
Provider:
CORA. Repositori de Dades de Recerca
Created:
2024-05-30