ItaCoLA
Source: arXiv | Updated: 2021-09-25 | Indexed: 2024-06-21
Download link:
https://github.com/dhfbk/ItaCoLA-dataset
Resource description:
ItaCoLA is a corpus of nearly 10,000 Italian sentences annotated with acceptability judgments, developed by the Italian National Research Council. It follows the same creation methodology and procedure as the English CoLA, with the aim of supporting acceptability research on language models for languages other than English. The sentences cover a range of linguistic phenomena, including syntactic structures and semantic expressions. They were manually transcribed from a variety of linguistics publications, with acceptability labeled by experts. The dataset is mainly used to test how well neural language models acquire linguistic knowledge, particularly in cross-lingual settings.
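Testing a model on acceptability judgments comes down to comparing its binary predictions with the expert labels. This page does not name a metric. The English CoLA benchmark is conventionally scored with the Matthews correlation coefficient (MCC), so the sketch below assumes the same choice here. The label encoding (1 = acceptable, 0 = unacceptable) and the sample labels are hypothetical and not taken from ItaCoLA.

```python
import math


def matthews_corrcoef(gold, pred):
    """Matthews correlation coefficient for binary labels.

    Assumed encoding: 1 = acceptable, 0 = unacceptable.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical gold labels and model predictions, for illustration only.
gold = [1, 1, 0, 1, 0, 0]
pred = [1, 0, 0, 1, 0, 1]
score = matthews_corrcoef(gold, pred)
print(f"MCC = {score:.3f}")
```

MCC is often preferred over plain accuracy for this kind of task because it stays informative when acceptable and unacceptable sentences are not evenly balanced.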
Provider:
Italian National Research Council
Created:
2021-09-25



