CoLAC

Name: CoLAC
Creator: Authors of the paper
License: Not specified

Indexed from arXiv on 2025-09-30

Download link:

https://github.com/huhailinguist/CoLAC

Resource overview:

The Chinese Language Acceptability Corpus (CoLAC) is the first large-scale linguistic acceptability dataset for a non-Indo-European language. It is validated by native speakers and provides two sets of labels for its sentences: labels assigned by linguists and labels collected through crowdsourcing. Because it covers a language outside the Indo-European family, CoLAC makes it possible to study how linguistic acceptability judgments transfer across languages. The task is acceptability judgment: deciding whether a given sentence is linguistically acceptable.

Provided by:

Authors of the paper
