DIALECT-COPA
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/ffaisal93/dialect_copa
Description:
DIALECT-COPA is a dataset for general commonsense reasoning in South Slavic micro-dialects. Its instances are language understanding tasks designed specifically around dialectal variation. To improve the quality of the training data, several data augmentation techniques have been applied, including transcription from English COPA-style datasets and synthetic data generation. The dataset for each language contains 400 instances, and the task is to evaluate commonsense reasoning in South Slavic dialects.
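As a rough illustration of what evaluating on COPA-style instances involves, the sketch below scores a predictor by accuracy over two-choice items. The field names (premise, choice1, choice2, question, label) follow the common COPA convention and are an assumption here; the actual file layout in the linked repository may differ. The example instance is invented, written in English for readability, and is not taken from DIALECT-COPA.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class CopaInstance:
    """One COPA-style item (assumed field names, not confirmed by the source)."""
    premise: str
    choice1: str
    choice2: str
    question: str  # "cause" or "effect"
    label: int     # 0 means choice1 is correct, 1 means choice2


def accuracy(instances: Iterable[CopaInstance],
             predict: Callable[[CopaInstance], int]) -> float:
    """Fraction of instances where the predicted index equals the gold label."""
    items = list(instances)
    if not items:
        raise ValueError("no instances to score")
    correct = sum(1 for inst in items if predict(inst) == inst.label)
    return correct / len(items)


def always_first(inst: CopaInstance) -> int:
    """Trivial baseline that always picks choice1."""
    return 0


# Invented illustrative instance (hypothetical, not from the dataset).
example = CopaInstance(
    premise="The ground was wet.",
    choice1="It rained overnight.",
    choice2="The sun was shining.",
    question="cause",
    label=0,
)
```

Replacing always_first with a function that queries a model gives the same accuracy measure over a 400-instance test set.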
Provided by:
GMUNLP



