DIALECT-COPA

Name: DIALECT-COPA
Creator: GMUNLP
License: not specified

Source: arXiv (indexed 2025-09-30)

Download link:

https://github.com/ffaisal93/dialect_copa

Description:

DIALECT-COPA is a commonsense reasoning dataset for South Slavic micro-dialects. Its instances are language understanding tasks built specifically to cover dialectal variation. To improve the quality of the training data, several data augmentation techniques were applied, including transcription from English COPA-style datasets and synthetic data generation. The dataset for each language contains 400 instances and is used to evaluate commonsense reasoning in South Slavic dialects.
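For readers unfamiliar with the COPA task format, the following minimal Python sketch shows how a single instance and an accuracy score might be represented. The field names (`premise`, `choice1`, `choice2`, `question`, `label`) follow the convention of the original English COPA and are an assumption here; the actual file layout in the DIALECT-COPA repository may differ. The example item is an English illustration in COPA style, not an entry taken from DIALECT-COPA, and `always_first` is a hypothetical baseline.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class CopaInstance:
    """One COPA-style item (field names assumed, not confirmed by the source)."""
    premise: str
    choice1: str
    choice2: str
    question: str  # "cause" or "effect"
    label: int     # 0 means choice1 is correct, 1 means choice2


def accuracy(instances: Iterable[CopaInstance],
             predict: Callable[[CopaInstance], int]) -> float:
    """Fraction of instances for which predict() returns the gold label."""
    items = list(instances)
    if not items:
        raise ValueError("no instances to evaluate")
    correct = sum(1 for item in items if predict(item) == item.label)
    return correct / len(items)


# Illustrative English item in COPA style (not from DIALECT-COPA).
example = CopaInstance(
    premise="The man broke his toe.",
    choice1="He got a hole in his sock.",
    choice2="He dropped a hammer on his foot.",
    question="cause",
    label=1,
)


def always_first(instance: CopaInstance) -> int:
    """Hypothetical trivial baseline: always pick the first alternative."""
    return 0


if __name__ == "__main__":
    print(accuracy([example], always_first))
```

A real evaluation would replace `always_first` with a model's prediction function and pass in the 400 instances of one dialect.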

Provider:

GMUNLP
