ScaLA, ScandiQA
Source: arXiv. Updated: 2023-04-03. Indexed: 2024-06-21.
Download link:
https://huggingface.co/ScandEval
Description:
ScaLA and ScandiQA are two datasets designed for the Scandinavian languages, covering linguistic acceptability and question answering respectively. ScaLA is derived from the Scandinavian portions of the Universal Dependencies dataset and contains 1,024 training samples, 256 validation samples, and 2,048 test samples. ScandiQA is built on the MKQA dataset and includes 7,810 Danish samples, 7,798 Swedish samples, and 7,813 Norwegian samples. Both datasets were created to advance natural language processing research on Scandinavian languages, particularly model evaluation and cross-lingual transfer.
Provided by:
Alexandra Institute
Created:
2023-04-03



