PORTULAN ExtraGLUE
Source: arXiv. Updated: 2024-05-09. Indexed: 2024-06-21.
Download link:
https://huggingface.co/datasets/PORTULAN/extraglue
Overview:
PORTULAN ExtraGLUE is a benchmark suite of natural language processing (NLP) tasks for Portuguese, developed by the Laboratory for Artificial Intelligence and Computer Science (LIACC). It contains 14 subsets covering single-sentence tasks, semantic similarity tasks, reasoning tasks, and question answering tasks, among others. The data was machine-translated from mainstream English benchmarks and is provided for both the European Portuguese and Brazilian Portuguese variants. During development, the researchers used state-of-the-art machine translation, analyzed the translated output for likely problems, and adjusted for them. The dataset is intended mainly for evaluating and improving neural language models for Portuguese, and so for advancing Portuguese NLP more broadly.
Provider:
Laboratory for Artificial Intelligence and Computer Science (LIACC)
Created:
2024-04-08
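
Since the overview describes 14 subsets offered in two language variants, one way to explore the download is to list the dataset's configurations and group them by variant. The sketch below is illustrative only: the repository ID comes from the download link above, but the `<task>_<variant>` naming pattern (for example `rte_pt-PT`) is an assumption that should be checked against the actual dataset page, and both function names are hypothetical helpers. `get_dataset_config_names` is part of the Hugging Face `datasets` library.

```python
from collections import defaultdict

# Repository ID taken from the download link in this listing.
REPO_ID = "PORTULAN/extraglue"


def group_by_variant(config_names):
    """Group config names by a trailing language-variant tag.

    Assumes (unverified) names of the form '<task>_<variant>', where the
    variant starts with 'pt-'. Names that do not match go under 'unknown'.
    """
    groups = defaultdict(list)
    for name in config_names:
        task, sep, variant = name.rpartition("_")
        if sep and variant.startswith("pt-"):
            groups[variant].append(task)
        else:
            groups["unknown"].append(name)
    return dict(groups)


def list_remote_configs(repo_id=REPO_ID):
    """Fetch the configuration names from the Hugging Face Hub.

    Requires network access and the `datasets` package; imported lazily
    so group_by_variant stays usable without it.
    """
    from datasets import get_dataset_config_names

    return get_dataset_config_names(repo_id)
```

Calling `group_by_variant(list_remote_configs())` would then show which tasks are available per variant, provided the naming assumption holds; any name that does not fit the pattern ends up under `unknown` rather than being silently dropped.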



