Multilingual CommonsenseQA (mCSQA)

Name: Multilingual CommonsenseQA (mCSQA)
Creator: Nara Institute of Science and Technology
Published: 2024-06-07 00:14:54
License: not specified

Source: arXiv. Updated: 2024-06-07. Indexed: 2024-06-21.

Download link:

https://huggingface.co/datasets/yusuke1997/mCSQA


Resource overview:

mCSQA is a multilingual commonsense reasoning dataset developed by the Nara Institute of Science and Technology to evaluate the natural language understanding capabilities of language models. It contains 14,722 records across eight languages, including English, Japanese, and Chinese. Questions and answers were generated with language models, and quality was checked through verification by both human annotators and models. The dataset focuses on language-specific commonsense and knowledge, and is used to analyze cross-lingual commonsense understanding and the transfer performance of language models. This addresses a limitation of traditional translated datasets, which cannot accurately assess language-specific commonsense.

Provider:

Nara Institute of Science and Technology

Created:

2024-06-07

Collected summary

Background and challenges

Background overview

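mCSQA is a multilingual commonsense reasoning dataset with 14,722 records covering eight languages, including English, Japanese, and Chinese. Its questions and answers are generated by language models, and quality is ensured through dual verification. It focuses on language-specific commonsense and knowledge, is used to evaluate cross-lingual commonsense understanding and the transfer performance of language models, and addresses the limitations of traditional translated datasets.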
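The per-language evaluation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code. The record layout (a `lang` code, a `choices` dictionary with `label` and `text` lists, and an `answerKey`) is an assumption modeled on CommonsenseQA-style data and may not match the published schema. The helper names `per_language_accuracy` and `always_first` are hypothetical, and the toy records are invented placeholders rather than items from mCSQA.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List

# Assumed record layout (hypothetical, CommonsenseQA-style):
# {"lang": "en", "question": str,
#  "choices": {"label": [...], "text": [...]}, "answerKey": "A"}
Record = Dict[str, Any]


def per_language_accuracy(
    records: Iterable[Record],
    predict: Callable[[Record], str],
) -> Dict[str, float]:
    """Return accuracy per language code for a multiple-choice predictor."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for rec in records:
        lang = rec["lang"]
        total[lang] += 1
        # A prediction counts as correct when it matches the gold label.
        if predict(rec) == rec["answerKey"]:
            correct[lang] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


def always_first(rec: Record) -> str:
    """Toy baseline: always pick the first listed choice label."""
    return rec["choices"]["label"][0]


# Invented placeholder records, NOT taken from mCSQA.
toy_records: List[Record] = [
    {"lang": "en", "question": "q1",
     "choices": {"label": ["A", "B"], "text": ["x", "y"]}, "answerKey": "A"},
    {"lang": "en", "question": "q2",
     "choices": {"label": ["A", "B"], "text": ["x", "y"]}, "answerKey": "B"},
    {"lang": "ja", "question": "q3",
     "choices": {"label": ["A", "B"], "text": ["x", "y"]}, "answerKey": "A"},
]

scores = per_language_accuracy(toy_records, always_first)
```

Reporting one score per language, rather than a single pooled number, keeps language-specific gaps visible, which is the kind of comparison the dataset is meant to support. To run this on real data, replace `toy_records` with records loaded from the download link above and replace `always_first` with a model-backed predictor.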
