Arabic Commonsense Validation Dataset
Source: arXiv. Updated: 2020-08-25. Indexed: 2024-06-21.
Download link:
https://github.com/msmadi/Arabic-Dataset-for-Commonsense-Validation
Description:
The Arabic Commonsense Validation Dataset was created by the Department of Computer Science at Jordan University of Science and Technology to help machines understand and validate commonsense in Arabic text. It contains 12,000 records split into training, validation, and test sets. Each record consists of two sentences and a label indicating whether they violate commonsense. The dataset was built by translating English commonsense validation data into Arabic while ensuring that the two sentences in each pair differ only slightly in wording. It is used mainly in natural language processing, in particular to improve machine commonsense understanding of Arabic text.
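The record structure described above (two sentences plus a label, with only a small lexical difference between the sentences) can be illustrated with a minimal Python sketch. The `SentencePair` class, its field names, the label encoding, and the `max_diff` threshold are all assumptions made for illustration, not the repository's actual schema, and the Arabic example pair is made up rather than taken from the dataset.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SentencePair:
    """Hypothetical record layout: two sentences plus a label."""
    first: str
    second: str
    label: int  # assumed encoding: index (0 or 1) of the sentence that violates commonsense


def differing_tokens(pair: SentencePair) -> tuple[set[str], set[str]]:
    """Return the whitespace-separated tokens unique to each sentence."""
    a = set(pair.first.split())
    b = set(pair.second.split())
    return a - b, b - a


def is_minimal_pair(pair: SentencePair, max_diff: int = 2) -> bool:
    """Check that each sentence has at most `max_diff` tokens the other lacks."""
    only_first, only_second = differing_tokens(pair)
    return max(len(only_first), len(only_second)) <= max_diff


# Made-up illustration (not a dataset record):
# "The man put the elephant in the fridge" vs. "The man put the milk in the fridge".
example = SentencePair(
    first="وضع الرجل الفيل في الثلاجة",
    second="وضع الرجل الحليب في الثلاجة",
    label=0,
)

if __name__ == "__main__":
    print(differing_tokens(example))
    print(is_minimal_pair(example))
```

Whitespace tokenization is a deliberate simplification here. Arabic clitics and inflection mean that a real check would probably need a proper tokenizer or morphological analyzer.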
Provided by:
Department of Computer Science, Jordan University of Science and Technology
Created:
2020-08-25



