CMNLI
OpenDataLab, updated 2026-04-12, indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/CMNLI
Resource description:
CMNLI consists of two parts: XNLI and MNLI. The data is drawn from several genres, including fiction, telephone conversations, travel, government documents, and Slate articles. The original MNLI and XNLI data were converted from English into Chinese, and the original training set is retained. The CMNLI development set merges the XNLI dev set with the MNLI matched set; the CMNLI test set merges the XNLI test set with the MNLI mismatched set. The merged examples are shuffled. The dataset is used to classify the relationship between two given sentences as entailment, neutral, or contradiction.
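The split construction described above (merge two source splits, then shuffle) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure the dataset authors actually ran: the field names `sentence1`, `sentence2`, and `label`, the helper `merge_and_shuffle`, the seed, and the toy examples are all hypothetical.

```python
import random
from typing import Dict, List

# Hypothetical record layout: one premise/hypothesis pair plus a label.
Example = Dict[str, str]

# The three relationship classes named in the description.
LABELS = ("entailment", "neutral", "contradiction")


def merge_and_shuffle(part_a: List[Example], part_b: List[Example], seed: int = 0) -> List[Example]:
    """Concatenate two splits, check their labels, and shuffle the result reproducibly."""
    merged = list(part_a) + list(part_b)
    for example in merged:
        if example["label"] not in LABELS:
            raise ValueError(f"unexpected label: {example['label']!r}")
    # A private Random instance keeps the shuffle independent of global state.
    random.Random(seed).shuffle(merged)
    return merged


# Toy stand-ins for the XNLI dev split and the MNLI matched split (made-up data).
xnli_dev = [
    {"sentence1": "他在看书。", "sentence2": "他在阅读。", "label": "entailment"},
    {"sentence1": "她去了北京。", "sentence2": "她喜欢北京。", "label": "neutral"},
]
mnli_matched = [
    {"sentence1": "门是开着的。", "sentence2": "门是关着的。", "label": "contradiction"},
]

cmnli_dev = merge_and_shuffle(xnli_dev, mnli_matched, seed=42)
print(len(cmnli_dev), "examples in the merged development split")
```

The same helper would apply to the test split by passing the XNLI test data and the MNLI mismatched data instead.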
Provider:
OpenDataLab
Created:
2023-09-04
Collected summary
Dataset introduction

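Background and challenges
Background overview
CMNLI is a Chinese natural language inference dataset derived from XNLI and MNLI, covering text from sources such as fiction, telephone conversations, and travel. It is used to evaluate the logical relationship between sentence pairs, with three classes: entailment, neutral, and contradiction. Released in 2020, it is an important component of the CLUE Chinese language understanding benchmark and supports the evaluation of pretrained models.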
The content above was collected and summarized by 遇见数据集.



