emirhanboge/mnli_llama1b_modified
Source: Hugging Face. Updated 2025-03-05; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/emirhanboge/mnli_llama1b_modified
Description:
A modified version of the Multi-Genre Natural Language Inference (MNLI) dataset, prepared for the LLaMA 1B model. Each instance contains a premise, a hypothesis, and a label giving the relationship between them: entailment, neutral, or contradiction. Each instance also carries an index, input IDs, an attention mask, and a task identifier. The data is split into a training set, two validation sets (matched and mismatched), and two test sets (matched and mismatched). The text is pre-tokenized with the LLaMA-1B tokenizer, with a maximum sequence length of 128 tokens.
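The record layout described above can be sketched as a small sanity check. This is a minimal sketch, not the dataset author's code: the column names (`premise`, `hypothesis`, `label`, `idx`, `input_ids`, `attention_mask`, `task`) and the label order (0 = entailment, 1 = neutral, 2 = contradiction, following the GLUE MNLI convention) are assumptions and should be verified against the dataset card. The helper names `check_instance` and `load_splits` are hypothetical.

```python
# Sketch only: column names and label order are assumptions based on the
# description and the GLUE MNLI convention, not confirmed by the dataset card.
MAX_LEN = 128  # maximum sequence length stated in the description

# Assumed label order (GLUE MNLI convention).
ID2LABEL = {0: "entailment", 1: "neutral", 2: "contradiction"}

# Assumed column names for the fields listed in the description.
EXPECTED_FIELDS = ("premise", "hypothesis", "label", "idx",
                   "input_ids", "attention_mask", "task")


def check_instance(example: dict) -> str:
    """Validate one record and return its label name ("unknown" if unmapped)."""
    missing = [f for f in EXPECTED_FIELDS if f not in example]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    ids, mask = example["input_ids"], example["attention_mask"]
    if len(ids) != len(mask):
        raise ValueError("input_ids and attention_mask differ in length")
    if len(ids) > MAX_LEN:
        raise ValueError(f"sequence longer than {MAX_LEN} tokens")
    # Label ids outside the assumed mapping are reported as "unknown".
    return ID2LABEL.get(example["label"], "unknown")


def load_splits(repo_id: str = "emirhanboge/mnli_llama1b_modified"):
    """Download all splits; needs the `datasets` package and network access."""
    from datasets import load_dataset  # lazy import so the checks above run offline
    return load_dataset(repo_id)
```

With the `datasets` package installed, `load_splits()` would return a mapping of split names to datasets, and `check_instance` could then be applied to a few records from each split to confirm the assumed schema before training.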
Provider:
emirhanboge



