emirhanboge/mnli_llama1b_modified

Name: emirhanboge/mnli_llama1b_modified
Creator: emirhanboge
Published: 2025-03-05 10:13:25
License: No description provided

Source: Hugging Face | Updated: 2025-03-05 | Indexed: 2025-04-12

Download link:

https://hf-mirror.com/datasets/emirhanboge/mnli_llama1b_modified

Description:

This is a modified version of the Multi-Genre Natural Language Inference (MNLI) dataset, prepared for the LLaMA 1B model. Each instance consists of a premise, a hypothesis, and a label giving the relationship between them: entailment, neutral, or contradiction. Each instance also carries an index, input IDs, an attention mask, and a task identifier. The data is split into a training set, validation sets (matched and mismatched), and test sets (matched and mismatched). It has already been tokenized with the LLaMA-1B tokenizer, with a maximum sequence length of 128 tokens.
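The record layout described above can be sketched as a typed structure with a small consistency check. This is only an illustration under stated assumptions: the field names (`premise`, `hypothesis`, `label`, `idx`, `input_ids`, `attention_mask`, `task`) are hypothetical, since the description names the fields only in prose; the label ids are assumed to follow the GLUE MNLI convention; and the token ids in the toy record are made up. Only the 128-token limit comes from the description itself.

```python
from typing import Dict, List, TypedDict

MAX_LEN = 128  # maximum sequence length stated in the description

# Assumed label ids (GLUE MNLI convention); not confirmed for this dataset.
LABEL_NAMES: Dict[int, str] = {0: "entailment", 1: "neutral", 2: "contradiction"}


class MnliRecord(TypedDict):
    # Hypothetical field names; the description lists the fields only in prose.
    premise: str
    hypothesis: str
    label: int
    idx: int
    input_ids: List[int]
    attention_mask: List[int]
    task: str


def check_record(rec: MnliRecord) -> List[str]:
    """Return a list of problems found in one record (empty if it looks consistent)."""
    problems: List[str] = []
    if rec["label"] not in LABEL_NAMES:
        problems.append(f"unexpected label id {rec['label']}")
    if len(rec["input_ids"]) != len(rec["attention_mask"]):
        problems.append("input_ids and attention_mask differ in length")
    if len(rec["input_ids"]) > MAX_LEN:
        problems.append(f"sequence longer than {MAX_LEN} tokens")
    if any(m not in (0, 1) for m in rec["attention_mask"]):
        problems.append("attention_mask contains values other than 0 and 1")
    return problems


# Toy record with made-up token ids, for illustration only.
example: MnliRecord = {
    "premise": "A man is playing a guitar.",
    "hypothesis": "A person is making music.",
    "label": 0,
    "idx": 0,
    "input_ids": [1, 100, 200, 300, 2],
    "attention_mask": [1, 1, 1, 1, 1],
    "task": "mnli",
}

print(check_record(example))
```

A check like this could be run over a few rows after downloading the data, to confirm whether the actual column names and label ids match these assumptions before training on it.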

Provided by:

emirhanboge
