
muhammadravi251001/debug-entailment

Source: Hugging Face. Updated: 2023-09-10. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/muhammadravi251001/debug-entailment
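Resource summary:
This dataset appears to be intended for natural language inference (NLI) tasks and contains columns including premise, hypothesis, and label. It is split into training, validation, and test sets, and it is provided in two versions: one that keeps invalid data and one that drops it.
Provider:
muhammadravi251001
Summary of original information

Dataset overview

How to download the dataset

Download only specific columns (premise, hypothesis, label)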

```python
from datasets import load_dataset, Dataset, DatasetDict

data_files = {
    "train": "data_nli_train_df_debug.csv",
    "validation": "data_nli_val_df_debug.csv",
    "test": "data_nli_test_df_debug.csv",
}

dataset = load_dataset("muhammadravi251001/debug-entailment", data_files=data_files)

# Keep only the columns needed for NLI.
selected_columns = ["premise", "hypothesis", "label"]

# Convert each split to a pandas DataFrame (requires pandas) and select the columns.
df_train = dataset["train"].to_pandas()[selected_columns]
df_val = dataset["validation"].to_pandas()[selected_columns]
df_test = dataset["test"].to_pandas()[selected_columns]

# Convert back to Dataset objects without adding an index column.
train_dataset = Dataset.from_pandas(df_train, preserve_index=False)
validation_dataset = Dataset.from_pandas(df_val, preserve_index=False)
test_dataset = Dataset.from_pandas(df_test, preserve_index=False)

dataset = DatasetDict({"train": train_dataset, "validation": validation_dataset, "test": test_dataset})
```
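As a rough illustration of the same column selection, the sketch below applies it to CSV text using only the Python standard library, which can be handy for checking a locally downloaded copy of one of the files. The helper name `select_columns`, the extra `note` column, and both sample rows (including their label values) are made up for the example and are not taken from the dataset.

```python
import csv
import io

SELECTED_COLUMNS = ["premise", "hypothesis", "label"]


def select_columns(csv_text, columns=SELECTED_COLUMNS):
    """Parse CSV text and keep only the requested columns, in the given order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in columns if c not in header]
    if missing:
        raise KeyError(f"columns not found in CSV header: {missing}")
    return [{c: row[c] for c in columns} for row in reader]


# Placeholder data: the "note" column and both rows are invented for this sketch.
sample = (
    "premise,hypothesis,label,note\n"
    "A man is cooking.,Someone prepares food.,entailment,x\n"
    "A man is cooking.,Nobody is in the kitchen.,contradiction,y\n"
)

rows = select_columns(sample)
print(rows[0])
```

The helper raises a `KeyError` when a requested column is absent, so a mismatch between the expected and actual header surfaces immediately instead of producing partial rows.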

Dataset that keeps invalid data

```python
from datasets import load_dataset, Dataset, DatasetDict

data_files = {
    "train": "data_nli_train_df_keep.csv",
    "validation": "data_nli_val_df_keep.csv",
    "test": "data_nli_test_df_keep.csv",
}

dataset = load_dataset("muhammadravi251001/debug-entailment", data_files=data_files)

# Keep every column of the train split (column_names is a dict keyed by split name).
selected_columns = dataset.column_names["train"]

# Convert each split to a pandas DataFrame (requires pandas) and select the columns.
df_train = dataset["train"].to_pandas()[selected_columns]
df_val = dataset["validation"].to_pandas()[selected_columns]
df_test = dataset["test"].to_pandas()[selected_columns]

# Convert back to Dataset objects without adding an index column.
train_dataset = Dataset.from_pandas(df_train, preserve_index=False)
validation_dataset = Dataset.from_pandas(df_val, preserve_index=False)
test_dataset = Dataset.from_pandas(df_test, preserve_index=False)

dataset = DatasetDict({"train": train_dataset, "validation": validation_dataset, "test": test_dataset})
```

Dataset with invalid data dropped

```python
from datasets import load_dataset, Dataset, DatasetDict

data_files = {
    "train": "data_nli_train_df_drop.csv",
    "validation": "data_nli_val_df_drop.csv",
    "test": "data_nli_test_df_drop.csv",
}

dataset = load_dataset("muhammadravi251001/debug-entailment", data_files=data_files)

# Keep every column of the train split (column_names is a dict keyed by split name).
selected_columns = dataset.column_names["train"]

# Convert each split to a pandas DataFrame (requires pandas) and select the columns.
df_train = dataset["train"].to_pandas()[selected_columns]
df_val = dataset["validation"].to_pandas()[selected_columns]
df_test = dataset["test"].to_pandas()[selected_columns]

# Convert back to Dataset objects without adding an index column.
train_dataset = Dataset.from_pandas(df_train, preserve_index=False)
validation_dataset = Dataset.from_pandas(df_val, preserve_index=False)
test_dataset = Dataset.from_pandas(df_test, preserve_index=False)

dataset = DatasetDict({"train": train_dataset, "validation": validation_dataset, "test": test_dataset})
```
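The three download snippets differ only in the file-name suffix (`debug`, `keep`, `drop`), so the `data_files` mapping could be generated rather than typed out each time. The following is a minimal sketch under that assumption; the helper names `build_data_files` and `load_variant` are hypothetical and not part of the dataset card, and the file-name pattern is inferred only from the names listed above.

```python
# Maps the split names used by `datasets` to the abbreviations in the CSV file names.
SPLIT_FILE_KEYS = {"train": "train", "validation": "val", "test": "test"}
KNOWN_VARIANTS = ("debug", "keep", "drop")


def build_data_files(variant):
    """Return the data_files mapping for one variant of the dataset."""
    if variant not in KNOWN_VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {KNOWN_VARIANTS}")
    return {
        split: f"data_nli_{key}_df_{variant}.csv"
        for split, key in SPLIT_FILE_KEYS.items()
    }


def load_variant(variant):
    """Load one variant from the Hub (needs the `datasets` package and network access)."""
    from datasets import load_dataset  # imported lazily so build_data_files works without it

    return load_dataset(
        "muhammadravi251001/debug-entailment",
        data_files=build_data_files(variant),
    )


print(build_data_files("keep"))
```

Calling `load_variant("drop")` would then correspond to the `load_dataset` call in the snippet for the dataset with invalid data dropped.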
