muhammadravi251001/debug-entailment
Dataset Overview
How to Download the Dataset
Download only specific columns (premise, hypothesis, label)
```python
from datasets import load_dataset, Dataset, DatasetDict

# One CSV file per split
data_files = {
    "train": "data_nli_train_df_debug.csv",
    "validation": "data_nli_val_df_debug.csv",
    "test": "data_nli_test_df_debug.csv",
}
dataset = load_dataset("muhammadravi251001/debug-entailment", data_files=data_files)

# Keep only the columns of interest
selected_columns = ["premise", "hypothesis", "label"]
df_train = dataset["train"].to_pandas()[selected_columns]
df_val = dataset["validation"].to_pandas()[selected_columns]
df_test = dataset["test"].to_pandas()[selected_columns]

# Convert the filtered DataFrames back to Dataset objects
train_dataset = Dataset.from_pandas(df_train, preserve_index=False)
validation_dataset = Dataset.from_pandas(df_val, preserve_index=False)
test_dataset = Dataset.from_pandas(df_test, preserve_index=False)

dataset = DatasetDict({"train": train_dataset, "validation": validation_dataset, "test": test_dataset})
```
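If one of the CSV files is already on disk, the same column selection can be sketched with the standard library alone. This is only an illustration of the projection step, not the dataset author's method: the sample row, the `extra` column, and the `entailment` label value are made up for the example, since the real files' full column set and label vocabulary are not shown here.

```python
import csv
import io

SELECTED_COLUMNS = ["premise", "hypothesis", "label"]


def select_columns(csv_text, columns=SELECTED_COLUMNS):
    """Parse CSV text and return its rows as dicts restricted to `columns`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Fail early if the header lacks a requested column
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return [{c: row[c] for c in columns} for row in reader]


# Hypothetical sample data, not taken from the real dataset
sample = (
    "premise,hypothesis,label,extra\n"
    "A man is sleeping.,A person is asleep.,entailment,x\n"
)
rows = select_columns(sample)
print(rows)
```

To apply it to a real file, read the file's text first (for example with `open(path, newline="", encoding="utf-8").read()`) and pass it to `select_columns`.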
Dataset with invalid data kept
```python
from datasets import load_dataset, Dataset, DatasetDict

# One CSV file per split
data_files = {
    "train": "data_nli_train_df_keep.csv",
    "validation": "data_nli_val_df_keep.csv",
    "test": "data_nli_test_df_keep.csv",
}
dataset = load_dataset("muhammadravi251001/debug-entailment", data_files=data_files)

# Keep every column present in the train split
selected_columns = dataset.column_names["train"]
df_train = dataset["train"].to_pandas()[selected_columns]
df_val = dataset["validation"].to_pandas()[selected_columns]
df_test = dataset["test"].to_pandas()[selected_columns]

# Convert the DataFrames back to Dataset objects
train_dataset = Dataset.from_pandas(df_train, preserve_index=False)
validation_dataset = Dataset.from_pandas(df_val, preserve_index=False)
test_dataset = Dataset.from_pandas(df_test, preserve_index=False)

dataset = DatasetDict({"train": train_dataset, "validation": validation_dataset, "test": test_dataset})
```
Dataset with invalid data dropped
```python
from datasets import load_dataset, Dataset, DatasetDict

# One CSV file per split
data_files = {
    "train": "data_nli_train_df_drop.csv",
    "validation": "data_nli_val_df_drop.csv",
    "test": "data_nli_test_df_drop.csv",
}
dataset = load_dataset("muhammadravi251001/debug-entailment", data_files=data_files)

# Keep every column present in the train split
selected_columns = dataset.column_names["train"]
df_train = dataset["train"].to_pandas()[selected_columns]
df_val = dataset["validation"].to_pandas()[selected_columns]
df_test = dataset["test"].to_pandas()[selected_columns]

# Convert the DataFrames back to Dataset objects
train_dataset = Dataset.from_pandas(df_train, preserve_index=False)
validation_dataset = Dataset.from_pandas(df_val, preserve_index=False)
test_dataset = Dataset.from_pandas(df_test, preserve_index=False)

dataset = DatasetDict({"train": train_dataset, "validation": validation_dataset, "test": test_dataset})
```
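The three snippets above differ only in the file-name suffix (`debug`, `keep`, `drop`), so the `data_files` mapping can be generated instead of retyped. The following is a minimal sketch under that assumption. The helper names `build_data_files` and `load_variant` are hypothetical, not part of the dataset or of the `datasets` library, and `load_variant` assumes a `datasets` release recent enough to provide `DatasetDict.select_columns`.

```python
# Maps each split name to the abbreviation used in the CSV file names above
SPLIT_FILE_KEYS = {"train": "train", "validation": "val", "test": "test"}
VARIANTS = ("debug", "keep", "drop")


def build_data_files(variant):
    """Return the split -> file-name mapping for one dataset variant."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    return {
        split: f"data_nli_{key}_df_{variant}.csv"
        for split, key in SPLIT_FILE_KEYS.items()
    }


def load_variant(variant, columns=None):
    """Load one variant from the Hub, optionally keeping only `columns`."""
    # Imported lazily so build_data_files works without `datasets` installed
    from datasets import load_dataset

    dataset = load_dataset(
        "muhammadravi251001/debug-entailment",
        data_files=build_data_files(variant),
    )
    if columns is not None:
        # Assumes DatasetDict.select_columns is available in the installed version
        dataset = dataset.select_columns(columns)
    return dataset


print(build_data_files("keep"))
```

With this in place, `load_variant("debug", columns=["premise", "hypothesis", "label"])` would stand in for the first snippet, and `load_variant("keep")` or `load_variant("drop")` for the other two. Selecting columns directly on the `DatasetDict` also avoids the round trip through pandas.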



