One Classifier Ignores a Feature
Source: NIAID Data Ecosystem. Indexed 2026-03-13.
Download link:
https://zenodo.org/record/6502642
Description:
These data sets support a controlled experiment that compares two classifiers. train_a.csv and explain.csv are slices of the original data set. train_b.csv contains the same instances as train_a.csv, but with feature x1 set to 0 so that classifier B cannot use it.
The original data set was created and split using this Python code:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Original data set: 300 instances, two informative features, scaled by 100
X, y = make_classification(n_samples=300, n_features=2, n_redundant=0, n_informative=2,
                           n_clusters_per_class=1, class_sep=0.75, random_state=0)
X *= 100

# Classifier A is trained on the unmodified training half
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
clf_a = LogisticRegression()
clf_a.fit(X_train, y_train)

# Classifier B is trained on the same instances with feature x1 (column 0) set to 0
X2 = X.copy()
X2[:, 0] = 0
X2_train, X2_test, y2_train, y2_test = train_test_split(X2, y, test_size=0.5, random_state=0)
clf_b = LogisticRegression()
clf_b.fit(X2_train, y2_train)

# The held-out half is kept for the explanation step
X_explain = X_test
y_explain = y_test
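As a rough sanity check on the setup described above, the sketch below rebuilds the data in memory and looks at whether classifier B really ignores x1. It is not part of the original record: the names `coef_b_x1`, `X_shuffled`, `b_invariant`, and `n_disagree` are introduced here for illustration, and zeroing column 0 of the training half directly is assumed to be equivalent to splitting the zeroed copy with the same `random_state`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Rebuild the data with the same parameters as the generation code.
X, y = make_classification(n_samples=300, n_features=2, n_redundant=0, n_informative=2,
                           n_clusters_per_class=1, class_sep=0.75, random_state=0)
X *= 100
X_train, X_explain, y_train, y_explain = train_test_split(X, y, test_size=0.5, random_state=0)

# Classifier A sees both features; classifier B sees x1 (column 0) zeroed out.
# Assumption: this yields the same training rows as splitting the zeroed copy.
clf_a = LogisticRegression().fit(X_train, y_train)
X_train_b = X_train.copy()
X_train_b[:, 0] = 0
clf_b = LogisticRegression().fit(X_train_b, y_train)

# Weight that classifier B learned for x1; a constant-zero column carries no signal.
coef_b_x1 = float(clf_b.coef_[0, 0])

# Shuffle x1 in the explain set (illustrative check): B's predictions should not change.
rng = np.random.default_rng(0)
X_shuffled = X_explain.copy()
X_shuffled[:, 0] = rng.permutation(X_shuffled[:, 0])
b_invariant = bool(np.array_equal(clf_b.predict(X_explain), clf_b.predict(X_shuffled)))

# Instances in the explain set where the two classifiers give different labels.
n_disagree = int(np.sum(clf_a.predict(X_explain) != clf_b.predict(X_explain)))

print("clf_b weight on x1:", coef_b_x1)
print("clf_b invariant to x1:", b_invariant)
print("instances where A and B disagree:", n_disagree)
```

The instances where the two classifiers disagree are presumably the ones of most interest when comparing explanations, since any difference there can only come from classifier A's use of x1.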
Created:
2022-04-29



