Orange dataset table
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://figshare.com/articles/dataset/Orange_dataset_table/19146410
Description:
The complete dataset used in the analysis comprises 36
samples, each described by 11 numeric features and 1 target. The attributes
considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity
(3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h
incubation with the referred compounds) and oxidation rate, DCFDA fluorescence
(3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA
hydrolysis. The target of each instance corresponds to one of the 9 possible
classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and
0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, contains no
missing values, and was standardized across features. The
small number of samples prevented a full, robust statistical analysis of the
results. Nevertheless, it allowed the identification of relevant hidden
patterns and trends.
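The class layout and the standardization step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the label strings, the `standardize` helper, and the placeholder feature values are hypothetical and are not taken from the dataset file, and the use of the population standard deviation is an assumption, since the description does not say which variant was applied.

```python
from collections import Counter
from statistics import mean, pstdev

# Class labels as listed in the description: one control plus four
# concentrations (in µM) for each compound, 4 samples per class.
# The exact label strings are hypothetical.
CLASSES = (
    ["Control"]
    + [f"6-OHDA {c} µM" for c in ("6.25", "12.5", "25", "50")]
    + [f"rotenone {c} µM" for c in ("0.03", "0.06", "0.125", "0.25")]
)
SAMPLES_PER_CLASS = 4
labels = [name for name in CLASSES for _ in range(SAMPLES_PER_CLASS)]


def standardize(column):
    """Return z-scores: subtract the column mean, divide by its std."""
    mu = mean(column)
    sigma = pstdev(column)  # population std (an assumption, see lead-in)
    return [(x - mu) / sigma for x in column]


# Placeholder values for ONE hypothetical feature column (not real data).
raw_feature = [float(i % 7) + 0.5 * (i // 9) for i in range(len(labels))]
z = standardize(raw_feature)

# A dataset is balanced when every class has the same number of samples.
class_counts = Counter(labels)
is_balanced = len(set(class_counts.values())) == 1
```

After this step each standardized column has mean 0 and unit standard deviation, which keeps features measured on different scales comparable in distance-based methods.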
Exploratory data analysis, information gain, hierarchical
clustering, and supervised predictive modeling were performed using Orange Data
Mining version 3.25.1 [41].
Hierarchical clustering was performed using the Euclidean distance metric and
weighted linkage. Cluster maps were plotted to relate the features with higher
mutual information (in rows) with instances (in columns), with the color of
each cell representing the normalized level of a particular feature in a
specific instance. The information is grouped both in rows and in columns by a
two-way hierarchical clustering method using the Euclidean distances and
average linkage. Stratified cross-validation was used to train the supervised
decision tree. A set of preliminary empirical experiments was performed to
choose the best parameters for each algorithm, and we verified that, within
moderate variations, there were no significant changes in the outcome. The following
settings were adopted for the decision tree algorithm: minimum number of
samples in leaves: 2; minimum number of samples required to split an internal
node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The
performance of the supervised model was assessed using accuracy, precision,
recall, F-measure and area under the ROC curve (AUC) metrics.
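To make the "gain ratio" criterion named above concrete, the following minimal sketch computes it for a single binary threshold split, using the textbook definition (information gain divided by the split information). It is not Orange's implementation, which may differ in detail; the function names and the toy values are hypothetical and are not data from the study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    total = len(labels)
    weights = [len(left) / total, len(right) / total]
    # Information gain: parent entropy minus weighted child entropies.
    info_gain = entropy(labels) - (
        weights[0] * entropy(left) + weights[1] * entropy(right)
    )
    # Split information penalizes very uneven partitions.
    split_info = -sum(w * log2(w) for w in weights)
    return info_gain / split_info


# Toy example (hypothetical values): a feature that separates two classes.
toy_values = [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4]
toy_labels = ["A"] * 4 + ["B"] * 4
perfect = gain_ratio(toy_values, toy_labels, threshold=0.5)
partial = gain_ratio(toy_values, toy_labels, threshold=0.25)
```

In this toy case the threshold 0.5 separates the two classes completely and scores higher than the threshold 0.25, which leaves one side mixed. A tree learner using this criterion would pick the higher-scoring split at each node, subject to stopping rules such as the leaf-size and majority settings listed above.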
Created:
2022-03-04



