Additional file 1 of Sparse feature selection for classification and prediction of metastasis in endometrial cancer
Source: DataCite Commons. Updated 2024-12-18; indexed 2024-07-25.
Download link:
https://springernature.figshare.com/articles/dataset/Additional_file_1_of_Sparse_feature_selection_for_classification_and_prediction_of_metastasis_in_endometrial_cancer/4793008
Description:
List and description of supplemental tables.

Table S1. This table contains the measurements of 1,428 microRNAs (miRNAs) for 94 samples. The rows correspond to the features (miRNAs) and the columns correspond to the samples. The samples consist of 47 lymph node-positive and 47 lymph node-negative samples. 43.75% of the entries in this sheet are NaN. A sample whose label contains the term IB or IC belongs to a lymph node-negative patient, whereas a sample whose label contains IIIC belongs to a lymph node-positive patient. Lymph node-positive or lymph node-negative status was defined empirically during primary staging.

Table S2. This table contains the subset of the raw data used for training the classifier. It was obtained by removing four patients from each class and 1,215 features. It contains measurements for 213 miRNAs in 86 samples. Of those 86 samples, 43 are lymph node-positive and the remaining 43 are lymph node-negative.

Table S3. This table contains the normalized version of the training data. The following procedure is used for normalization: 1) From each entry of the i-th row vector (the i-th feature vector), we subtract the mean value m_i of the i-th row vector, computed over all 86 samples. 2) We multiply each entry of the i-th row vector by a scale factor s_i chosen so that the resulting vector has Euclidean norm equal to the square root of 86.

Table S4. The lone star algorithm selected 18 final features. This sheet contains the 20 best classifiers based on these eighteen features, sorted by accuracy. The sensitivity, specificity and accuracy figures (columns T, U and V) are based on the classification of the 86 samples in the training data by the corresponding classifier.

Table S5. This table shows the classifier obtained by averaging the classifiers in Sheet 4. In particular, we average the numbers in each column (columns A-S) of the 20 best classifiers given in Sheet 4.

Table S6. This sheet contains clinical information about the independent cohort of 28 patients who were used to validate the classifier. Of these, 9 are lymph node-positive and 19 are lymph node-negative.

Table S7. This sheet contains the raw microRNA measurements for the 28 test data samples.

Table S8. This is the transformed version of the test data. We apply the same transformation as we did for the training data, as described on Sheet 3. For each of the 18 features (miRNAs), we subtract the original mean value m_i from each entry and multiply each entry by the constant s_i. The values m_i and s_i are calculated as in Additional file 1, Table S3.

Table S9. This sheet contains the discriminant values of the classifier on the test data. In column D, an entry of 1 means that the sample is correctly classified.

Table S10. This sheet contains the number of overlaps between our 23-gene signature and the pathways in the KEGG database. The q-value is obtained from the Fisher exact test after the Benjamini-Hochberg multiple testing correction and quantifies the statistical significance of the overlap between the gene list and the set of genes in a particular pathway.

(1170 KB XLSX)
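The normalization described for Table S3, and its reuse on the test data in Table S8, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `fit_row_normalizer` and `apply_row_normalizer` are hypothetical, the arrays are random placeholders rather than the actual measurements, and the sketch assumes the input contains no NaN entries and no constant rows.

```python
import numpy as np


def fit_row_normalizer(train):
    """Compute per-feature mean m_i and scale s_i from training data.

    train: array of shape (features, samples), one row per miRNA.
    Assumes no NaN entries and no constant rows (hypothetical helper).
    """
    n_samples = train.shape[1]
    m = train.mean(axis=1, keepdims=True)            # m_i, one per row
    centered_norm = np.linalg.norm(train - m, axis=1, keepdims=True)
    s = np.sqrt(n_samples) / centered_norm           # s_i, one per row
    return m, s


def apply_row_normalizer(data, m, s):
    """Subtract m_i from each row, then multiply the row by s_i."""
    return (data - m) * s


# Placeholder data with the shapes named in the description:
# 18 selected features, 86 training samples, 28 test samples.
rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(18, 86))
test = rng.normal(loc=5.0, scale=2.0, size=(18, 28))

m, s = fit_row_normalizer(train)
train_norm = apply_row_normalizer(train, m, s)
# The test data reuse the training m_i and s_i; they are not recomputed.
test_norm = apply_row_normalizer(test, m, s)
```

After step 2 each training row has zero mean and Euclidean norm equal to the square root of 86. The test rows generally do not, because they are transformed with the training statistics.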
Provider:
Figshare
Created:
2017-03-28



