S1 Fig.
Source: NIAID Data Ecosystem (indexed 2026-03-08)
Download link:
https://figshare.com/articles/dataset/_Integrative_Genomic_Signatures_Of_Hepatocellular_Carcinoma_Derived_from_Nonalcoholic_Fatty_Liver_Disease_/1420889
Description:
Flow diagram of the steps performed by the RFE and RFE_MR methods.
Stage 1: As it is a backward procedure, it starts from the full matrix of selected genes. The process is iterative, and the number of iterations should be specified either for the first selection (x = number of selection iterations) or for the posterior refinement selection around the selected solution (y = number of refinement iterations). It uses the class vector as input.
Stage 2: Evaluate the selected gene subset.
Stage 3a: If the process does not take into account the redundancy of the features (RFE), calculate the sample-by-sample mutual information (MI) excluding each gene in turn. For each excluded gene, define coefficient I as the difference between the sum of the sample-by-sample MI between classes and the sum of the sample-by-sample MI within groups.
Stage 3b1: If the process takes into account the redundancy of the features (RFE_MR), calculate the average gene pairwise MI for each gene.
Stage 3b2: For each gene, calculate the coefficient II value by adding the average gene pairwise MI to the coefficient II.
Stage 4: Remove the m worst coefficient values and their corresponding genes and expression values.
Stage 5: Find the minimum error rate along the iterations and get the selected genes.

S2 Fig. Flow diagram of the steps performed by the MRMR method.
Stage 1: As it is a forward search procedure, it starts from an empty set of selected genes. The process is iterative, the number of iterations should be specified, and it uses the class vector as input.
Stage 2: Calculate the normalized mutual information between the class vector and the vector containing each gene's expression values across the samples.
Stage 3: a. For each gene, calculate the average gene pairwise mutual information. b. For each gene in the subset of selected genes, calculate the average gene pairwise mutual information.
Stage 4: For each gene, define a coefficient value by dividing the normalized mutual information by the average gene pairwise mutual information.
Stage 5: Store the gene with the maximum coefficient value and remove it from the matrix.
Stage 6: Evaluate.
Stage 7: Find the minimum error rate along the iterations and get the selected genes.

S3 Fig. Flow diagram of the GA procedure.
Stage 1: The procedure initially creates a number of random variable sets (chromosomes). These variable sets form a population of chromosomes. Each random set is initialized by randomly selecting 70 genes from the total of 504.
Stage 2: Each chromosome in the population is evaluated for its ability to predict the group membership of each sample in the dataset (fitness function).
Stage 3: Elitism: the fittest individual is passed intact to the next generation.
Stage 4: The population of chromosomes is replicated. Roulette wheel selection ensures that chromosomes with a higher fitness score generate more offspring.
Stage 5: The genetic information contained in the replicated parent chromosomes is combined through genetic crossover with a crossover probability (for the parameters, see supplementary Table 4 and the "Parameters in the Genetic Algorithm" supplementary section). The chromosomes are ranked according to their fitness value. Above the crossover probability, the best chromosomes are kept intact for the next generation. Below the crossover probability, two randomly selected parent chromosomes are used to create two new chromosomes. This crossover mechanism allows a better exploration of possible solutions by recombining good chromosomes.
Stage 6: Mutations are then randomly introduced, with a mutation probability, into the new chromosomes generated by crossover. These mutations bring new genes into the chromosomes.
Stage 7: The process is repeated from stage 2 until the number of generations exceeds a certain threshold (100) and the regression between the minimum error rate of the chromosome population and the generation is less than 0.05. The cycle of replication (stage 4), genetic crossover (stage 5) and mutation (stage 6) is called a generation.

S4 Fig. Tree structure where each of the stages of the disease has been clustered in a single cluster. The GS1_clust_FOM algorithm was used to select the variables given as input to pvclust, which performed the hierarchical clustering.

S5 Fig. A: Ovalbumin serpin expression along the NAFLD progression. B: Positional gene enrichment. A: Ovalbumin serpin expression along the NAFLD progression. MAT1A_15 and GNMT_ko8 are HCC mice samples where the serpins are overexpressed. B: Positional gene enrichment analysis using the PGE program [46] shows that all the genes in Ensembl chromosome band 6p24.3 are overexpressed, raising the possibility of a common mechanism of gene regulation.

S6 Fig. 91 human HCC data clustering. Complete hierarchical clustering with the Pearson correlation as a similarity measure distinguishes two stable clusters, A and B, that show statistically significant differences in survival length according to Kaplan-Meier plots and log-rank analysis.

S7 Fig. HNF4 alpha expression (log2 mouse KO vs wild type) in 3 and 8 month GNMT and MAT1A; and 15 month MAT1A (tumoral tissue, T).

S8 Fig. Expression trend (log2 mouse KO vs wild type) of NAFLD progression genes regulated by HNF4a in 3 and 8 month GNMT and MAT1A; and 15 month MAT1A (tumoral tissue, T).

S9 Fig. HNF4 alpha expression (log2 disease vs control) in human steatosis and NASH.

S10 Fig. Expression trend (log2 mouse KO vs wild type) of NAFLD progression genes regulated by HNF4a in human steatosis and NASH.

S11 Fig.
Expression trend (log2 mouse KO vs wild type) of biosynthesis of unsaturated fatty acids in human steatosis and NASH; in 3, 8 month GNMT; and MAT1A KO mice and 15 month MAT1A tumors.

S12 Fig. Expression (log2 mouse KO vs wild type) of stearoyl-CoA desaturase in human steatosis and NASH; in 3, 8 month GNMT; and MAT1A KO mice and 15 month MAT1A tumors.

S13 Fig. Expression trend (log2 mouse KO vs wild type) of phenylalanine, tyrosine and tryptophan biosynthesis in human steatosis and NASH; in 3, 8 month GNMT; and MAT1A KO mice and 15 month MAT1A tumors.

S14 Fig. Expression trend (log2 mouse KO vs wild type) of androgen and estrogen metabolism in human steatosis and NASH; in 3, 8 month GNMT; and MAT1A KO mice and 15 month MAT1A tumors.

S15 Fig. Expression trend (log2 mouse KO vs wild type) of arachidonic acid metabolism in human steatosis and NASH; in 3, 8 month GNMT; and MAT1A KO mice and 15 month MAT1A tumors.

S16 Fig. Expression (log2 mouse KO vs wild type) of cyclooxygenase in human steatosis and NASH; in 3, 8 month GNMT; and MAT1A KO mice and 15 month MAT1A tumors.

S17 Fig. Expression trend (log2 mouse KO vs wild type) of PPAR signaling pathway in human steatosis and NASH; in 3, 8 month GNMT; and MAT1A KO mice and 15 month MAT1A tumors.

S18 Fig. Expression trend (log2 mouse KO vs wild type) of drug metabolism cytochrome P450 in human steatosis and NASH; in 3, 8 month GNMT; and MAT1A KO mice and 15 month MAT1A tumors.

S19 Fig. Expression trend (log2 mouse KO vs wild type) of metabolism of xenobiotics by cytochrome P450 in human steatosis and NASH; in 3, 8 month GNMT; and MAT1A KO mice and 15 month MAT1A tumors.

S20 Fig. Expression trend (log2 mouse KO vs wild type) of toll-like receptor signaling pathway in human steatosis and NASH; in 3, 8 month GNMT; and MAT1A KO mice and 15 month MAT1A tumors.

S21 Fig.
Expression trend (log2 mouse KO vs wild type) of p53 signaling pathway in human steatosis and NASH; in 3, 8 month GNMT; and MAT1A KO mice and 15 month MAT1A tumors.

S22 Fig. Expression trend (log2 mouse KO vs wild type) of MAPK signaling pathway in human steatosis and NASH; in 3, 8 month GNMT; and MAT1A KO mice and 15 month MAT1A tumors.

S23 Fig. Expression trend (log2 mouse KO vs wild type) of bile acid biosynthesis in 3, 8 month GNMT; and MAT1A KO mice and 15 month MAT1A tumors.

S1 Table. Summary of the most established biomarkers in NAFLD.

S2 Table. Dunn and FOM indexes of the signatures of NAFLD progression. Dunn and FOM indexes of the signatures of NAFLD progression resulting from the 14 different supervised clustering based feature selection methods on smoothed data; ensemble error rate and stability, in terms of Hamming distance, of the signatures of NAFLD progression resulting from the 7 different supervised clustering based feature selection methods that minimise the FOM index on smoothed data.

S3 Table. Enriched transcription factor binding sites. Transcription factor binding sites enriched according to the Fisher exact test (p < 0.05) in the signatures of NAFLD progression resulting from the two supervised clustering based feature selection methods that produced the optimal clustering result, and in the two ensemble signatures from raw and smoothed data.

S4 Table. Ensemble error rate and the number of the different feature selection methods used to build the survival signature. Ensemble error rate and the number of selected genes resulting from the different feature selection methods used to build the survival signatures common to human and mouse.
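
The MRMR forward search described under S2 Fig. (relevance divided by redundancy, best gene moved into the selected set at each iteration) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several points are assumptions made only for illustration: the expression matrix is taken to be already discretized, the normalization of the mutual information uses the geometric mean of the two entropies, redundancy is averaged over the genes already selected, the first pick carries no redundancy penalty, and the function names and the `eps` guard are hypothetical. The evaluation and error-rate steps (Stages 6 and 7) are left out.

```python
import numpy as np

def entropy(x):
    """Shannon entropy (bits) of a discrete vector."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete vectors."""
    pairs = np.stack([x, y], axis=1)
    _, counts = np.unique(pairs, axis=0, return_counts=True)
    p = counts / counts.sum()
    h_xy = float(-np.sum(p * np.log2(p)))
    return entropy(x) + entropy(y) - h_xy

def normalized_mi(x, y):
    """Assumed normalization: MI divided by sqrt(H(X) * H(Y))."""
    denom = np.sqrt(entropy(x) * entropy(y))
    return mutual_information(x, y) / denom if denom > 0 else 0.0

def mrmr_quotient(expr, classes, n_iterations, eps=1e-12):
    """Forward selection on a genes x samples matrix of discretized values.

    Returns the row indices of the selected genes in selection order.
    """
    remaining = list(range(expr.shape[0]))
    # Stage 2: relevance of every gene with respect to the class vector.
    relevance = {g: normalized_mi(expr[g], classes) for g in remaining}
    selected = []  # Stage 1: start from an empty set.
    for _ in range(min(n_iterations, len(remaining))):
        best_gene, best_score = None, -np.inf
        for g in remaining:
            if selected:
                # Stage 3: average pairwise MI with the genes already selected.
                redundancy = np.mean(
                    [mutual_information(expr[g], expr[s]) for s in selected]
                )
            else:
                redundancy = 1.0  # assumption: no penalty for the first pick
            # Stage 4: coefficient = relevance / redundancy (eps avoids 0 division).
            score = relevance[g] / (redundancy + eps)
            if score > best_score:
                best_gene, best_score = g, score
        # Stage 5: keep the best gene and remove it from the candidates.
        selected.append(best_gene)
        remaining.remove(best_gene)
    return selected
```

With this quotient form, a gene that duplicates one already selected is ranked after an equally relevant gene that carries independent information, which is the behaviour the redundancy term is meant to produce.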
(DOCX)
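
One generation of the GA procedure described under S3 Fig. (elitism, roulette wheel replication, crossover, mutation) can be sketched in Python as follows. This is a simplified illustration, not the published procedure: single-point crossover, mutation by swapping in a random gene index, and all function names and parameters are assumptions. The fitness function is supplied by the caller and must return positive values (for example, one minus an error rate), and the ranking-based crossover rule in the caption is replaced here by a plain per-pair crossover probability.

```python
import random

def roulette_select(population, fitness, k, rng):
    """Fitness-proportional sampling with replacement (roulette wheel)."""
    return rng.choices(population, weights=fitness, k=k)

def crossover(a, b, rng):
    """Single-point crossover (an assumed operator); may duplicate genes."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chrom, n_total, p_mut, rng):
    """Replace each gene index with a random one with probability p_mut."""
    return [rng.randrange(n_total) if rng.random() < p_mut else g for g in chrom]

def next_generation(population, fitness_fn, p_cross, p_mut, n_total, rng):
    """Build the next population; the fittest chromosome is kept unchanged."""
    fitness = [fitness_fn(c) for c in population]
    # Elitism: copy the fittest individual intact.
    elite = max(zip(fitness, population), key=lambda t: t[0])[1]
    # Replication: fitter chromosomes are drawn more often.
    parents = roulette_select(population, fitness, len(population) - 1, rng)
    children = []
    for i in range(0, len(parents) - 1, 2):
        a, b = parents[i], parents[i + 1]
        if rng.random() < p_cross:
            a, b = crossover(a, b, rng)
        children += [a, b]
    if len(parents) % 2:
        children.append(parents[-1])  # unpaired parent passes through
    # Mutation applies only to the offspring, not to the elite.
    children = [mutate(c, n_total, p_mut, rng) for c in children]
    return [list(elite)] + children

if __name__ == "__main__":
    rng = random.Random(0)
    # 10 chromosomes of 70 genes drawn from 504, as in the caption.
    pop = [rng.sample(range(504), 70) for _ in range(10)]
    toy_fitness = lambda c: 1 + sum(g < 50 for g in c)  # hypothetical fitness
    pop = next_generation(pop, toy_fitness, 0.8, 0.01, 504, rng)
    print(len(pop), len(pop[0]))
```

Repeating `next_generation` until a stopping rule is met would correspond to the loop back to Stage 2; the stopping rule itself is not sketched here.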
Created:
2015-12-03



