Additional file 1 of The corrected gene proximity map for analyzing the 3D genome organization using Hi-C data

Name: Additional file 1 of The corrected gene proximity map for analyzing the 3D genome organization using Hi-C data
Creator: figshare
Published: 2020-08-25 13:22:28
License: No description available

DataCite Commons | Updated 2020-08-25 | Indexed 2024-07-28

Download link:

https://springernature.figshare.com/articles/Additional_file_1_of_The_corrected_gene_proximity_map_for_analyzing_the_3D_genome_organization_using_Hi-C_data/12399464

Resource description:

Additional file 1: Figure S1. Pearson correlation coefficients between the gene co-expression matrix and three matrices based on the spatial positioning of genes: the CGP map (blue bars), the raw gene proximity map (green bars), and the normalized gene proximity map (yellow bars), shown for each of the 23 chromosomes in 10 ENCODE cell lines.

Figure S2. (A) ROC curves for gene compartment classification using the leading eigenvectors of the CGP matrix for the GM12878 and K562 cell lines. The horizontal axis is the false positive rate (1 − specificity) and the vertical axis is the true positive rate (sensitivity). The red dot marks the optimal operating point. The components of the top 50 leading eigenvectors were used as features for the classification model. (B) Effect of the number of eigenvectors used in the gene compartment classifier. The horizontal axis is the number of CGP-matrix eigenvectors used to build the model, ranging from 1 to 50; the vertical axis is the average AUROC of the resulting model over 10-fold cross-validation. Red circles and blue squares (which almost completely coincide) represent the GM12878 and K562 cell lines, respectively. The first leading eigenvector alone does not yield a good classification. Adding the second and third eigenvectors raises the AUROC sharply (from 0.57 to 0.70), whereas using more than 10 eigenvectors provides no further substantial improvement.

Figure S3. Objective function based on the empirical gene expression profile and on randomized profiles, computed using the raw gene proximity map. The histogram for the randomized profiles is normalized to have zero mean. A main difference between the plots generated from the CGP map and from the raw gene proximity map is that, for cell lines RPMI-7951, SJCRH30 and SK-N-DZ, the gene proximity map-based objective function value computed from the empirical expression profile falls among the values computed from randomized profiles.

Figure S4. Change in the relative spatial positioning of chromosomes between cell lines GM12878 and K562. This network is laid out in the same way as Figure 6 in the main text, but here the inter-chromosomal proximity matrix was computed from the gene proximity map instead of the corrected proximity measure. Compared with Figure 6, the connections between chromosomes 3 and 10 and between chromosomes 9 and 22 are no longer easy to identify.

Table S1. Top 20 inter-chromosomal gene interactions in cell lines GM12878 and K562, respectively. These gene pairs were selected because they are located on different chromosomes and have the largest values in the corresponding CGP map.
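Two of the computations described above are simple matrix operations: correlating a gene co-expression matrix with a spatial proximity matrix (Figure S1) and ranking gene pairs on different chromosomes by their map value (Table S1). The following NumPy sketch illustrates both under stated assumptions: the matrices are already-computed symmetric gene-by-gene arrays, the correlation is taken over distinct gene pairs only (upper triangle, diagonal excluded), and the function names `upper_triangle`, `matrix_pearson` and `top_interchromosomal_pairs` are hypothetical. It does not implement the CGP correction itself, and the authors' exact procedure may differ.

```python
import numpy as np


def upper_triangle(m: np.ndarray) -> np.ndarray:
    """Strictly upper-triangular entries of a square matrix (each gene pair once)."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]


def matrix_pearson(coexpr: np.ndarray, proximity: np.ndarray) -> float:
    """Pearson correlation between two symmetric gene-by-gene matrices.

    Assumption: correlation is taken over distinct gene pairs only,
    so the diagonal and the duplicated lower triangle are excluded.
    """
    x, y = upper_triangle(coexpr), upper_triangle(proximity)
    return float(np.corrcoef(x, y)[0, 1])


def top_interchromosomal_pairs(cgp, chrom, genes, k=20):
    """Return the k gene pairs on different chromosomes with the largest map values.

    cgp   : symmetric (n, n) array, e.g. a precomputed CGP map (hypothetical input)
    chrom : length-n sequence of chromosome labels
    genes : length-n sequence of gene names
    """
    chrom = np.asarray(chrom)
    i, j = np.triu_indices(cgp.shape[0], k=1)
    inter = chrom[i] != chrom[j]          # keep only pairs on different chromosomes
    i, j = i[inter], j[inter]
    vals = cgp[i, j]
    order = np.argsort(vals)[::-1][:k]    # positions of the largest values, descending
    return [(genes[a], genes[b], float(v))
            for a, b, v in zip(i[order], j[order], vals[order])]


if __name__ == "__main__":
    # Toy 4-gene example: genes A, B on chr1 and C, D on chr2 (made-up values).
    genes = ["A", "B", "C", "D"]
    chrom = ["chr1", "chr1", "chr2", "chr2"]
    cgp = np.array([[0, 5, 1, 4],
                    [5, 0, 3, 2],
                    [1, 3, 0, 6],
                    [4, 2, 6, 0]], dtype=float)
    print(top_interchromosomal_pairs(cgp, chrom, genes, k=2))
    # [('A', 'D', 4.0), ('B', 'C', 3.0)]
```

Applied to the gene submatrices of one chromosome at a time, `matrix_pearson` would yield one value per chromosome, in the spirit of the bars in Figure S1; restricting the ranking to the upper triangle avoids listing each symmetric pair twice.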

Provider:

figshare

Created:

2020-05-30