Additional file 1 of The corrected gene proximity map for analyzing the 3D genome organization using Hi-C data

NIAID Data Ecosystem, indexed 2026-03-11

Download link:

https://figshare.com/articles/dataset/Additional_file_1_of_The_corrected_gene_proximity_map_for_analyzing_the_3D_genome_organization_using_Hi-C_data/12399464

Resource description:

Additional file 1:

Figure S1. Pearson correlation coefficients between the gene co-expression matrix and three different matrices based on the spatial positioning of genes: the CGP map (blue bars), the raw gene proximity map (green bars), and the normalized gene proximity map (yellow bars), for each of the 23 chromosomes in 10 ENCODE cell lines.

Figure S2. (A) ROC curve for gene compartment classification using leading eigenvectors of the CGP matrix for the GM12878 and K562 cell lines. The horizontal axis is the false positive rate (1 − specificity) and the vertical axis is the true positive rate (sensitivity). The red dot indicates the optimal operating point. Components of the top 50 leading eigenvectors were used as features for the classification model. (B) Effect of the number of eigenvectors used in the gene compartment label classifier. The horizontal axis is the number of eigenvectors of the CGP matrix used for model construction, ranging from 1 to 50. The vertical axis is the average AUROC of the resulting model over 10-fold cross-validation. The red circles and blue squares, which almost completely coincide, represent the GM12878 and K562 cell lines respectively. Using the first leading eigenvector alone does not yield a good classification result. Adding the second and third eigenvectors increases the AUROC markedly (from 0.57 to 0.70). Using more than 10 eigenvectors, however, yields no substantial further improvement.

Figure S3. Objective function based on the empirical gene expression profile and on randomized profiles, computed using the raw gene proximity map. The histogram for randomized profiles is normalized to have zero mean. One main difference between the plots generated from the CGP map and those from the raw gene proximity map is that, for cell lines RPMI-7951, SJCRH30 and SK-N-DZ, the value of the gene proximity map-based objective function generated from the empirical expression profile is intermixed with the values generated from randomized profiles.

Figure S4. Change in the relative spatial positioning of chromosomes between cell lines GM12878 and K562. The network layout is the same as in Figure 6 of the main text, but the inter-chromosomal proximity matrix here was computed using the gene proximity map instead of the corrected proximity measure. Compared with Figure 6, the connections between chromosomes 3 and 10, and between chromosomes 9 and 22, are no longer easily identified.

Table S1. Top 20 inter-chromosomal gene interactions in cell lines GM12878 and K562 respectively. These gene pairs were selected because they are located on different chromosomes and have the largest values in the corresponding CGP map.
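
As a rough illustration of the kind of comparison summarized for Figure S1, and of the eigenvector features mentioned for Figure S2, the following NumPy sketch correlates two gene-by-gene matrices and extracts leading eigenvectors. It is not the authors' code. The function names are hypothetical, and the sketch assumes symmetric matrices, compares only off-diagonal upper-triangle entries, and ranks eigenvectors by largest eigenvalue; none of these choices is specified in the description above.

```python
import numpy as np


def upper_triangle_pearson(a, b):
    """Pearson correlation between the off-diagonal upper-triangle
    entries of two square matrices of the same shape.

    Assumption: both matrices are symmetric gene-by-gene maps, so the
    upper triangle holds each gene pair exactly once.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1] or a.shape != b.shape:
        raise ValueError("expected two square matrices of the same shape")
    rows, cols = np.triu_indices(a.shape[0], k=1)  # k=1 skips the diagonal
    return float(np.corrcoef(a[rows, cols], b[rows, cols])[0, 1])


def leading_eigenvectors(m, k):
    """Return, as columns, the k eigenvectors of a symmetric matrix
    with the largest eigenvalues.

    Assumption: "leading" means largest eigenvalue; ranking by absolute
    value would be an equally plausible reading.
    """
    m = np.asarray(m, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eigh(m)  # ascending eigenvalues
    order = np.argsort(eigenvalues)[::-1][:k]      # indices of the k largest
    return eigenvectors[:, order]
```

Each row of the array returned by `leading_eigenvectors` could then serve as the feature vector for one gene in a classifier of one's choosing.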


Created:

2020-05-29