Complete BHC results.
Source: NIAID Data Ecosystem. Indexed 2026-03-12.
Download link:
https://figshare.com/articles/dataset/Complete_BHC_results_/14829567
Description:
This archive contains six sub-folders:
S3.1 explains the two-dimensional clustering analysis that can be performed using the patient-expression data.
S3.2 describes the process of consolidating clusters into classes. This was done to reduce the number of clusters and to increase the sample size within each class.
S3.3 contains all the plots produced from this work: three from TCGA (The Cancer Genome Atlas) and three from METABRIC. The blue solid lines in the dendrograms show the merges preferred by BHC (Bayesian Hierarchical Clustering), i.e. the clusters. Red dashed lines show merges further up the cluster hierarchy. The numbers on the branches are the log odds for merging.
S3.4 contains the clustering results from the three TCGA analyses and a .pdf image of a Venn diagram showing the overlaps between the different TCGA Basal dominant classes. Each pairwise analysis is organised into its own analysis folder, containing a Data folder (for the input files) and a Working_directory folder (for the output files). The Data folder contains the relevant median-centred patient expression data (.csv file). The Working_directory folder contains two text files (.txt) listing the patients or genes that belong to each resulting cluster, two resulting plots (.pdf files) and R data (.RDa files) produced while running the code (provided in S3.6). Each analysis folder also contains a patients_hc.pdf file that illustrates the hierarchical structure of the patient clusters. This was generated separately for visualisation purposes using the plot() function, with hc_b.Rda in the Working_directory as the input data.
S3.5 contains the clustering results from the three METABRIC analyses. Its folder organisation and files are equivalent to those of S3.4.
S3.6 contains Clustering_code_using_BHC.R, with written descriptions and in-code remarks, provided for reproducible research.
(ZIP)
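After unpacking the ZIP, the per-analysis layout described for S3.4 and S3.5 can be checked with a short script. The following is only a minimal sketch in Python: the function name `summarise_analysis_folder` is hypothetical, the analysis folder path is whatever the archive actually uses (the description does not name the folders), and the location of patients_hc.pdf inside the analysis folder is an assumption, so the sketch searches for it recursively.

```python
from collections import Counter
from pathlib import Path


def summarise_analysis_folder(analysis_dir):
    """Count the file types the description lists for one analysis folder.

    Hypothetical helper: it only reports what is present, it does not
    read or validate the contents of any file.
    """
    root = Path(analysis_dir)

    def files_in(folder):
        # A missing folder is reported as empty rather than raising.
        return [p for p in folder.iterdir() if p.is_file()] if folder.is_dir() else []

    data_files = files_in(root / "Data")
    work_files = files_in(root / "Working_directory")

    # Lower-case suffixes so ".RDa" and ".Rda" are counted together.
    data_types = Counter(p.suffix.lower() for p in data_files)
    work_types = Counter(p.suffix.lower() for p in work_files)

    return {
        "input_csv": data_types[".csv"],        # median-centred expression data
        "cluster_txt": work_types[".txt"],      # patient / gene cluster members
        "plot_pdf": work_types[".pdf"],         # resulting plots
        "r_data": work_types[".rda"],           # R data written by the code
        "has_hc_b": any(p.name.lower() == "hc_b.rda" for p in work_files),
        # Assumption: patients_hc.pdf may sit anywhere below the analysis folder.
        "has_patients_hc_pdf": root.is_dir() and any(root.rglob("patients_hc.pdf")),
    }
```

Calling `summarise_analysis_folder(path_to_one_analysis_folder)` returns a dictionary of counts that can be compared by eye against the description (one .csv input, two .txt files, two .pdf plots, and the R data files).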
Created:
2021-06-23



