Identification of novel biomarkers for thyroid cancer using multi omics data analysis

DataONE · Updated 2022-06-02 · Indexed 2024-06-15

Download link:

https://search.dataone.org/view/sha256:7caf96392132be4be239f3b6c5e02263fb77a5b9b6ebed905d70cd8677aaaaa3

Resource description:

Biomarkers for thyroid cancer are still not well characterized, yet such biomarkers could be targeted specifically for treatment. In this project, we used bioinformatics tools to identify biomarkers associated with thyroid cancer. The Gene Expression Omnibus (GEO) database was searched for datasets related to thyroid cancer, and their expression profiles were downloaded. Four datasets were identified: GSE3467, GSE3678, GSE33630, and GSE53157. GSE3467 contains nine thyroid tumor samples and nine normal thyroid tissue samples. GSE3678 contains seven thyroid tumor samples and seven normal thyroid tissue samples. GSE53157 contains 24 thyroid tumor samples and three normal thyroid samples. GSE33630 contains 60 thyroid tumor samples and 45 normal thyroid samples. The four datasets were analyzed individually and then integrated to find the genes they have in common.

Microarray analysis of the datasets was performed in Excel. T.TEST analyses were run for each of the four datasets on a separate Excel sheet. The data were normalized by converting the raw values to a log scale. Differential expression analysis was carried out on each dataset to identify differentially expressed genes (DEGs); only upregulated genes were taken into account. Principal component analysis (PCA) of each dataset was performed on the raw data using the T-BioInfo server, and the scatterplots were prepared in Excel.

RStudio was used to match gene symbols to their corresponding probe IDs with a left join, and an inner join in R was used to find the genes shared between the four datasets. Heatmaps of all four datasets were generated in RStudio. To show the number of DEGs in each intersection of datasets, an UpSet plot was prepared in RStudio.
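The per-dataset step described above (log transformation, a tumor-versus-normal t-test, and keeping only upregulated genes) could be sketched roughly as follows. This is only an illustration in Python, not the authors' Excel workflow: the gene names and intensities are made up, the cutoffs `min_log2fc` and `min_t` are assumptions (the description does not state any thresholds), and the sketch filters on Welch's t statistic rather than on the p-value that Excel's T.TEST returns.

```python
import math
from statistics import mean, variance


def log2_transform(values):
    """Convert positive raw intensities to a log2 scale."""
    return [math.log2(v) for v in values]


def welch_t(a, b):
    """Welch's t statistic for two independent samples (each needs >= 2 values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        # Degenerate case: no spread in either group.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def upregulated(expr, min_log2fc=1.0, min_t=2.0):
    """Return genes whose log2 expression is higher in tumor than in normal.

    Both thresholds are illustrative assumptions, not values from the study.
    """
    hits = []
    for gene, groups in expr.items():
        tumor = log2_transform(groups["tumor"])
        normal = log2_transform(groups["normal"])
        log2fc = mean(tumor) - mean(normal)
        if log2fc >= min_log2fc and welch_t(tumor, normal) >= min_t:
            hits.append(gene)
    return sorted(hits)


# Hypothetical raw intensities for three placeholder genes.
expr = {
    "GENE_A": {"tumor": [400, 520, 480], "normal": [100, 120, 110]},
    "GENE_B": {"tumor": [200, 210, 190], "normal": [205, 195, 200]},
    "GENE_C": {"tumor": [50, 60, 55], "normal": [220, 240, 200]},
}

hits = upregulated(expr)
print(hits)
```

In this toy input only `GENE_A` is clearly higher in tumor samples; `GENE_B` is unchanged and `GENE_C` is downregulated, so both are dropped by the upregulated-only filter.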
In total, 74 genes, with their corresponding probe IDs, were found to be shared among the four datasets, each being common to at least two of them. These 74 genes were analyzed with the Database for Annotation, Visualization, and Integrated Discovery (DAVID) to study their Gene Ontology (GO) functional annotations and pathways. According to the GO annotation results, most of the integrated upregulated genes were associated with protein binding, the plasma membrane, and integral components of the membrane. The most common pathways included extracellular matrix organization, neutrophil degranulation, the TGF-beta signaling pathway, and epithelial-to-mesenchymal transition in colorectal cancer. The 74 genes were submitted to the STRING database to find protein-protein interactions among them. The interactions between nodes were downloaded from STRING and imported into Cytoscape, which showed that only 19 of the genes had protein-protein interactions with one another.

Disease-free survival analysis of the 13 genes common to three datasets was performed using GEPIA, and boxplots of these 13 genes were also prepared with GEPIA. These analyses showed that the differentially expressed genes are expressed differently in normal thyroid tissue and thyroid tumor samples. Hence, these 13 genes common to three datasets can be used as potential biomarkers for thyroid cancer. Of these 13 genes, four are implicated in cancer or cell proliferation and may be probable targets for treatment.
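The distinction above between genes shared by at least two datasets and genes shared by three comes down to counting, for each gene, how many per-dataset DEG lists contain it, which is also what an UpSet plot summarizes. A minimal sketch in Python, assuming each dataset's upregulated DEGs are already available as a set; the gene labels `G1` to `G6` are placeholders, not genes from the study, and the original analysis used joins in R rather than this code.

```python
from collections import Counter


def membership_counts(deg_sets):
    """Count in how many datasets each gene appears as a DEG."""
    counts = Counter()
    for genes in deg_sets.values():
        counts.update(set(genes))  # set() guards against duplicate probes
    return counts


def shared_genes(deg_sets, min_datasets):
    """Genes listed as DEGs in at least `min_datasets` datasets."""
    counts = membership_counts(deg_sets)
    return sorted(g for g, n in counts.items() if n >= min_datasets)


# Placeholder DEG sets keyed by the GEO accessions named in the description.
toy_degs = {
    "GSE3467": {"G1", "G2", "G3"},
    "GSE3678": {"G1", "G2", "G4"},
    "GSE33630": {"G1", "G3", "G5"},
    "GSE53157": {"G1", "G2", "G6"},
}

in_two_or_more = shared_genes(toy_degs, 2)
in_three_or_more = shared_genes(toy_degs, 3)
in_all_four = shared_genes(toy_degs, 4)
print(in_two_or_more, in_three_or_more, in_all_four)
```

Raising `min_datasets` gives a stricter, smaller candidate list, which mirrors how the description narrows 74 genes down to 13.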

Created:

2023-11-11
