Biomarker Benchmark - Gene expression data from Gene Expression Omnibus - GSE40292
Source: DataCite Commons. Updated: 2020-09-04. Indexed: 2024-07-25.
Download link:
https://figshare.com/articles/dataset/GSE40292/2069710/5
Description:
"Genome-wide association studies (GWAS) have been pivotal to increasing our understanding of intestinal disease. However, the mode by which genetic variation results in phenotypic change remains largely unknown, with many associated polymorphisms likely to modulate gene expression. Analyses of expression quantitative trait loci (eQTL) to date indicate that as many as 50% of these are tissue specific. Here we report a comprehensive eQTL scan of intestinal tissue."

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE40292

We have included gene-expression data, the outcome (class) being predicted, and any clinical covariates. When gene-expression data were processed in multiple batches, we have provided batch information. Each data set is organized into a file set that contains all pertinent files for that individual data set. The gene-expression files have been normalized with both the SCAN and UPC methods, using the SCAN.UPC package in Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/SCAN.UPC.html). We summarized the data at the gene level using the BrainArray resource (http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/20.0.0/ensg.asp) and used Ensembl identifiers. The class, clinical, and batch data were hand curated to ensure consistency ("tidy data" formatting). In addition, the data files have been formatted so that they can be imported easily into the ML-Flex machine learning package (http://mlflex.sourceforge.net/).
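The description says that expression values (keyed by Ensembl gene identifiers) and the class being predicted are provided as separate tidy files. A minimal sketch of joining the two by sample follows. It assumes tab-delimited files with one row per sample, and the column names `SampleID` and `Class`, the sample names, and the numeric values are all hypothetical placeholders, not taken from the actual file set.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-delimited text into a list of dict rows keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def join_by_sample(expression_rows, class_rows, id_column="SampleID"):
    """Attach each sample's class label to its expression row (inner join)."""
    # Assumption: the class file has one row per sample with a "Class" column.
    labels = {row[id_column]: row["Class"] for row in class_rows}
    joined = []
    for row in expression_rows:
        sample = row[id_column]
        if sample in labels:  # samples without a label are skipped
            merged = dict(row)
            merged["Class"] = labels[sample]
            joined.append(merged)
    return joined


# Hypothetical miniature inputs; the layout and values are illustrative only.
EXPRESSION_TSV = (
    "SampleID\tENSG00000000003\tENSG00000000005\n"
    "S1\t0.12\t-0.40\n"
    "S2\t0.31\t0.05\n"
    "S3\t0.27\t0.18\n"
)
CLASS_TSV = "SampleID\tClass\nS1\tA\nS2\tB\n"

rows = join_by_sample(read_tsv(EXPRESSION_TSV), read_tsv(CLASS_TSV))
for row in rows:
    print(row["SampleID"], row["Class"], row["ENSG00000000003"])
```

The same join pattern would extend to the clinical and batch files if they share a sample identifier column, which should be checked against the downloaded files before relying on it.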
Provider:
figshare
Created:
2016-03-16



