Biomarker Benchmark - GSE37147
Source: DataCite Commons. Updated 2020-09-04; indexed 2024-07-25.
Download link:
https://figshare.com/articles/dataset/GSE37147/2069705
Description:
[NOTICE: This data set has been deprecated. Please see our new version of the data (and additional data sets) here: https://osf.io/mhk93 ]

"RNA was isolated from bronchial brushings obtained from current and former smokers with and without COPD. mRNA expression was profiled using Affymetrix Human Gene 1.0 ST Arrays."
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37147

We have included gene-expression data, the outcome (class) being predicted, and any clinical covariates. When gene-expression data were processed in multiple batches, we have provided batch information. Each data set is organized into a file set that contains all pertinent files for that data set. The gene-expression files have been normalized with both the SCAN and UPC methods, using the SCAN.UPC package in Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/SCAN.UPC.html). We summarized the data at the gene level using the BrainArray resource (http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/20.0.0/ensg.asp) and used Ensembl identifiers. The class, clinical, and batch data were hand curated to ensure consistency ("tidy data" formatting). In addition, the data files have been formatted to be imported easily into the ML-Flex machine learning package (http://mlflex.sourceforge.net/).
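Because the expression values and the class labels are described as separate, tidy files, a typical first step is to join them by sample. The following is a minimal sketch of that join, not the authors' procedure. The description does not state file names, delimiters, or column layout, so the tab-separated format, the `SampleID` and `Class` column names, the placeholder gene columns (`ENSG_A`, `ENSG_B`), and all values below are hypothetical and synthetic.

```python
import csv
import io

# Hypothetical, synthetic stand-ins for an expression file and a class file.
# The real file names, delimiters, and column layout are NOT given in the
# description; tab-separated text with a leading sample-ID column is assumed.
EXPRESSION_TSV = "SampleID\tENSG_A\tENSG_B\nS1\t0.5\t1.5\nS2\t0.25\t2.0\n"
CLASS_TSV = "SampleID\tClass\nS1\tCOPD\nS2\tControl\n"


def read_table(text, key="SampleID"):
    """Parse tab-separated text into {sample_id: {column: value}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    table = {}
    for row in reader:
        sample_id = row.pop(key)
        table[sample_id] = row
    return table


def join_by_sample(expression, labels, label_column="Class"):
    """Pair each sample's expression values (as floats) with its class label.

    Samples that have no entry in the label table are skipped.
    """
    joined = {}
    for sample_id, values in expression.items():
        if sample_id not in labels:
            continue
        features = {gene: float(v) for gene, v in values.items()}
        joined[sample_id] = (features, labels[sample_id][label_column])
    return joined


expression = read_table(EXPRESSION_TSV)
labels = read_table(CLASS_TSV)
dataset = join_by_sample(expression, labels)
for sample_id, (features, label) in dataset.items():
    print(sample_id, label, features)
```

Clinical covariates and batch information could be attached the same way, by reading each file with `read_table` and looking up the shared sample identifier, provided the actual files share such a key.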
Provider:
figshare
Created:
2016-02-02



