
Biomarker Benchmark - GSE37745

Figshare | Updated 2016-03-17 | Indexed 2026-04-29
Download link:
https://figshare.com/articles/dataset/GSE37745/2069707
Description:
[NOTICE: This data set has been deprecated. Please see our new version of the data (and additional data sets) here: https://osf.io/mhk93 ]

"Background: Global gene expression profiling has been widely used in lung cancer research to identify clinically relevant molecular subtypes as well as to predict prognosis and therapy response. So far, the value of these multi-gene signatures in clinical practice is unclear and the biological importance of individual genes is difficult to assess as the published signatures virtually do not overlap.

Methods: Here we describe a novel single institute cohort, including 196 non-small lung cancer (NSCLC) cases with clinical information and long-term follow-up, which was used as a training set to screen for single genes with prognostic impact. The top 450 gene probe sets identified using a univariate Cox regression model (significance level p […]

Results: The meta-analysis revealed that 17 probe sets were significantly associated with survival (p […]

Conclusions: We were able to validate single genes with independent prognostic impact using a novel NSCLC cohort together with a meta-analysis approach. CADM1 was identified as an immunohistochemical marker with a potential application in clinical diagnostics."

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37745

We have included gene-expression data, the outcome (class) being predicted, and any clinical covariates. When gene-expression data were processed in multiple batches, we have provided batch information. Each data set is organized into a file set that contains all pertinent files for that data set. The gene expression files have been normalized with both the SCAN and UPC methods from the SCAN.UPC package in Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/SCAN.UPC.html). We summarized the data at the gene level using the BrainArray resource (http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/20.0.0/ensg.asp). We used Ensembl identifiers. The class, clinical, and batch data were hand curated to ensure consistency ("tidy data" formatting). In addition, the data files have been formatted to be imported easily into the ML-Flex machine learning package (http://mlflex.sourceforge.net/).
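As a rough illustration of how a file set like this might be consumed, the sketch below joins a gene-expression table with a class table on a shared sample identifier. The description does not state the file names, column headers, or delimiter, so the `SampleID` and `Class` columns, the tab delimiter, the sample names, and the expression values are all assumptions made for the example, and the in-memory strings stand in for real files.

```python
import csv
import io

# Toy stand-ins for two files in a file set. File layout, column names, and
# values are ASSUMED for illustration; they are not taken from the data set.
EXPRESSION_TSV = (
    "SampleID\tENSG00000182985\tENSG00000141510\n"
    "sample_A\t0.42\t0.91\n"
    "sample_B\t0.10\t0.77\n"
)
CLASS_TSV = (
    "SampleID\tClass\n"
    "sample_A\t1\n"
    "sample_B\t0\n"
)


def read_tsv(text):
    """Parse tab-delimited text into a dict keyed by sample ID."""
    table = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        sample_id = row.pop("SampleID")
        table[sample_id] = row
    return table


def join_expression_and_class(expression_text, class_text):
    """Align expression values and class labels for samples found in both tables."""
    expression = read_tsv(expression_text)
    classes = read_tsv(class_text)
    sample_ids = sorted(set(expression) & set(classes))
    # Columns are Ensembl gene identifiers; sort them for a stable order.
    feature_names = sorted({gene for s in sample_ids for gene in expression[s]})
    X = [[float(expression[s][gene]) for gene in feature_names] for s in sample_ids]
    y = [classes[s]["Class"] for s in sample_ids]
    return sample_ids, feature_names, X, y


sample_ids, feature_names, X, y = join_expression_and_class(EXPRESSION_TSV, CLASS_TSV)
```

Keeping only samples present in both tables avoids silently pairing an expression profile with a missing label. Clinical covariates and batch information could be joined on the same key in the same way.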
Created:
2016-03-17