Biomarker Benchmark - GSE19804
Source: DataCite Commons; updated 2020-09-04; indexed 2024-07-25
Download link:
https://figshare.com/articles/dataset/GSE19804/2069698
Description:
[NOTICE: This data set has been deprecated. Please see our new version of the data (and additional data sets) here: https://osf.io/mhk93 ]

"Although smoking is the major risk factor for lung cancer, only 7% of female lung cancer patients in Taiwan have a history of cigarette smoking, extremely lower than those in Caucasian females. This report is a comprehensive analysis of the molecular signature of non-smoking female lung cancer in Taiwan."
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE19804

We have included gene-expression data, the outcome (class) being predicted, and any clinical covariates. When gene-expression data were processed in multiple batches, we have provided batch information. Each data set is organized into a file set that contains all pertinent files for that data set. The gene-expression files have been normalized with both the SCAN and UPC methods from the SCAN.UPC package in Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/SCAN.UPC.html). We summarized the data at the gene level using the BrainArray resource (http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/20.0.0/ensg.asp) and used Ensembl identifiers. The class, clinical, and batch data were hand curated to ensure consistency ("tidy data" formatting). In addition, the data files have been formatted so that they can be imported easily into the ML-Flex machine learning package (http://mlflex.sourceforge.net/).
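The description above says the expression, class, and batch data are kept as separate tidy tables for the same samples. A minimal sketch of how such tables might be joined on a sample identifier follows. The column names (`SampleID`, `Class`, `Batch`), the tab-separated layout, and all values are hypothetical placeholders chosen for illustration; the actual files in the data set may be named and structured differently.

```python
import csv
import io

# Hypothetical miniature tables; real file names, columns, and values will differ.
EXPRESSION_TSV = (
    "SampleID\tENSG00000000003\tENSG00000000005\n"
    "S1\t0.12\t-0.40\n"
    "S2\t0.33\t0.05\n"
)
CLASS_TSV = "SampleID\tClass\nS1\tclass_a\nS2\tclass_b\n"
BATCH_TSV = "SampleID\tBatch\nS1\t1\nS2\t2\n"


def read_tsv(text, key="SampleID"):
    """Parse tab-separated text into {sample_id: {column: value}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    table = {}
    for row in reader:
        sample_id = row.pop(key)
        table[sample_id] = row
    return table


def join_tables(*tables):
    """Inner-join tables on sample ID, keeping samples present in every table."""
    shared = set(tables[0])
    for table in tables[1:]:
        shared &= set(table)
    joined = {}
    for sample_id in sorted(shared):
        merged = {}
        for table in tables:
            merged.update(table[sample_id])
        joined[sample_id] = merged
    return joined


def split_features_and_labels(joined, label_column="Class"):
    """Separate Ensembl gene columns (prefix 'ENSG') from the class label."""
    features, labels = {}, {}
    for sample_id, row in joined.items():
        features[sample_id] = {
            column: float(value)
            for column, value in row.items()
            if column.startswith("ENSG")
        }
        labels[sample_id] = row[label_column]
    return features, labels


samples = join_tables(
    read_tsv(EXPRESSION_TSV), read_tsv(CLASS_TSV), read_tsv(BATCH_TSV)
)
features, labels = split_features_and_labels(samples)
```

Keeping the batch column in the joined table, rather than dropping it, leaves room to check for batch effects before any model is trained.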
Provider:
figshare
Created:
2016-02-02
Collected summary
Dataset introduction

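Background and challenges
Background overview
This data set comes from a gene-expression study of lung cancer in non-smoking women in Taiwan. It contains normalized gene-expression data and clinical information, but it has been deprecated, and the newer version of the data is recommended instead. The data files are formatted for machine-learning analysis.
The summary above was collected and generated by 遇见数据集.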



