
Biomarker Benchmark - Gene expression data from Gene Expression Omnibus - GSE56396

Source: Mendeley Data. Updated 2024-06-27; indexed 2024-06-27.
Download link:
https://figshare.com/articles/dataset/GSE56369/2069714/4
Description:
"Evaluation of the airway transcriptome may reveal patterns of gene expression that are associated with clinical phenotypes of asthma. To define transcriptomic endotypes of asthma (TEA) we analyzed gene expression in induced sputum that correlate with phenotypes of disease. Gene expression was measured in sputum of subjects with asthma using Affymetrix HuGene ST 1.0 microarrays. Unsupervised clustering analysis of genes identified TEA clusters. Clinical characteristics were compared."
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE56396

We have included the gene-expression data, the outcome (class) being predicted, and any clinical covariates. Where gene-expression data were processed in multiple batches, we have provided batch information. Each data set is organized into a file set that contains all pertinent files for that individual data set.

The gene-expression files have been normalized with both the SCAN and UPC methods, using the SCAN.UPC package in Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/SCAN.UPC.html). We summarized the data at the gene level using the BrainArray resource (http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/20.0.0/ensg.asp), with Ensembl gene identifiers.

The class, clinical, and batch data were hand curated to ensure consistency ("tidy data" formatting). In addition, the data files have been formatted so that they can be imported easily into the ML-Flex machine learning package (http://mlflex.sourceforge.net/).
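As an illustration of how the expression data and the class labels in a file set might be combined before analysis, here is a minimal Python sketch. It assumes tab-delimited files with one row per sample and a sample identifier in the first column. That layout, the `Class` column name, the sample names, and the helper functions `read_tsv` and `join_expression_and_class` are all hypothetical and not taken from the dataset documentation, so check them against the actual files.

```python
import csv
import io


def read_tsv(handle):
    """Read a tab-delimited table whose first column is a sample ID.

    Returns {sample_id: {column_name: value}} with values kept as strings.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        table[row[0]] = dict(zip(header[1:], row[1:]))
    return table


def join_expression_and_class(expression, classes, class_column="Class"):
    """Pair each sample's expression values with its class label.

    Samples that are missing from either table are dropped.
    """
    joined = {}
    for sample_id, genes in expression.items():
        if sample_id in classes:
            values = {gene: float(v) for gene, v in genes.items()}
            joined[sample_id] = (values, classes[sample_id][class_column])
    return joined


# Tiny made-up tables standing in for the real files (assumed layout).
expression_text = (
    "SampleID\tENSG00000000003\tENSG00000000005\n"
    "sample_1\t0.5\t-0.25\n"
    "sample_2\t1.5\t0.0\n"
)
class_text = "SampleID\tClass\nsample_1\tA\nsample_3\tB\n"

expression = read_tsv(io.StringIO(expression_text))
classes = read_tsv(io.StringIO(class_text))
dataset = join_expression_and_class(expression, classes)
```

With real files, the `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`. Joining on the sample identifier rather than on row order avoids mismatches when the two files list samples in different orders.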

Created: 2023-06-28