RNA-seq Titration Results used in plotting for "Cross-platform normalization enables machine learning model training on microarray and RNA-seq data simultaneously"

Source: Mendeley Data. Updated: 2024-01-31. Indexed: 2024-06-28.
Download link:
https://figshare.com/articles/dataset/RNA-seq_Titration_Results_used_in_plotting_for_Cross-platform_normalization_enables_machine_learning_model_training_on_microarray_and_RNA-seq_data_simultaneously_/19686453/4
Description:
This data accompanies the manuscript "Cross-platform normalization enables machine learning model training on microarray and RNA-seq data simultaneously" by Foltz, Taroni, and Greene. Please refer to our GitHub page. The file contains all data necessary to recreate the main and supplementary figures in our manuscript.

Abstract: Large compendia of gene expression data have proven valuable for the discovery of novel biological relationships. Historically, the majority of available RNA assays were run on microarray, while RNA-seq is now the platform of choice for many new experiments. The data structure and distributions between the platforms differ, making it challenging to combine them directly. Here we perform supervised and unsupervised machine learning evaluations to assess which existing normalization methods are best suited for combining microarray and RNA-seq data. We find that quantile and Training Distribution Matching normalization allow for supervised and unsupervised model training on microarray and RNA-seq data simultaneously. Nonparanormal normalization and z-scores are also appropriate for some applications, including pathway analysis with Pathway-Level Information Extractor (PLIER). We demonstrate that it is possible to perform effective cross-platform normalization using existing methods to combine microarray and RNA-seq data for machine learning applications.
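For readers unfamiliar with quantile normalization, one of the methods named in the abstract, the following is a minimal sketch of the general idea: an RNA-seq sample is mapped onto a target distribution built from microarray samples by matching ranks. It is not the authors' implementation. The function names (`reference_quantiles`, `quantile_normalize`) and the toy values are hypothetical, and the sketch assumes every sample measures the same genes in the same order.

```python
def reference_quantiles(reference_samples):
    """Build a target distribution from reference samples (assumed: microarray).

    Each sample is a list of expression values over the same genes.
    The target is the mean of the sorted values at each rank.
    """
    sorted_samples = [sorted(sample) for sample in reference_samples]
    n_genes = len(sorted_samples[0])
    n_samples = len(sorted_samples)
    return [
        sum(sample[rank] for sample in sorted_samples) / n_samples
        for rank in range(n_genes)
    ]


def quantile_normalize(sample, target):
    """Replace each value in `sample` with the target value of the same rank.

    Ties are broken by original position because Python's sort is stable;
    this is a simplification chosen for the sketch.
    """
    if len(sample) != len(target):
        raise ValueError("sample and target must cover the same number of genes")
    # Gene indices ordered from lowest to highest expression.
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    normalized = [0.0] * len(sample)
    for rank, gene_index in enumerate(order):
        normalized[gene_index] = target[rank]
    return normalized


# Toy data (hypothetical): two microarray samples and one RNA-seq sample.
microarray = [[2.0, 4.0, 6.0], [4.0, 6.0, 8.0]]
rnaseq = [100.0, 5.0, 30.0]

target = reference_quantiles(microarray)
print(quantile_normalize(rnaseq, target))  # [7.0, 3.0, 5.0]
```

After normalization the RNA-seq sample keeps its gene ordering but takes on the value distribution of the microarray reference, which is the property that makes the two platforms comparable as inputs to a single model.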

Created: 2024-01-31