
RNA-seq Titration Results used in plotting for "Cross-platform normalization enables machine learning model training on microarray and RNA-seq data simultaneously"

Source: DataCite Commons; updated 2023-03-20; indexed 2024-07-29
Download link:
https://figshare.com/articles/dataset/RNA-seq_Titration_Results_used_in_plotting_for_Cross-platform_normalization_enables_machine_learning_model_training_on_microarray_and_RNA-seq_data_simultaneously_/19686453/4
Description:
This data accompanies the manuscript "Cross-platform normalization enables machine learning model training on microarray and RNA-seq data simultaneously" by Foltz, Taroni, and Greene. Please refer to our GitHub page.

The file contains all data necessary to recreate the main and supplementary figures in our manuscript.

Abstract: Large compendia of gene expression data have proven valuable for the discovery of novel biological relationships. Historically, the majority of available RNA assays were run on microarrays, while RNA-seq is now the platform of choice for many new experiments. The data structure and distributions differ between the platforms, making it challenging to combine them directly. Here we perform supervised and unsupervised machine learning evaluations to assess which existing normalization methods are best suited for combining microarray and RNA-seq data. We find that quantile and Training Distribution Matching normalization allow for supervised and unsupervised model training on microarray and RNA-seq data simultaneously. Nonparanormal normalization and z-scores are also appropriate for some applications, including pathway analysis with Pathway-Level Information Extractor (PLIER). We demonstrate that it is possible to perform effective cross-platform normalization using existing methods to combine microarray and RNA-seq data for machine learning applications.
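The abstract names the normalization methods without describing their mechanics. As a rough illustration only, the following minimal sketch (Python standard library only) shows two of them in their common textbook form: quantile normalization of one sample onto a target distribution, and z-scoring. It is not the authors' code, and it does not cover Training Distribution Matching or nonparanormal normalization. The function names and toy values are hypothetical, and treating microarray data as the quantile target is an assumption made here for illustration.

```python
from statistics import mean, pstdev


def reference_distribution(samples):
    """Hypothetical helper: average several reference samples rank by rank.

    Each sample is sorted, then the i-th smallest values are averaged, giving
    one target distribution (a common way to build a quantile target).
    """
    sorted_samples = [sorted(s) for s in samples]
    return [mean(values) for values in zip(*sorted_samples)]


def quantile_normalize_to_target(sample, target):
    """Replace each value in `sample` with the value of equal rank in `target`.

    Assumes len(sample) == len(target). Ties in `sample` are broken by
    position, a simplification; many implementations average tied ranks.
    """
    if len(sample) != len(target):
        raise ValueError("sample and target must have the same length")
    reference = sorted(target)
    # Indices of `sample` ordered from smallest to largest value.
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    normalized = [0.0] * len(sample)
    for rank, idx in enumerate(order):
        normalized[idx] = reference[rank]
    return normalized


def z_score(values):
    """Center values to mean 0 and scale to population standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant vector carries no spread; return zeros instead of dividing by 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    # Hypothetical RNA-seq counts for four genes and a hypothetical
    # microarray-derived target (log2 intensities) of the same length.
    rnaseq_sample = [0, 150, 12, 3000]
    microarray_target = [4.1, 6.2, 8.7, 12.3]
    print(quantile_normalize_to_target(rnaseq_sample, microarray_target))
    print(z_score([1.0, 2.0, 3.0]))
```

For the toy input above, the RNA-seq values keep their relative order but take on the target's values: `[4.1, 8.7, 6.2, 12.3]`. Quantile normalization discards each platform's original scale and keeps only within-sample ranks. That is one intuitive reason it can put microarray and RNA-seq samples on a shared footing, though this sketch is no substitute for the evaluations in the manuscript.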

Provided by:
figshare
Created:
2022-12-23