Modeling Persistent Trends in Distributions

Source: DataCite Commons | Updated: 2020-09-01 | Indexed: 2024-07-25
Download link:
https://tandf.figshare.com/articles/dataset/Modeling_Persistent_Trends_in_Distributions/5190550
Description:
We present a nonparametric framework to model a short sequence of probability distributions that vary both due to underlying effects of sequential progression and confounding noise. To distinguish between these two types of variation and estimate the sequential-progression effects, our approach leverages an assumption that these effects follow a persistent trend. This work is motivated by the recent rise of single-cell RNA-sequencing experiments over a brief time course, which aim to identify genes relevant to the progression of a particular biological process across diverse cell populations. While classical statistical tools focus on scalar-response regression or order-agnostic differences between distributions, it is desirable in this setting to consider both the full distributions as well as the structure imposed by their ordering. We introduce a new regression model for ordinal covariates where responses are univariate distributions and the underlying relationship reflects consistent changes in the distributions over increasing levels of the covariate. This concept is formalized as a "trend" in distributions, which we define as an evolution that is linear under the Wasserstein metric. Implemented via a fast alternating projections algorithm, our method exhibits numerous strengths in simulations and analyses of single-cell gene expression data. Supplementary materials for this article are available online.
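
To make the phrase "linear under the Wasserstein metric" concrete, here is a minimal sketch. It is not the authors' alternating projections algorithm. It relies on the standard fact that, for univariate distributions, the 2-Wasserstein distance is the L2 distance between quantile functions. The function names (`wasserstein2`, `path_length_ratio`) and the straightness ratio itself are hypothetical, chosen only for illustration. The ratio compares the end-to-end distance with the sum of consecutive distances. A value near 1 suggests the sequence moves in a consistent direction, while a much smaller value suggests back-and-forth variation.

```python
import numpy as np


def wasserstein2(sample_a, sample_b, n_levels=999):
    """Approximate 2-Wasserstein distance between two 1-D samples.

    In one dimension this equals the L2 distance between the two
    quantile functions; it is approximated here on an evenly spaced
    grid of probability levels (grid size is an illustrative choice).
    """
    levels = (np.arange(n_levels) + 0.5) / n_levels
    qa = np.quantile(np.asarray(sample_a, dtype=float), levels)
    qb = np.quantile(np.asarray(sample_b, dtype=float), levels)
    return float(np.sqrt(np.mean((qa - qb) ** 2)))


def path_length_ratio(samples):
    """End-to-end distance divided by the sum of consecutive distances.

    Hypothetical diagnostic: close to 1 when the ordered sequence of
    samples behaves like a straight path under the distance above.
    """
    steps = sum(wasserstein2(a, b) for a, b in zip(samples[:-1], samples[1:]))
    total = wasserstein2(samples[0], samples[-1])
    return total / steps if steps > 0 else 1.0


# Toy data (assumed, not from the dataset): one base sample, shifted.
rng = np.random.default_rng(0)
base = rng.normal(size=500)
steady = [base + shift for shift in (0.0, 1.0, 2.0)]   # consistent shifts
zigzag = [base + shift for shift in (0.0, 2.0, 1.0)]   # overshoots, then returns

print("steady:", path_length_ratio(steady))
print("zigzag:", path_length_ratio(zigzag))
```

In the toy data the steady sequence shifts the same sample by equal amounts, so its consecutive distances add up to the end-to-end distance. The zigzag sequence travels farther than its net displacement, which lowers the ratio.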
Provider:
Taylor & Francis
Created:
2017-07-10