
Data Fission: Splitting a Single Data Point

DataCite Commons · Updated 2023-12-14 · Indexed 2024-08-26
Download link:
https://tandf.figshare.com/articles/dataset/Data_fission_splitting_a_single_data_point/24328745
Description:
Suppose we observe a random vector X from some distribution in a known family with unknown parameters. We ask the following question: when is it possible to split X into two pieces f(X) and g(X) such that neither part is sufficient to reconstruct X by itself, but both together can recover X fully, and their joint distribution is tractable? One common solution to this problem when multiple samples of X are observed is data splitting, but Rasines and Young offer an alternative approach that uses additive Gaussian noise; this enables post-selection inference in finite samples for Gaussian-distributed data and asymptotically when errors are non-Gaussian. In this article, we offer a more general methodology for achieving such a split in finite samples by borrowing ideas from Bayesian inference to yield a (frequentist) solution that can be viewed as a continuous analog of data splitting. We call our method data fission, as an alternative to data splitting, data carving and p-value masking. We exemplify the method on several prototypical applications, such as post-selection inference for trend filtering and other regression problems, and effect size estimation after interactive multiple testing. Supplementary materials for this article are available online.
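
As an illustration of the additive-Gaussian-noise split mentioned in the description, the following is a minimal sketch for a single Gaussian observation X ~ N(mu, sigma^2) with sigma assumed known. The function names, the tuning parameter `tau`, and the simulation settings are hypothetical choices made for this example and are not taken from the dataset or its supplementary materials. Drawing independent noise Z ~ N(0, sigma^2) and setting f(X) = X + tau·Z and g(X) = X - Z/tau gives two independent pieces, and X is recovered from both as (f + tau²·g)/(1 + tau²).

```python
import random

def gaussian_fission(x, sigma, tau, rng):
    """Split x into (f, g) using fresh noise z ~ N(0, sigma^2).

    Assumes x ~ N(mu, sigma^2) with sigma known; tau > 0 controls how much
    information goes to f versus g (hypothetical parameter name).
    """
    z = rng.gauss(0.0, sigma)
    f = x + tau * z      # marginally N(mu, (1 + tau^2) * sigma^2)
    g = x - z / tau      # marginally N(mu, (1 + tau^-2) * sigma^2)
    return f, g

def reconstruct(f, g, tau):
    """Recover x exactly from both pieces."""
    return (f + tau**2 * g) / (1.0 + tau**2)

def sample_correlation(a, b):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

# Simulation check with illustrative settings.
rng = random.Random(0)
mu, sigma, tau = 2.0, 1.0, 1.0
xs = [rng.gauss(mu, sigma) for _ in range(20000)]
pairs = [gaussian_fission(x, sigma, tau, rng) for x in xs]
fs = [f for f, _ in pairs]
gs = [g for _, g in pairs]

# Both pieces together recover x up to floating-point error.
max_err = max(abs(reconstruct(f, g, tau) - x) for x, (f, g) in zip(xs, pairs))
# f and g are independent in theory, so their sample correlation should be near 0.
corr_fg = sample_correlation(fs, gs)
print(max_err, corr_fg)
```

In this sketch, a larger `tau` adds more noise to f(X) and leaves more information in g(X); `tau = 1` splits the information evenly, loosely mirroring a 50/50 data split.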

Provider:
Taylor & Francis
Created:
2023-10-17