
Controlled Discovery and Localization of Signals via Bayesian Linear Programming

Source: DataCite Commons. Updated: 2024-06-11. Indexed: 2024-08-19.
Download link:
https://tandf.figshare.com/articles/dataset/Controlled_Discovery_and_Localization_of_Signals_via_Bayesian_Linear_Programming/25712750
Description:
Scientists often must simultaneously localize and discover signals. For instance, in genetic fine-mapping, high correlations between nearby genetic variants make it hard to identify the exact locations of causal variants. So the statistical task is to output as many disjoint regions containing a signal as possible, each as small as possible, while controlling false positives. Similar problems arise, for example, when locating stars in astronomical surveys and in changepoint detection. Common Bayesian approaches to these problems involve computing a posterior distribution over signal locations. However, existing procedures to translate these posteriors into credible regions for the signals fail to capture all the information in the posterior, leading to lower power and (sometimes) inflated false discoveries. We introduce Bayesian Linear Programming (BLiP), which can efficiently convert any posterior distribution over signals into credible regions for signals. BLiP overcomes an extremely high-dimensional and nonconvex problem to verifiably nearly maximize expected power while controlling false positives. Applying BLiP to existing state-of-the-art analyses of UK Biobank data (for genetic fine-mapping) and the Sloan Digital Sky Survey (for astronomical point source detection) increased power by 30%–120% in just a few minutes of additional computation. BLiP is implemented in pyblip (Python) and blipr (R). Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
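The selection problem described above (choose disjoint regions, prefer small ones, control false positives) can be illustrated with a toy example. The sketch below is not the pyblip or blipr API, and the abstract does not spell out the exact formulation, so everything here is an assumption made for illustration: the function name `select_regions`, the candidate regions, the posterior probabilities, the inverse-size weighting of power, and the linearized Bayesian false discovery rate constraint. It solves the toy problem as a small integer linear program with `scipy.optimize.milp`.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def select_regions(groups, probs, q, n_locations):
    """Pick disjoint candidate regions under an assumed Bayesian FDR constraint.

    groups: list of tuples of location indices (hypothetical candidate regions)
    probs:  posterior probability that each region contains a signal
    q:      target false discovery rate
    """
    probs = np.asarray(probs, dtype=float)
    sizes = np.array([len(g) for g in groups], dtype=float)

    # Assumed objective: expected number of true discoveries, with each
    # region down-weighted by its size so that smaller regions are preferred.
    weights = probs / sizes

    # Disjointness: each location may be covered by at most one selected region.
    cover = np.zeros((n_locations, len(groups)))
    for k, g in enumerate(groups):
        cover[list(g), k] = 1.0

    # Assumed FDR constraint, linearized:
    #   sum_k x_k * (1 - p_k) <= q * sum_k x_k
    #   <=> sum_k x_k * (1 - p_k - q) <= 0
    fdr_row = (1.0 - probs - q).reshape(1, -1)

    constraints = [
        LinearConstraint(cover, -np.inf, 1.0),
        LinearConstraint(fdr_row, -np.inf, 0.0),
    ]

    # milp minimizes, so negate the weights; x_k in {0, 1}.
    res = milp(
        c=-weights,
        constraints=constraints,
        integrality=np.ones(len(groups)),
        bounds=Bounds(0, 1),
    )
    if not res.success:
        raise RuntimeError(res.message)
    return [groups[k] for k in np.flatnonzero(res.x > 0.5)]


# Made-up posterior summaries over four locations (0..3).
groups = [(0,), (1,), (0, 1), (2,), (3,), (2, 3)]
probs = [0.50, 0.45, 0.97, 0.92, 0.05, 0.96]

selected = select_regions(groups, probs, q=0.1, n_locations=4)
print(selected)
```

In this toy input, locations 0 and 1 are individually ambiguous but jointly very likely to contain a signal, so the program reports the pair as one region, while location 2 is confident enough to be reported on its own. This mirrors the trade-off between discovery and localization that the abstract describes.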

Provider:
Taylor & Francis
Created:
2024-04-29