
Adaptive Design and Analysis Via Partitioning Trees for Emulation of a Complex Computer Code

Source: Mendeley Data. Updated 2024-06-29; indexed 2024-06-29.
Download link:
https://tandf.figshare.com/articles/dataset/Adaptive_Design_and_Analysis_Via_Partitioning_Trees_for_Emulation_of_a_Complex_Computer_Code/19172445
Resource description:
Computer models are used as replacements for physical experiments in a large variety of applications. Nevertheless, direct use of the computer model for the ultimate scientific objective is often limited by the complexity and cost of the model. Gaussian process regression has been the almost ubiquitous choice for a fast statistical emulator for such a computer model, due to its flexible form and analytical expressions for measures of predictive uncertainty. However, even this statistical emulator can be computationally intractable for large designs, due to computing time increasing with the cube of the design size. Multiple methods have been proposed for addressing this problem. We discuss several of them, and compare their predictive and computational performance in several scenarios. We propose solving this problem using a new method, adaptive design and analysis via partitioning trees (ADAPT). The new approach is motivated by the idea that most computer models are only complex in particular regions of the input space. By taking a data-adaptive approach to the development of a design, and choosing to partition the space in the regions of highest variability, we obtain a higher density of points in these regions and hence accurate prediction. Supplemental files for this article are available online.
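The partitioning idea in the description can be illustrated with a small sketch. This is not the authors' ADAPT algorithm, which is not specified here; it is a one-dimensional toy under stated assumptions: `toy_model` is a hypothetical stand-in for an expensive simulator, the sample variance of the outputs in a region serves as a crude proxy for "variability", regions are always split at their midpoint, and no Gaussian process is fitted. All function and parameter names are illustrative.

```python
import numpy as np

def toy_model(x):
    # Hypothetical simulator: flat for x < 0.7, rapidly oscillating above.
    return np.where(x < 0.7, 0.0, np.sin(40.0 * x))

def adaptive_partition(f, n_splits=6, pts_per_leaf=8):
    """Grow a 1-D partition of [0, 1], always splitting the leaf whose
    observed outputs have the largest sample variance (an assumed proxy)."""

    def make_leaf(lo, hi, x_old, y_old):
        # Add evenly spaced new design points inside [lo, hi] and keep
        # any points inherited from the parent region.
        offsets = (np.arange(pts_per_leaf) + 0.5) / pts_per_leaf
        x_new = lo + (hi - lo) * offsets
        x = np.concatenate([x_old, x_new])
        y = np.concatenate([y_old, f(x_new)])
        return {"lo": lo, "hi": hi, "x": x, "y": y}

    empty = np.array([])
    leaves = [make_leaf(0.0, 1.0, empty, empty)]
    for _ in range(n_splits):
        # Pick the region with the highest output variance.
        i = max(range(len(leaves)), key=lambda k: np.var(leaves[k]["y"]))
        leaf = leaves.pop(i)
        mid = 0.5 * (leaf["lo"] + leaf["hi"])
        left = leaf["x"] < mid
        leaves.append(make_leaf(leaf["lo"], mid, leaf["x"][left], leaf["y"][left]))
        leaves.append(make_leaf(mid, leaf["hi"], leaf["x"][~left], leaf["y"][~left]))
    return leaves

leaves = adaptive_partition(toy_model)
design = np.concatenate([leaf["x"] for leaf in leaves])
for leaf in sorted(leaves, key=lambda l: l["lo"]):
    print(f"[{leaf['lo']:.4f}, {leaf['hi']:.4f}]  points: {leaf['x'].size}")
```

In this toy setting the flat part of the input space is left as one coarse region while the oscillating part is subdivided repeatedly, so most design points end up where the response is hardest to predict. A local emulator (for example, one Gaussian process per region) could then be fitted to each region's points, which keeps each fit small relative to a single fit on the full design.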

Created: 2023-06-28