When Does Model-Based Control Pay Off?

NIAID Data Ecosystem, indexed 2026-03-09

Download link:

https://figshare.com/articles/dataset/When_Does_Model-Based_Control_Pay_Off_/3791652


Description:

Many accounts of decision making and reinforcement learning posit the existence of two distinct systems that control choice: a fast, automatic system and a slow, deliberative system. Recent research formalizes this distinction by mapping these systems to “model-free” and “model-based” strategies in reinforcement learning. Model-free strategies are computationally cheap, but sometimes inaccurate, because action values can be accessed by inspecting a look-up table constructed through trial-and-error. In contrast, model-based strategies compute action values through planning in a causal model of the environment, which is more accurate but also more cognitively demanding. It is assumed that this trade-off between accuracy and computational demand plays an important role in the arbitration between the two strategies, but we show that the hallmark task for dissociating model-free and model-based strategies, as well as several related variants, do not embody such a trade-off. We describe five factors that reduce the effectiveness of the model-based strategy on these tasks by reducing its accuracy in estimating reward outcomes and decreasing the importance of its choices. Based on these observations, we describe a version of the task that formally and empirically obtains an accuracy-demand trade-off between model-free and model-based strategies. Moreover, we show that human participants spontaneously increase their reliance on model-based control on this task, compared to the original paradigm. Our novel task and our computational analyses may prove important in subsequent empirical investigations of how humans balance accuracy and demand.
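The contrast between the two strategies can be illustrated with a minimal sketch. It is not the authors' code or task: the two-stage layout, the transition probabilities in `TRANSITIONS`, the learning rate `alpha`, and both function names are assumptions chosen only for illustration. The model-free learner adjusts a cached look-up table entry after each observed reward, while the model-based learner plans by combining a transition model with second-stage values.

```python
from typing import List

# Hypothetical transition model: TRANSITIONS[a][s] is the assumed probability
# that first-stage action a leads to second-stage state s.
TRANSITIONS: List[List[float]] = [[0.7, 0.3], [0.3, 0.7]]


def model_free_update(q: List[float], action: int, reward: float,
                      alpha: float = 0.1) -> List[float]:
    """Look-up table learning: move the cached value of the chosen action
    a fraction alpha toward the reward that was just observed."""
    updated = list(q)
    updated[action] += alpha * (reward - updated[action])
    return updated


def model_based_values(transitions: List[List[float]],
                       stage2_values: List[List[float]]) -> List[float]:
    """Planning: for each first-stage action, take the expectation over
    second-stage states of the best action value available in that state."""
    return [
        sum(p * max(stage2_values[s]) for s, p in enumerate(row))
        for row in transitions
    ]


if __name__ == "__main__":
    # One model-free update after a rewarded choice of action 0.
    print(model_free_update([0.0, 0.0], action=0, reward=1.0, alpha=0.5))
    # Model-based first-stage values from illustrative second-stage values.
    print(model_based_values(TRANSITIONS, [[0.2, 0.8], [0.1, 0.4]]))
```

In this sketch the model-free update touches a single table entry, whereas the model-based computation iterates over every action and successor state. That difference in work is the computational demand the abstract refers to; whether it buys higher accuracy depends on the task, which is the question the paper examines.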


Created:

2016-08-27
