Pitfalls in quantifying exploration in reward-based motor learning: simulated datasets
Source: DataCite Commons · Updated 2025-07-02 · Indexed 2025-04-09
Download link:
https://dataverse.nl/citation?persistentId=doi:10.34894/ANJOPR
Resource description:
When learning a movement based on binary success information, one is more variable
following failure than following success. Theoretically, the additional variability post
failure might reflect exploration of possibilities to obtain success. When average
behavior is changing (as in learning), variability can be estimated from differences
between subsequent movements. Can one estimate exploration reliably from such
trial-to-trial changes when studying reward-based motor learning? To answer this
question, we tried to reconstruct the exploration underlying learning as described by
four existing reward-based motor learning models. We simulated learning for various
learner and task characteristics. If we simply determined the additional change post
failure, estimates of exploration were sensitive to learner and task characteristics. We
identified two pitfalls in quantifying exploration based on trial-to-trial changes. Firstly,
performance-dependent feedback can cause correlated samples of motor noise and
exploration on successful trials, which biases exploration estimates. Secondly, the trial
relative to which trial-to-trial change is calculated may also contain exploration, which
causes underestimation. As a solution, we developed the additional trial-to-trial change
(ATTC) method. By moving the reference trial one trial back and subtracting trial-to-trial
changes following specific sequences of trial outcomes, exploration can be estimated
reliably for the three models that explore based on the outcome of only the previous
trial. Since ATTC estimates are based on a selection of trial sequences, this method
requires many trials. In conclusion, if exploration is a binary function of previous trial
outcome, the ATTC method allows for a model-free quantification of exploration.
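The following minimal Python sketch (standard library only) shows how the two pitfalls can arise when exploration is estimated naively. It rests on several assumptions. The learner is a hypothetical toy, not any of the four published models and not the simulation code in this dataset. Each movement is the aim plus Gaussian motor noise, Gaussian exploration is added only on trials that follow a failure, and after a success the aim absorbs the exploration just used. "Additional change post failure" is read here as the variance of the trial-to-trial change after failures minus the variance after successes. The names `simulate_learner` and `naive_additional_change` and all parameter values are illustrative. This is not the ATTC method.

```python
import random
import statistics


def simulate_learner(n_trials, target, zone_halfwidth, sd_motor, sd_explore, seed=0):
    """Hypothetical toy learner (an assumption, not one of the four published models).

    movement_t = aim_t + motor noise_t + exploration_t, where exploration is
    drawn only on trials that follow a failure. After a success, the aim is
    shifted by the exploration just used; motor noise is not learned.
    Returns the list of movements and the list of binary outcomes.
    """
    rng = random.Random(seed)
    aim = 0.0
    prev_success = False  # treat the first trial as if it followed a failure
    movements, successes = [], []
    for _ in range(n_trials):
        explore = 0.0 if prev_success else rng.gauss(0.0, sd_explore)
        movement = aim + rng.gauss(0.0, sd_motor) + explore
        success = abs(movement - target) <= zone_halfwidth  # binary reward
        if success:
            aim += explore  # keep the explored deviation that was rewarded
        movements.append(movement)
        successes.append(success)
        prev_success = success
    return movements, successes


def naive_additional_change(movements, successes):
    """Variance of a[t+1] - a[t] after failed trials minus after successful ones.

    One illustrative reading of "additional change post failure"; it is not
    the ATTC method. Needs at least two changes in each outcome group.
    """
    after_fail, after_success = [], []
    for t in range(len(movements) - 1):
        change = movements[t + 1] - movements[t]
        (after_success if successes[t] else after_fail).append(change)
    return statistics.variance(after_fail) - statistics.variance(after_success)


if __name__ == "__main__":
    sd_explore = 1.0
    moves, wins = simulate_learner(
        n_trials=20000, target=1.0, zone_halfwidth=0.5,
        sd_motor=0.5, sd_explore=sd_explore, seed=1,
    )
    estimate = naive_additional_change(moves, wins)
    print(f"true exploration variance:   {sd_explore ** 2:.3f}")
    print(f"naive post-failure estimate: {estimate:.3f}")
```

In this toy model, the change after a success is the difference of two motor-noise samples. The earlier sample, however, is conditioned on that trial having been rewarded, which is the first pitfall. The change after a failure can also include exploration on the reference trial, which is the second pitfall. The printed estimate therefore need not match the true exploration variance, and how far it deviates depends on settings such as `zone_halfwidth` and `sd_motor`.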
Provided by:
DataverseNL
Created:
2021-06-24



