Pitfalls in quantifying exploration in reward-based motor learning: simulated datasets
Source: Mendeley Data. Updated: 2024-03-27. Indexed: 2024-06-27.
Download link:
https://dataverse.nl/citation?persistentId=doi:10.34894/ANJOPR
Description:
When learning a movement based on binary success information, one is more variable following failure than following success. Theoretically, the additional variability post failure might reflect exploration of possibilities to obtain success. When average behavior is changing (as in learning), variability can be estimated from differences between subsequent movements. Can one estimate exploration reliably from such trial-to-trial changes when studying reward-based motor learning? To answer this question, we tried to reconstruct the exploration underlying learning as described by four existing reward-based motor learning models. We simulated learning for various learner and task characteristics. If we simply determined the additional change post failure, estimates of exploration were sensitive to learner and task characteristics. We identified two pitfalls in quantifying exploration based on trial-to-trial changes. Firstly, performance-dependent feedback can cause correlated samples of motor noise and exploration on successful trials, which biases exploration estimates. Secondly, the trial relative to which trial-to-trial change is calculated may also contain exploration, which causes underestimation. As a solution, we developed the additional trial-to-trial change (ATTC) method. By moving the reference trial one trial back and subtracting trial-to-trial changes following specific sequences of trial outcomes, exploration can be estimated reliably for the three models that explore based on the outcome of only the previous trial. Since ATTC estimates are based on a selection of trial sequences, this method requires many trials. In conclusion, if exploration is a binary function of previous trial outcome, the ATTC method allows for a model-free quantification of exploration.
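The naive estimate that the description warns against (the additional trial-to-trial change after failure) can be sketched as follows. This is a minimal illustration under stated assumptions, not one of the four models in the dataset and not the ATTC method itself: it assumes a hypothetical learner with a fixed aim point (no learning), Gaussian motor noise on every trial, and Gaussian exploration added only after a failed trial. All function names and parameter values are invented for illustration.

```python
import numpy as np


def simulate_trials(n_trials=20000, sigma_motor=1.0, sigma_explore=2.0,
                    half_width=1.5, seed=0):
    """Simulate a hypothetical learner with binary success feedback.

    Assumptions (not taken from the dataset): the aim point stays at 0,
    motor noise is drawn on every trial, and exploration is drawn only
    when the previous trial failed.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_trials)
    success = np.empty(n_trials, dtype=bool)
    prev_success = True
    for t in range(n_trials):
        explore = 0.0 if prev_success else rng.normal(0.0, sigma_explore)
        x[t] = explore + rng.normal(0.0, sigma_motor)
        success[t] = abs(x[t]) <= half_width  # target zone centred on 0
        prev_success = success[t]
    return x, success


def naive_exploration_variance(x, success):
    """Variance of the change after failure minus that after success."""
    change = np.diff(x)              # change[t] = x[t + 1] - x[t]
    after_failure = ~success[:-1]    # outcome of the reference trial t
    var_fail = np.var(change[after_failure], ddof=1)
    var_success = np.var(change[~after_failure], ddof=1)
    return var_fail - var_success


x, success = simulate_trials()
estimate = naive_exploration_variance(x, success)
print("naive estimate of exploration variance:", estimate)
print("simulated exploration variance:", 2.0 ** 2)
```

Comparing the two printed values for different `sigma_motor` and `half_width` settings gives a feel for how sensitive the naive estimate is to learner and task characteristics, which is the problem the ATTC method is meant to address.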
Created:
2023-06-28



