Data from: Correctness is its own reward: bootstrapping error signals in self-guided reinforcement learning

Name: Data from: Correctness is its own reward: bootstrapping error signals in self-guided reinforcement learning
Creator: Duke Research Data Repository
Published: 2025-12-16 21:51:14
License: No description available

Source: DataCite Commons (updated 2025-12-16, indexed 2026-04-25)

Download link:

https://idn.duke.edu/ark:/87924/r4gf12r3d

Description:

This dataset contains processed calcium imaging recordings from the caudal mesopallium of adult male zebra finches under three conditions: normal singing, singing with distorted auditory feedback, and singing after bilateral cochlea removal (post-deafening). Basic processing consists of z-scoring the raw recordings, aligning them to song onset, and subtracting the baseline (the 0.5 seconds before song onset). The final outputs are trial-averaged activity and standard errors computed over trials. More details about the experimental procedures and data processing are included in the associated manuscript. This dataset is for reproducing the related figures in the manuscript "Correctness is its own reward: bootstrapping error signals in self-guided reinforcement learning" (preprint: https://doi.org/10.1101/2025.07.18.665446).

Abstract: Reinforcement learning (RL) offers a compelling account of how agents learn complex behaviors by trial and error, yet RL is predicated on the existence of a reward function provided by the agent's environment. By contrast, many skills are learned without external guidance, posing a challenge to RL's ability to account for self-directed learning. For instance, juvenile male zebra finches first memorize and then train themselves to reproduce the song of an adult male tutor through extensive practice. This process is believed to be guided by an internally computed assessment of performance quality, though the mechanism and development of this signal remain unknown. Here, we propose that, contrary to prevailing assumptions, tutor song memorization and performance assessment are subserved by the same neural circuit, one trained to predictively cancel tutor song. To test this hypothesis, we built models of a local forebrain circuit that uses contextual premotor signals to cancel tutor song auditory input via synaptic plasticity. After learning, excitatory projection neurons signaled mismatches between the tutor song and birds' own performance, best matching experimental data when learning involved anti-Hebbian plasticity in recurrent interneurons. We also found that learning proceeds in two stages, an initial phase of sharpening error sensitivity followed by a fine-tuning period minimizing error responses to the tutor song. Finally, the error signals produced by this model can train a simple RL agent to replicate the spectrograms of adult bird songs. These results suggest that local learning via predictive cancellation suffices for bootstrapping error signals capable of guiding self-directed learning of natural behaviors.
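The processing steps named in the description (z-scoring, song-onset alignment, baseline subtraction, trial averaging with standard errors) could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function name `trial_average`, its arguments, the sampling rate, the 1-second post-onset window, and the choice to z-score over the whole recording are all assumptions; only the 0.5-second baseline comes from the description. The associated manuscript remains the reference for the actual procedure.

```python
import numpy as np


def trial_average(raw, onset_times, fs, pre=0.5, post=1.0):
    """Hypothetical sketch: z-score, align to song onset, subtract baseline.

    raw         : 1-D fluorescence trace (assumed layout)
    onset_times : song onset times in seconds (assumed format)
    fs          : sampling rate in Hz (assumed)
    pre         : baseline window before onset, in seconds (0.5 per the description)
    post        : window after onset, in seconds (illustrative value)
    """
    raw = np.asarray(raw, dtype=float)
    # Z-score over the whole recording (assumed scope; the source does not say).
    z = (raw - raw.mean()) / raw.std()

    n_pre = int(round(pre * fs))
    n_post = int(round(post * fs))

    trials = []
    for t in onset_times:
        i = int(round(t * fs))
        # Skip onsets whose window would run off either end of the trace.
        if i - n_pre < 0 or i + n_post > z.size:
            continue
        segment = z[i - n_pre:i + n_post]
        # Subtract the mean of the pre-onset baseline window.
        trials.append(segment - segment[:n_pre].mean())

    if len(trials) < 2:
        raise ValueError("need at least two complete trials for a standard error")

    trials = np.stack(trials)
    mean = trials.mean(axis=0)
    # Standard error of the mean across trials (sample standard deviation).
    sem = trials.std(axis=0, ddof=1) / np.sqrt(trials.shape[0])
    return mean, sem
```

Calling `trial_average(trace, onsets, fs=30.0)` on a hypothetical trace would return two arrays of equal length, one sample per time point in the window around song onset. Because each trial has its own baseline removed before averaging, the mean over the pre-onset window is zero by construction.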

Provider:

Duke Research Data Repository

Created:

2025-12-16
