Systems Neurobiology Group

Based on the computational theories of reinforcement learning and Bayesian inference, we explore how such computations are realized in the brain through experiments with animal and human behaviors, pharmacological and optogenetic manipulations, and neural recordings.

Neuromodulators in reinforcement learning

Reinforcement learning theory provides methods for learning based on reward maximization. However, some parameters in the algorithms must be tuned by hand. The brain, on the other hand, appears to have a mechanism for tuning these parameters, because humans and animals can learn novel behaviors in a wide variety of environments. We hypothesized that neuromodulatory systems control these parameters. More specifically, we proposed the following set of hypotheses (Doya, 2002):

  1. Dopamine signals reward prediction error.
  2. Serotonin controls the time scale of prediction of future rewards.
  3. Noradrenaline controls the width of exploration.
  4. Acetylcholine controls the rate of memory updates.
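In standard reinforcement learning terms, these hypotheses map the neuromodulators onto the meta-parameters of the learning algorithm. A minimal Q-learning sketch makes the mapping concrete (the function and variable names are illustrative, not taken from the original papers):

```python
import math
import random

def softmax_action(q_values, beta):
    # Noradrenaline hypothesis: beta (inverse temperature) controls
    # the width of exploration; large beta -> greedy, small -> random.
    exps = [math.exp(beta * q) for q in q_values]
    r = random.random() * sum(exps)
    cum = 0.0
    for action, e in enumerate(exps):
        cum += e
        if r <= cum:
            return action
    return len(exps) - 1

def q_update(q, state, action, reward, next_state, alpha, gamma):
    # Dopamine hypothesis: delta is the reward prediction error.
    # Serotonin hypothesis: gamma sets the time scale of prediction
    # of future rewards.
    # Acetylcholine hypothesis: alpha sets the rate of memory update.
    delta = reward + gamma * max(q[next_state]) - q[state][action]
    q[state][action] += alpha * delta
    return delta
```

The point of the sketch is only that each meta-parameter (alpha, gamma, beta) must be set somehow; the hypothesis is that neuromodulatory systems do this tuning in the brain.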

Role of serotonin in actions for delayed rewards

Katsuhiko Miyazaki, Kayoko W Miyazaki, Hiroaki Hamada, Masakazu Taira, Miles Desforges, Yuma Kajihara, Jianning Chen

Collaborators: Kenji Tanaka (Keio U), Bernd Kuhn (OIST), Kazumasa Tanaka (OIST)

While serotonin is well known to be involved in a variety of psychiatric disorders, including depression, schizophrenia, autism, and impulsivity, its role in the normal brain has remained far from clear. From the viewpoint of reinforcement learning, we earlier proposed that an important role of serotonin is to regulate the temporal discounting parameter that controls the weighting of delayed rewards (Doya, 2002). We previously revealed that serotonergic neurons in the dorsal raphe nucleus (DRN) increase their activity while rats wait for delayed rewards (Miyazaki KW et al., 2011; Miyazaki et al., 2011). We also showed that pharmacological suppression of DRN serotonin neurons impairs waiting (Miyazaki KW et al., 2012), that optogenetic activation of DRN serotonergic neurons prolongs the time until animals abandon waiting for a delayed reward (Miyazaki KW et al., 2014), and that this effect depends on the certainty of reward delivery and the uncertainty of reward timing (Miyazaki et al., 2018).
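The temporal discounting hypothesis can be stated concretely: a reward r delivered after a delay of d steps is valued at gamma^d * r, so a larger discount factor gamma makes waiting for a delayed reward more worthwhile. A minimal illustration (the numbers here are arbitrary):

```python
def discounted_value(reward, delay, gamma):
    # Present value of a reward arriving after `delay` steps,
    # discounted exponentially by gamma (0 < gamma < 1).
    return (gamma ** delay) * reward

# Under the hypothesis, higher serotonergic activity corresponds to a
# larger gamma, so a delayed reward retains more of its value and
# waiting becomes the better option.
impulsive = discounted_value(10.0, 6, gamma=0.5)   # delayed reward strongly devalued
patient = discounted_value(10.0, 6, gamma=0.95)    # delayed reward largely retained
```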

We further performed optogenetic stimulation of serotonergic axon terminals in the orbitofrontal cortex (OFC), medial prefrontal cortex (mPFC), and nucleus accumbens (NAc). While OFC terminal stimulation was nearly as effective as DRN stimulation, mPFC terminal stimulation was effective only when the reward timing was uncertain, revealing their differential contributions to enhanced waiting (Miyazaki et al., 2020). Based on these findings, we proposed a new theoretical framework in which serotonin may signal the availability of time and resources, which can affect many aspects of action and learning (Doya et al., 2021).

[Figure: serotonin and waiting for delayed rewards]

Reinforcement learning in the basal ganglia circuit

Dopamine neurons in the midbrain fire when the actual reward is larger than the animal's expectation of reward; that is, dopamine neurons can be considered to code reward prediction error. The striatum, the input site of the basal ganglia, receives this dopaminergic input as well as input from the cortex. In reinforcement learning algorithms, the reward prediction error plays an essential role in learning optimal behavior. Therefore, we hypothesized that a reinforcement learning algorithm is implemented in the basal ganglia circuit (Doya, 2000, 2002).
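The reward prediction error described above is the temporal-difference (TD) error, delta = r + gamma * V(s') - V(s), which is positive exactly when the outcome exceeds the prediction. A minimal sketch of how a striatal value estimate built from cortical inputs might be updated by a dopamine-like error signal (the linear value model and all names are our illustrative assumptions, not the circuit's actual implementation):

```python
def td_error(reward, v_next, v_current, gamma=0.9):
    # Dopamine-like signal: positive when the outcome is better than
    # predicted, negative when it is worse.
    return reward + gamma * v_next - v_current

def update_value_weights(weights, features, delta, alpha=0.1):
    # Corticostriatal plasticity sketch: dopamine-gated weight change
    # on a linear value estimate V(s) = sum(w_i * x_i), where x is the
    # cortical input pattern.
    return [w + alpha * delta * x for w, x in zip(weights, features)]

w = [0.0, 0.0]
x = [1.0, 0.0]  # cortical state features for the current state
v_s = sum(wi * xi for wi, xi in zip(w, x))
delta = td_error(reward=1.0, v_next=0.0, v_current=v_s)  # reward exceeded prediction
w = update_value_weights(w, x, delta)
```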

Dual cortical circuits for Bayesian inference and reinforcement learning

Sergey Zobnin, Naohiro Yamauchi, Kota Shirahata

Collaborators: Bernd Kuhn (OIST)

In the neocortex, while the posterior half is mostly involved in sensory perception and the frontal half mostly in action, a common six-layer architecture is preserved, known as the canonical cortical circuit (Douglas and Martin, 2017). Based on the duality between dynamic Bayesian inference and optimal control, we hypothesized that the sensory and motor cortices implement these computations in similar neural circuits (Doya, 2021).
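The duality referred to here is the classical one between state estimation and optimal control: in the scalar linear-Gaussian case, the Riccati recursion for the Kalman filter's error covariance and the backward Riccati recursion for the LQR cost-to-go have the same algebraic form, with the observation gain of estimation playing the role of the control gain. A minimal numerical sketch (scalar systems only; parameter names are ours):

```python
def kalman_covariance_step(p, a, c, q, r):
    # One step of the scalar Kalman filter Riccati recursion:
    # predict the error covariance, then correct with the optimal gain.
    p_pred = a * p * a + q
    k = p_pred * c / (c * p_pred * c + r)
    return (1.0 - k * c) * p_pred

def lqr_cost_step(s, a, b, q, r):
    # One step of the scalar LQR Riccati recursion (backward in time)
    # for the quadratic cost-to-go S.
    return q + a * s * a - (a * s * b) ** 2 / (r + b * s * b)
```

Both recursions converge to a fixed point for a stable scalar system, and substituting the observation model (c) for the control input model (b) maps one onto the other; this shared structure is what motivates the idea that sensory (inference) and motor (control) cortices could reuse one canonical circuit.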

To test a hypothesis about how cortical circuits implement Bayesian inference (Bastos et al., 2012), we performed cross-layer calcium imaging while mice performed a lever-pulling task with variable lever resistance. Regression analysis showed that deep-layer neurons more often encoded the expected lever resistance, while superficial-layer neurons more often encoded the actual lever resistance (Zobnin, PhD thesis, 2024). This is a first step toward understanding the cortical implementation of Bayesian inference.

[Figure: dual cortical circuits for Bayesian inference and reinforcement learning]