We develop robust and efficient learning algorithms and test them in simulation and robotic experiments.
Adaptive Systems Group
Reinforcement Learning Algorithms
We are developing model-free and model-based reinforcement learning algorithms for robust and efficient learning.
Tadashi Kozuno, Paavo Parmas, Dongqi Han
Collaborators: Jun Tani, Remi Munos (DeepMind Paris), Masashi Sugiyama (RIKEN AIP)
Deep neural networks have achieved remarkable successes in computer games, where an unlimited amount of data can be sampled by simulation, and in language and image generation, where huge datasets can be collected from the internet. However, data efficiency remains a major issue, especially in applications of reinforcement learning to robotic control, which require physical interactions in nonstationary environments. In collaboration with Dr. Remi Munos at Google DeepMind, we performed a theoretical analysis of the convergence speed of reinforcement learning algorithms and proposed a novel regularization method that realizes stable and efficient convergence (Vieillard et al., 2020).
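As a rough illustration of the idea, the sketch below applies a KL penalty toward the previous policy in tabular value iteration; the update rule, parameters, and stopping criterion are simplified placeholders, not the exact scheme analyzed in the paper.

```python
import numpy as np

def kl_regularized_vi(P, R, gamma=0.95, tau=1.0, n_iters=200):
    """Tabular value iteration with a KL penalty toward the previous policy.

    P: transition tensor of shape (S, A, S); R: reward matrix of shape (S, A).
    A minimal illustration of KL-regularized updates, not the exact algorithm
    analyzed in Vieillard et al. (2020).
    """
    S, A, _ = P.shape
    pi = np.full((S, A), 1.0 / A)          # start from the uniform policy
    Q = np.zeros((S, A))
    for _ in range(n_iters):
        # KL-regularized greedy step: softmax improvement around the previous policy.
        logits = np.log(pi + 1e-12) + Q / tau
        logits -= logits.max(axis=1, keepdims=True)
        pi = np.exp(logits)
        pi /= pi.sum(axis=1, keepdims=True)
        # One application of the Bellman operator under the new policy.
        V = (pi * Q).sum(axis=1)            # state values under the new policy
        Q = R + gamma * P @ V               # expected next-state values
    return pi, Q
```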
In collaboration with Prof. Masashi Sugiyama at RIKEN AIP, we developed a unified theory of different ways of computing gradients in probabilistic models (Parmas and Sugiyama, 2021) and proposed a new method, the total propagation algorithm (Parmas et al., 2018). Based on the theory, we developed a software tool, `Proppo`, for easy use of the algorithm (Parmas and Seno, 2022).
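The core idea can be illustrated with a toy example that computes both the reparameterization (pathwise) and likelihood-ratio (score-function) estimators of a Gaussian expectation's gradient and fuses them by inverse-variance weighting. The function below is illustrative only; the actual total propagation algorithm, and the `Proppo` package, propagate such combinations through multi-step computation graphs.

```python
import numpy as np

def combined_gradient(f, f_grad, mu, sigma=1.0, n=1000, rng=None):
    """Estimate d/dmu E_{x~N(mu, sigma^2)}[f(x)] two ways and fuse them.

    A toy sketch of inverse-variance combination of the reparameterization
    and likelihood-ratio gradient estimators, in the spirit of total
    propagation; names and the single-step setting are illustrative.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(n)
    x = mu + sigma * eps
    # Reparameterization estimator: differentiate through x = mu + sigma * eps.
    rp = f_grad(x)                        # dx/dmu = 1
    # Likelihood-ratio estimator: f(x) * d log p(x; mu) / dmu.
    lr = f(x) * (x - mu) / sigma**2
    # Inverse-variance weighting of the two per-sample estimators.
    w_rp = 1.0 / (rp.var() + 1e-12)
    w_lr = 1.0 / (lr.var() + 1e-12)
    k = w_rp / (w_rp + w_lr)
    return k * rp.mean() + (1.0 - k) * lr.mean()

# Example: f(x) = x**2, so the true gradient of E[x^2] with respect to mu is 2*mu.
grad = combined_gradient(lambda x: x**2, lambda x: 2 * x, mu=1.5)
```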
We further developed a novel architecture combining model-based and model-free reinforcement learning in the variational Bayesian framework (Han et al., 2024) and demonstrated how goal-directed and habitual actions can help each other.
Origins and Designs of Reward Functions
Farzana Rahman, Yuji Kanagawa, Tojoarisoa Rakotoaritina
Collaborators: Eiji Uchibe (ATR)
In reinforcement learning, how to design appropriate reward/cost functions remains an open issue. We previously showed that reward functions that facilitate survival and reproduction can be acquired through embodied evolution in a population of “Cyber Rodent” robots (Elfwing et al., 2011).
While survival and reproduction are fundamental requirements for any creature to persist, we appear to have rewards that are not directly linked to survival and reproduction, such as curiosity. Understanding the nature of “intrinsic motivation” in humans and animals and formulating a principle for designing “intrinsic rewards” for artificial agents are subjects of active research.
We are now exploring how different types of rewards evolve under different environmental conditions in an embodied evolution framework that includes survival, death, and reproduction based on internal energy levels (Kanagawa & Doya, 2024).
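The energy-based bookkeeping of such a framework can be sketched roughly as follows; the class, thresholds, and mutation scale are hypothetical and chosen only for illustration, not taken from the study.

```python
import numpy as np

# A hypothetical sketch of embodied-evolution bookkeeping: agents gain energy
# by foraging, spend it by acting, die when energy is exhausted, and reproduce
# (copying mutated reward weights) above an energy threshold.

class Agent:
    def __init__(self, reward_weights, energy=1.0):
        self.w = np.asarray(reward_weights, dtype=float)  # evolvable reward weights
        self.energy = energy

    def step(self, food_intake, action_cost, rng):
        self.energy += food_intake - action_cost
        if self.energy <= 0.0:
            return []                                     # death: agent is removed
        if self.energy >= 2.0:                            # reproduction threshold (illustrative)
            self.energy -= 1.0                            # pay the energetic cost of reproduction
            child_w = self.w + 0.05 * rng.standard_normal(self.w.shape)
            return [Agent(child_w, energy=1.0)]           # offspring with mutated reward weights
        return []
```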
Inverse reinforcement learning (IRL) can be a helpful tool for estimating the reward functions used by human subjects and for transferring skills to robots. We proposed entropy-regularized imitation learning (ERIL), which combines forward and inverse reinforcement learning (Uchibe and Doya, 2021).
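A generic sketch of the inverse step in discriminator-based imitation learning is shown below: a classifier between expert and learner samples yields a log density ratio that can serve as a reward estimate. The sketch omits the entropy-regularized terms that define ERIL itself, so it should be read only as an illustration of the density-ratio idea.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_reward(expert_sa, learner_sa):
    """Fit a discriminator between expert and learner (state, action) samples
    and use its logit as a reward estimate.

    A generic density-ratio sketch of discriminator-based inverse RL; it is
    not the exact ERIL formulation of Uchibe and Doya (2021).
    """
    X = np.vstack([expert_sa, learner_sa])
    y = np.concatenate([np.ones(len(expert_sa)), np.zeros(len(learner_sa))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    def reward(sa):
        # log D(sa) - log(1 - D(sa)): the estimated log density ratio.
        return clf.decision_function(np.atleast_2d(sa))
    return reward
```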
Cyber Rodent
Based on the theories of reinforcement learning and evolutionary computation, we explored parallel learning mechanisms using a colony of small rodent-like mobile robots, Cyber Rodents.
The Cyber Rodent robot has an omnidirectional vision system as its eye, infrared proximity sensors as its whiskers, two wheels for locomotion, and a three-color LED for emotional communication. In particular, the Cyber Rodent has the specific capabilities of survival by recharging from external battery packs and of reproduction in software by exchanging genes (programs or data) via an infrared communication port.
Smartphone Robots
Christopher Buckley
Recent smartphones pack high computational performance and various sensors into a small body. We developed two-wheeled robots that can achieve dynamic stand-up and balancing behaviors (Wang et al., 2017), as well as survival by charging from wireless charging bases and software reproduction by showing QR codes. We are now redesigning the hardware for more reliable operation and more efficient energy management.
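As a rough illustration of the balancing component of such behaviors, the sketch below applies proportional-derivative feedback on body pitch to a linearized wheeled inverted pendulum; the control law, gains, and model constants are placeholders and not those used in Wang et al. (2017), which learned the stand-up and balancing behaviors.

```python
def balance_command(pitch, pitch_rate, kp=200.0, kd=20.0):
    """PD feedback on body pitch for a two-wheeled inverted pendulum.

    A minimal placeholder for the balancing controller; gains and the
    control law are illustrative only.
    """
    return -(kp * pitch + kd * pitch_rate)   # wheel command (angular acceleration)

# Tiny simulation of a linearized pendulum-on-wheels model (illustrative constants).
dt, g, l = 0.01, 9.81, 0.1                   # time step (s), gravity, body length scale
theta, omega = 0.2, 0.0                      # initial pitch (rad) and pitch rate
for _ in range(500):
    u = balance_command(theta, omega)
    omega += dt * ((g / l) * theta + u)      # unstable gravity term plus control input
    theta += dt * omega                      # pitch decays toward upright under the PD law
```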