ncu Annual Reports FY2015 13
Figure 3.3.1: Proposed network architecture for inverse reinforcement learning, consisting of three networks: density ratio, reward, and value function. The Bellman equation is then computed from the outputs of the three networks.
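The data flow in the figure can be illustrated with a minimal sketch. The caption does not give the equation, so the combination used here (a log density ratio plus reward plus discounted next-state value minus current value) is an assumption, as are the names `make_linear`, `bellman_combination`, `STATE_DIM`, and `GAMMA`. Each "network" is a one-layer linear stand-in, not the architecture used in the report.

```python
import math
import random


def make_linear(n_in, rng):
    """Return a one-layer stand-in 'network' mapping a state vector to a scalar."""
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_in)]
    bias = 0.0

    def net(x):
        return sum(w * xi for w, xi in zip(weights, x)) + bias

    return net


rng = random.Random(0)
STATE_DIM = 4   # hypothetical state dimension
GAMMA = 0.9     # hypothetical discount factor

# Three separate networks, as named in the caption.
density_ratio_net = make_linear(STATE_DIM, rng)  # assumed to output a log density ratio
reward_net = make_linear(STATE_DIM, rng)         # reward r(x)
value_net = make_linear(STATE_DIM, rng)          # value function V(x)


def bellman_combination(x, y, gamma=GAMMA):
    """Combine the three outputs for a transition x -> y.

    Assumed form: log_ratio(x) + r(x) + gamma * V(y) - V(x).
    """
    return (density_ratio_net(x) + reward_net(x)
            + gamma * value_net(y) - value_net(x))


# One random transition (x -> y) to exercise the combination.
x = [rng.gauss(0.0, 1.0) for _ in range(STATE_DIM)]
y = [rng.gauss(0.0, 1.0) for _ in range(STATE_DIM)]
score = bellman_combination(x, y)
```

In a trained system the three networks would be learned jointly; here they only show how three scalar outputs feed a single Bellman-style expression.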
Date: 05 March 2024
Copyright OIST (Okinawa Institute of Science and Technology Graduate University, 沖縄科学技術大学院大学). Creative Commons Attribution 4.0 International License (CC BY 4.0).