
Concordia University | Graph Convolutional Recurrent Networks for Reward Shaping in Reinforcement Learning

2022-06-22 19:54:00 Zhiyuan community

【Title】Graph Convolutional Recurrent Networks for Reward Shaping in Reinforcement Learning

【Authors】Hani Sami, Jamal Bentahar, Azzam Mourad

【Publication date】2022.6.18

【Paper link】https://www.sciencedirect.com/science/article/pii/S0020025522006442

【Recommended reasons】In this paper, the authors address the slow convergence of reinforcement learning (RL) by proposing a new reward shaping scheme that combines (1) a graph convolutional recurrent network (GCRN), (2) an enhanced Krylov basis, and (3) look-ahead advice to form the potential function. The proposed GCRN framework couples graph convolutional networks (GCN), which capture spatial dependencies, with bidirectional gated recurrent units (Bi-GRU), which handle temporal dependencies. The GCRN loss function is defined using the message-passing technique of hidden Markov models (HMM). Because the environment's transition matrix is difficult to compute, a Krylov basis is used to approximate it, and this approximation outperforms existing bases. Unlike existing potential functions that rely solely on states for reward shaping, the authors use both states and actions through the look-ahead advice mechanism to produce more accurate advice. Extensive experiments show that the proposed solution outperforms state-of-the-art approaches in learning speed while achieving higher rewards.
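As a rough illustration (not taken from the paper), the Python sketch below shows the two standard ingredients referenced above: building a Krylov basis by repeatedly applying a transition operator when the full transition matrix is too costly to form, and the look-ahead advice form of potential-based reward shaping, F(s, a, s', a') = γΦ(s', a') − Φ(s, a), where the potential Φ depends on both states and actions (in the paper, Φ is produced by the GCRN). The function names and the `phi` callable are illustrative placeholders, not the authors' implementation.

```python
import numpy as np

def krylov_basis(apply_transition, v0, m):
    """Orthonormal Krylov basis {v, Av, A^2 v, ...} built with Gram-Schmidt.
    `apply_transition(v)` applies the (expensive or unknown) transition matrix A
    to a vector, so A never has to be formed explicitly. Illustrative only."""
    basis = [v0 / np.linalg.norm(v0)]
    for _ in range(m - 1):
        w = np.array(apply_transition(basis[-1]), dtype=float)
        for q in basis:                      # orthogonalize against earlier vectors
            w -= np.dot(q, w) * q
        norm = np.linalg.norm(w)
        if norm < 1e-10:                     # Krylov subspace has converged
            break
        basis.append(w / norm)
    return np.stack(basis)

def shaped_reward(r, phi, s, a, s_next, a_next, gamma=0.99):
    """Look-ahead advice: F(s, a, s', a') = gamma * Phi(s', a') - Phi(s, a).
    `phi` stands in for a learned state-action potential (here, the GCRN output)."""
    return r + gamma * phi(s_next, a_next) - phi(s, a)
```

This is the standard look-ahead advice formulation; in the paper the potential function itself is learned by the GCRN rather than hand-designed.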

 


Copyright notice
This article was created by [Zhiyuan community]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/173/202206221832178363.html