Concordia University | Graph convolutional recurrent networks for reward shaping in reinforcement learning
2022-06-22 19:54:00 【Zhiyuan community】
【Title】Graph Convolutional Recurrent Networks for Reward Shaping in Reinforcement Learning
【Authors】Hani Sami, Jamal Bentahar, Azzam Mourad
【Publication date】2022.6.18
【Paper link】https://www.sciencedirect.com/science/article/pii/S0020025522006442
【Reason for recommendation】This paper addresses slow convergence in reinforcement learning (RL). The authors propose a new reward shaping scheme that combines (1) graph convolutional recurrent networks (GCRN), (2) augmented Krylov, and (3) look-ahead advice to form the potential function. Their GCRN architecture combines graph convolutional networks (GCN), which capture spatial dependencies, with bidirectional gated recurrent units (Bi-GRU), which capture temporal dependencies. The GCRN loss function incorporates the message-passing technique of hidden Markov models (HMM). Because the environment's transition matrix is hard to compute, the authors estimate it with a Krylov basis, which they report outperforms existing approximation bases. Unlike existing potential functions, which rely on states alone to perform reward shaping, this scheme uses both states and actions through the look-ahead advice mechanism to produce more precise advice. The reported tests show that the proposed solution outperforms state-of-the-art solutions in learning speed while also attaining higher rewards.
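【Illustration】To make the idea of a state-action potential concrete, here is a minimal Python sketch of look-ahead advice in its commonly cited form, where the shaped reward is r + γΦ(s', a') − Φ(s, a). This is an assumption about the general technique, not the paper's exact formulation. In the paper the potential Φ comes from the GCRN and the Krylov basis; the lookup table, the state and action names, and the function `look_ahead_shaped_reward` below are hypothetical stand-ins chosen only for illustration.

```python
from typing import Callable, Hashable

State = Hashable
Action = Hashable


def look_ahead_shaped_reward(
    reward: float,
    phi: Callable[[State, Action], float],
    s: State,
    a: Action,
    s_next: State,
    a_next: Action,
    gamma: float = 0.99,
) -> float:
    """Return r + gamma * phi(s', a') - phi(s, a).

    The potential phi depends on both the state and the action,
    which is what distinguishes look-ahead advice from shaping
    with a state-only potential.
    """
    return reward + gamma * phi(s_next, a_next) - phi(s, a)


# Hypothetical toy potential: a lookup table standing in for a learned one.
potential_table = {("s0", "right"): 0.0, ("s1", "right"): 1.0}


def phi(s: State, a: Action) -> float:
    # Unknown state-action pairs get a neutral potential of 0.
    return potential_table.get((s, a), 0.0)


# One transition (s0, right) -> (s1, right) with zero environment reward.
shaped = look_ahead_shaped_reward(0.0, phi, "s0", "right", "s1", "right", gamma=0.9)
print(shaped)
```

In this toy transition the environment gives no reward, yet the shaped reward is positive because the agent moves to a state-action pair with a higher potential. That extra signal is the mechanism by which shaping can speed up learning.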