[Morvan Reinforcement Learning] Video notes (III) 1. What is SARSA?
2022-07-24 09:17:00 【Your sister Xuan】
Section 7: What is SARSA?
SARSA is an algorithm similar to Q-learning. Q-learning was introduced in the previous notes:
【Morvan Reinforcement Learning】 Video notes (II) 1. What is Q-Learning?
Like Q-learning, SARSA uses a "Q-table" and learns by updating that table.
As shown in the figure above, a SARSA update also has two parts: the "actual" Q value (the update target) and the estimated Q value. The estimated Q value is read directly from the Q-table, but the actual Q value is computed differently than in Q-learning.
First, we have a sequence S, A, R, S', A'. Computing the actual value requires choosing the next action A' in state S'. This A' is not necessarily the action with the largest Q value in the table; it is the action that will really be taken next, so it carries some randomness. Everything else is the same as in Q-learning: the difference between the actual value and the estimated value is then used to update the Q-table.
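A single update on the quintuple (S, A, R, S', A') can be sketched as follows. This is a minimal illustration, not code from the video: the function name `sarsa_update`, the list-of-rows Q-table, and the values of the learning rate `alpha` and discount factor `gamma` are all assumptions made for the example.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """Apply one SARSA update to the Q-table in place and return the error."""
    estimate = Q[s][a]                      # estimated Q value, read from the table
    target = r + gamma * Q[s_next][a_next]  # "actual" Q value, built from the A' really chosen
    error = target - estimate               # gap between actual and estimated value
    Q[s][a] = estimate + alpha * error      # move the estimate part of the way to the target
    return error


# Hypothetical Q-table indexed as Q[state][action]
Q = [[0.0, 0.0],
     [1.0, 2.0]]

# Quintuple (S, A, R, S', A') = (0, 1, 1.0, 1, 0).
# A' = 0 is not the greedy action in S' = 1, yet SARSA still uses Q[1][0].
error = sarsa_update(Q, s=0, a=1, r=1.0, s_next=1, a_next=0)
print(error, Q[0][1])
```

Note that the function never looks at the largest entry of `Q[s_next]`; it only reads the entry for the action that was actually selected.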
Differences from Q-learning
Q-learning is off-policy: the policy used to update Q values differs from the policy used to sample actions. SARSA is on-policy: sampling and updating both use the same policy, generally $\epsilon$-greedy, so its updates carry more randomness.
[Figure: pseudocode for Q-learning (top) and SARSA (bottom)]
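In equation form, the two updates are commonly written as below. This is the standard textbook form rather than a transcription of the pseudocode figure; the learning rate $\alpha$ and discount factor $\gamma$ are symbols assumed here, not taken from the notes.

$$
\begin{aligned}
\text{Q-learning:}\quad & Q(s,a) \leftarrow Q(s,a) + \alpha\left[R + \gamma \max_{a'} Q(s',a') - Q(s,a)\right] \\
\text{SARSA:}\quad & Q(s,a) \leftarrow Q(s,a) + \alpha\left[R + \gamma\, Q(s',a') - Q(s,a)\right], \quad a' \text{ chosen by } \epsilon\text{-greedy in } s'
\end{aligned}
$$

The only term that changes is the bootstrap value: $\max_{a'} Q(s',a')$ for Q-learning versus $Q(s',a')$ for the sampled $a'$ in SARSA.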

As the pseudocode shows, the two differ in the update step. The Q-learning and SARSA procedures are as follows:
- Q-learning: use $\epsilon$-greedy to choose action $a$ in state $s$ → interact with the environment to get reward $R$ and next state $s'$ → directly take the largest Q value in $s'$, $\max_{a'} Q(s', a')$, to estimate the actual value → update parameters → move to the next state
- SARSA: take the action $a$ that $\epsilon$-greedy chose in the previous step → interact with the environment to get reward $R$ and next state $s'$ → use $\epsilon$-greedy to choose the next action $a'$ in $s'$ → use $Q(s', a')$ to update parameters → move to the next state and action
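The two procedures above can be sketched side by side on a toy problem. Everything specific here is an assumption for illustration: the five-state corridor environment, the helper names `step` and `epsilon_greedy`, and the hyperparameter values are hypothetical and do not come from the video.

```python
import random

N_STATES = 5      # corridor states 0..4; state 4 is the terminal goal
ACTIONS = [0, 1]  # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical toy environment: reward 1.0 on reaching the goal, else 0.0."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def epsilon_greedy(Q, state, epsilon, rng):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[state])
    return rng.choice([a for a in ACTIONS if Q[state][a] == best])  # random tie-break


def q_learning(episodes=200, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = epsilon_greedy(Q, s, epsilon, rng)       # sample a in s
            s_next, r, done = step(s, a)                 # observe R and s'
            target = r if done else r + gamma * max(Q[s_next])  # greedy bootstrap
            Q[s][a] += alpha * (target - Q[s][a])        # update
            s = s_next                                   # move to the next state only
    return Q


def sarsa(episodes=200, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        a = epsilon_greedy(Q, s, epsilon, rng)           # first action of the episode
        while not done:
            s_next, r, done = step(s, a)                 # observe R and s'
            a_next = epsilon_greedy(Q, s_next, epsilon, rng)    # sample a' in s'
            target = r if done else r + gamma * Q[s_next][a_next]  # uses the sampled a'
            Q[s][a] += alpha * (target - Q[s][a])        # update
            s, a = s_next, a_next                        # move to next state AND action
    return Q


Q_q = q_learning()
Q_s = sarsa()
for name, table in (("Q-learning", Q_q), ("SARSA", Q_s)):
    print(name, [[round(v, 3) for v in row] for row in table])
```

The loops differ in two places only: which value forms the target, and whether the next action is carried over (`s, a = s_next, a_next`) or re-sampled at the top of the loop.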
When updating, Q-learning uses the action with the maximum Q value, while SARSA directly uses the action actually sampled by $\epsilon$-greedy for the next step. In other words, SARSA builds its target from the value of the action it will really take, whereas Q-learning builds it from a greedy estimate of the "true value". This difference between Q-learning and SARSA also illustrates how on-policy and off-policy methods differ (Q-learning is off-policy; SARSA is on-policy).
Previous: 【Morvan Reinforcement Learning】 Video notes (II) 3. Implementing maze walking with the Q_Learning algorithm
Next: 【Morvan Reinforcement Learning】 Video notes (III) 2. Maze walking with SARSA
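A small worked example makes the gap between the two targets concrete. The Q-table values, the reward, and the discount factor `gamma` below are made up for illustration, and the function names are hypothetical.

```python
def q_learning_target(Q, r, s_next, gamma=0.9):
    """Greedy estimate: bootstrap from the largest Q value in the next state."""
    return r + gamma * max(Q[s_next])


def sarsa_target(Q, r, s_next, a_next, gamma=0.9):
    """Bootstrap from the Q value of the action actually sampled for the next step."""
    return r + gamma * Q[s_next][a_next]


# Hypothetical Q-table indexed as Q[state][action]
Q = [[0.0, 0.0],
     [1.0, 2.0]]
r, s_next = 1.0, 1

greedy_target = q_learning_target(Q, r, s_next)
sarsa_if_greedy = sarsa_target(Q, r, s_next, a_next=1)    # epsilon-greedy picked the best action
sarsa_if_explore = sarsa_target(Q, r, s_next, a_next=0)   # epsilon-greedy explored instead

print(greedy_target, sarsa_if_greedy, sarsa_if_explore)
```

When $\epsilon$-greedy happens to pick the greedy action, the two targets coincide; they diverge only on exploratory steps, which is where SARSA's target reflects the behavior the agent actually exhibits.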