[Reinforcement Learning Notes] Common Symbols in Reinforcement Learning
2022-06-25 08:35:00 【Allenpandas】
Symbol | Meaning |
---|---|
$\doteq$ | Equality that holds by definition |
$\approx$ | Approximately equal to |
$\epsilon$ | Probability of taking a random action in an $\epsilon$-greedy policy |
$\gamma$ | Discount factor |
$\lambda$ | Decay-rate parameter for eligibility traces |
$\leftarrow$ | Assignment (e.g., updating an estimate) |
$s$, $s'$ | States |
$a$ | An action |
$r$ | A reward |
$t$ | Discrete time step |
$\pi$ | Policy (decision-making rule) |
$\pi(s)$ | Action taken in state $s$ under the deterministic policy $\pi$ |
$\pi(a \mid s)$ | Probability of taking action $a$ in state $s$ under the stochastic policy $\pi$ |
$A_t$ | Action at time $t$ |
$S_t$ | State at time $t$, typically determined stochastically by $S_{t-1}$ and $A_{t-1}$ |
$R_t$ | Reward at time $t$, typically determined stochastically by $S_{t-1}$ and $A_{t-1}$ |
$G_t$ | Return following time $t$ (cumulative discounted reward from $R_{t+1}$ onward); a random variable whose expectation under $\pi$ is $v_\pi$ or $q_\pi$ |
$p(s', r \mid s, a)$ | Probability of moving to state $s'$ and receiving reward $r$ after taking action $a$ in state $s$ |
$p(s' \mid s, a)$ | Probability of moving to state $s'$ after taking action $a$ in state $s$ |
$r(s, a)$ | Expected immediate reward after taking action $a$ in state $s$ |
$r(s, a, s')$ | Expected immediate reward on the transition from $s$ to $s'$ under action $a$ |
$v_\pi(s)$ | Value of state $s$ under policy $\pi$ (expected return) |
$v_*(s)$ | Value of state $s$ under the optimal policy |
$q_\pi(s, a)$ | Value of taking action $a$ in state $s$ under policy $\pi$ |
$q_*(s, a)$ | Value of taking action $a$ in state $s$ under the optimal policy |
$V$, $V_t$ | Estimate of the state-value function |
$Q$, $Q_t$ | Estimate of the action-value function |
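The $\epsilon$, $\gamma$, $Q$ and $G_t$ entries fit together in a few lines of code. The following is a minimal Python sketch, not taken from the original notes: `epsilon_greedy` and `discounted_return` are hypothetical helper names, actions are assumed to be list indices, ties are broken by taking the first maximizing action (a simplification; random tie-breaking is also common), and the return of a finite episode is computed as $G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots$ via the recursion $G_t = R_{t+1} + \gamma G_{t+1}$.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from the estimates Q(s, .) for one state.

    With probability epsilon, choose uniformly among all actions (explore);
    otherwise choose an action with the largest estimate (exploit).
    Hypothetical helper for illustration only.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Assumption: break ties by taking the first maximizing action.
    return q_values.index(best)


def discounted_return(rewards, gamma):
    """G_t = R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ...

    `rewards` holds R_{t+1}, R_{t+2}, ... of one finite episode.
    Works backwards with G_t = R_{t+1} + gamma * G_{t+1}.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    print(epsilon_greedy([0.1, 0.5, 0.2], epsilon=0.1))
    print(discounted_return([1, 0, 2], gamma=0.9))
```

For example, `discounted_return([1, 0, 2], 0.9)` evaluates $1 + 0.9 \cdot 0 + 0.9^2 \cdot 2$, approximately 2.62. Setting `epsilon=0` makes the policy purely greedy, and `epsilon=1` makes it uniformly random.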
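The dynamics entries are related: $p(s' \mid s, a) = \sum_r p(s', r \mid s, a)$, $r(s, a) = \sum_{s'} \sum_r r \, p(s', r \mid s, a)$, and $r(s, a, s') = \sum_r r \, p(s', r \mid s, a) / p(s' \mid s, a)$. The state value satisfies the Bellman equation $v_\pi(s) = \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,[r + \gamma v_\pi(s')]$, and $q_\pi(s, a) = \sum_{s', r} p(s', r \mid s, a)\,[r + \gamma v_\pi(s')]$. A minimal sketch of iterative policy evaluation in Python, assuming a made-up two-state MDP (the states `"A"`, `"B"`, the actions `"stay"`, `"move"`, the rewards, and every function name are hypothetical), shows the estimate $V$ being updated with $\leftarrow$ until it approximates $v_\pi$, and $Q$ following by one-step lookahead.

```python
# Hypothetical two-state MDP, used only for illustration.
# DYNAMICS[(s, a)] lists the outcomes (s_next, r, prob) of p(s', r | s, a).
DYNAMICS = {
    ("A", "stay"): [("A", 0.0, 1.0)],
    ("A", "move"): [("B", 1.0, 0.8), ("A", 0.0, 0.2)],
    ("B", "stay"): [("B", 2.0, 1.0)],
    ("B", "move"): [("A", 0.0, 1.0)],
}
STATES = ["A", "B"]
ACTIONS = ["stay", "move"]
# Equiprobable stochastic policy pi(a | s).
POLICY = {s: {a: 0.5 for a in ACTIONS} for s in STATES}


def transition_prob(s, a, s_next, dynamics=DYNAMICS):
    """p(s' | s, a) = sum over r of p(s', r | s, a)."""
    return sum(p for s2, _, p in dynamics[(s, a)] if s2 == s_next)


def expected_reward(s, a, dynamics=DYNAMICS):
    """r(s, a) = sum over s' and r of r * p(s', r | s, a)."""
    return sum(r * p for _, r, p in dynamics[(s, a)])


def expected_reward_given_next(s, a, s_next, dynamics=DYNAMICS):
    """r(s, a, s') = sum over r of r * p(s', r | s, a) / p(s' | s, a).

    Only defined when p(s' | s, a) > 0 (otherwise this divides by zero).
    """
    num = sum(r * p for s2, r, p in dynamics[(s, a)] if s2 == s_next)
    return num / transition_prob(s, a, s_next, dynamics)


def one_step_backup(s, a, v, gamma, dynamics=DYNAMICS):
    """Sum over s', r of p(s', r | s, a) * [r + gamma * v(s')]."""
    return sum(p * (r + gamma * v[s2]) for s2, r, p in dynamics[(s, a)])


def evaluate_policy(policy, gamma, theta=1e-10,
                    dynamics=DYNAMICS, states=STATES, actions=ACTIONS):
    """Iterative policy evaluation: sweep V(s) <- sum_a pi(a|s) * backup
    until the largest change in one sweep is below theta."""
    v = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            new_v = sum(policy[s][a] * one_step_backup(s, a, v, gamma, dynamics)
                        for a in actions)
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v  # in-place update of the estimate V(s)
        if delta < theta:
            return v


def action_values(v, gamma, dynamics=DYNAMICS):
    """Q(s, a) from a state-value estimate V by one-step lookahead."""
    return {(s, a): one_step_backup(s, a, v, gamma, dynamics) for (s, a) in dynamics}


if __name__ == "__main__":
    V = evaluate_policy(POLICY, gamma=0.9)
    Q = action_values(V, gamma=0.9)
    print(V)
    print(Q)
```

The loop updates $V(s)$ in place, so later states in a sweep already use the new values; a two-array version that reads only the previous sweep's values also converges when $\gamma < 1$.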