
[Reinforcement Learning Notes] Common Symbols in Reinforcement Learning

2022-06-25 08:35:00 Allenpandas

| Symbol | Meaning |
| --- | --- |
| $\doteq$ | Equality that holds by definition |
| $\approx$ | Approximately equal to |
| $\epsilon$ | Probability of taking a random action in an $\epsilon$-greedy policy |
| $\gamma$ | Discount factor (discount rate) |
| $\lambda$ | Decay-rate parameter for eligibility traces |
| $\leftarrow$ | Assignment |
| $s$, $s'$ | States |
| $a$ | An action |
| $r$ | A reward |
| $t$ | Discrete time step (or simply time) |
| $\pi$ | Policy (decision-making rule) |
| $\pi(s)$ | Action taken in state $s$ under a deterministic policy $\pi$ |
| $\pi(a \mid s)$ | Probability of taking action $a$ in state $s$ under a stochastic policy $\pi$ |
| $A_t$ | Action at time $t$ |
| $S_t$ | State at time $t$, typically determined stochastically by $S_{t-1}$ and $A_{t-1}$ |
| $R_t$ | Reward at time $t$, typically determined stochastically by $S_{t-1}$ and $A_{t-1}$ |
| $G_t$ | Return (cumulative discounted reward) following time $t$; a random variable whose expected value defines the value functions |
| $p(s', r \mid s, a)$ | Probability of moving to state $s'$ and receiving reward $r$ after taking action $a$ in state $s$ |
| $p(s' \mid s, a)$ | Probability of moving to state $s'$ after taking action $a$ in state $s$ |
| $r(s, a)$ | Expected immediate reward after taking action $a$ in state $s$ |
| $r(s, a, s')$ | Expected immediate reward on the transition from $s$ to $s'$ under action $a$ |
| $v_\pi(s)$ | Value of state $s$ under policy $\pi$ (expected return) |
| $v_*(s)$ | Value of state $s$ under the optimal policy |
| $q_\pi(s, a)$ | Value of taking action $a$ in state $s$ under policy $\pi$ |
| $q_*(s, a)$ | Value of taking action $a$ in state $s$ under the optimal policy |
| $V$, $V_t$ | Estimates of the state-value function ($v_\pi$ or $v_*$) |
| $Q$, $Q_t$ | Estimates of the action-value function ($q_\pi$ or $q_*$) |
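The return and value-function symbols in the table are tied together by the definitions below. This list appears to follow the notation of Sutton and Barto's *Reinforcement Learning: An Introduction*, and these are the standard relations from that notation. The infinite sum is well defined for $\gamma < 1$, or for episodic tasks where every reward after termination is zero.

```latex
\begin{aligned}
G_t &\doteq R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}, \\
v_\pi(s) &\doteq \mathbb{E}_\pi\left[G_t \mid S_t = s\right], \\
q_\pi(s, a) &\doteq \mathbb{E}_\pi\left[G_t \mid S_t = s, A_t = a\right], \\
v_\pi(s) &= \sum_a \pi(a \mid s)\, q_\pi(s, a), \\
q_\pi(s, a) &= \sum_{s', r} p(s', r \mid s, a)\left[r + \gamma\, v_\pi(s')\right], \\
v_*(s) &= \max_a q_*(s, a).
\end{aligned}
```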
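$G_t$ is a sum of future rewards, each discounted by a power of $\gamma$. For a finite episode it can be computed backwards with the recursion $G_t = R_{t+1} + \gamma G_{t+1}$. The sketch below assumes the rewards of one episode arrive as a plain Python list. The names `discounted_return` and `all_returns` are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Compute G_t = R_{t+1} + gamma * R_{t+2} + ... for a finite episode.

    `rewards` holds R_{t+1}, R_{t+2}, ..., R_T (the rewards received after time t).
    """
    g = 0.0
    # Walk backwards using G_t = R_{t+1} + gamma * G_{t+1}, with G_T = 0.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def all_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return [G_0, G_1, ..., G_{T-1}] for an episode with rewards R_1, ..., R_T."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for i in range(len(rewards) - 1, -1, -1):
        g = rewards[i] + gamma * g
        returns[i] = g
    return returns
```

For example, three rewards of 1 with $\gamma = 0.5$ give $G_0 = 1 + 0.5 + 0.25 = 1.75$.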
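The four-argument dynamics $p(s', r \mid s, a)$ determine the other model quantities in the table by marginalizing or taking expectations. The functions below compute:

- $p(s' \mid s, a) = \sum_r p(s', r \mid s, a)$,
- $r(s, a) = \sum_{s', r} r\, p(s', r \mid s, a)$,
- $r(s, a, s') = \sum_r r\, p(s', r \mid s, a) / p(s' \mid s, a)$.

The sketch assumes a hypothetical dictionary representation, `dynamics[(s, a)] = {(s_next, r): probability}`. The function names are also illustrative.

```python
from typing import Dict, Hashable, Tuple

# Hypothetical representation: dynamics[(s, a)] = {(s_next, r): probability, ...}
Dynamics = Dict[Tuple[Hashable, Hashable], Dict[Tuple[Hashable, float], float]]


def transition_prob(dynamics: Dynamics, s, a, s_next) -> float:
    """p(s'|s,a): sum over rewards of p(s', r | s, a)."""
    return sum(prob for (sn, _r), prob in dynamics[(s, a)].items() if sn == s_next)


def expected_reward(dynamics: Dynamics, s, a) -> float:
    """r(s,a): expected immediate reward, summing r * p(s', r | s, a) over s' and r."""
    return sum(r * prob for (_sn, r), prob in dynamics[(s, a)].items())


def expected_reward_given_next(dynamics: Dynamics, s, a, s_next) -> float:
    """r(s,a,s'): expected reward conditioned on also landing in s_next."""
    denom = transition_prob(dynamics, s, a, s_next)
    if denom == 0:
        raise ValueError("p(s'|s,a) is zero, so r(s,a,s') is undefined")
    numer = sum(r * prob for (sn, r), prob in dynamics[(s, a)].items() if sn == s_next)
    return numer / denom
```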
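In an $\epsilon$-greedy policy, a uniformly random action is chosen with probability $\epsilon$, and a greedy action under the current estimates $Q(s, \cdot)$ is chosen otherwise. As a result, $\pi(a \mid s) = \epsilon / |\mathcal{A}|$ for every non-greedy action, and the greedy action gets the remaining $1 - \epsilon$ on top. The sketch below assumes the action values for one state are a list indexed by action. It breaks ties by taking the first maximal action, which is an assumption; other tie-breaking rules are common. Function names are hypothetical.

```python
import random
from typing import List


def epsilon_greedy_probs(q_values: List[float], epsilon: float) -> List[float]:
    """Return pi(a|s) for an epsilon-greedy policy over the action values Q(s, .).

    Ties for the maximum go to the first greedy action (an assumption).
    """
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])  # max() returns the first maximum
    probs = [epsilon / n] * n          # exploration mass spread over all actions
    probs[greedy] += 1.0 - epsilon     # exploitation mass on the greedy action
    return probs


def epsilon_greedy_action(q_values: List[float], epsilon: float, rng: random.Random) -> int:
    """Sample an action index from the same epsilon-greedy policy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # random action (may also be the greedy one)
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

For example, `epsilon_greedy_probs([1.0, 3.0, 2.0, 0.0], 0.5)` puts $0.125$ on each action plus an extra $0.5$ on action 1.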
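The estimate $V$ of $v_\pi$ and the assignment symbol $\leftarrow$ appear together in iterative policy evaluation. That algorithm repeatedly applies
$V(s) \leftarrow \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,[r + \gamma V(s')]$
until the largest change in a sweep falls below a small threshold. The minimal sketch below makes several assumptions:

- `policy[s][a]` holds $\pi(a \mid s)$.
- `dynamics[(s, a)] = {(s_next, r): probability}` holds $p(s', r \mid s, a)$.
- States with no entry in `policy` are treated as terminal, with value 0.

The function name and the threshold `theta` are illustrative choices. The updates are done in place, a common variant that also converges.

```python
from typing import Dict, Hashable, Iterable


def policy_evaluation(states: Iterable[Hashable], policy, dynamics, gamma: float,
                      theta: float = 1e-10) -> Dict[Hashable, float]:
    """Estimate v_pi with an array (dict) V using in-place expected updates."""
    states = list(states)
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            old = V[s]
            new = 0.0
            # Terminal states (no actions in `policy`) keep V(s) = 0.
            for a, pi_a in policy.get(s, {}).items():
                for (s_next, r), prob in dynamics[(s, a)].items():
                    new += pi_a * prob * (r + gamma * V[s_next])
            V[s] = new  # the assignment V(s) <- ...
            delta = max(delta, abs(old - new))
        if delta < theta:
            return V
```

As a check, a single state that always returns to itself with reward 1 has $v_\pi(s) = 1 / (1 - \gamma)$, which is 2 for $\gamma = 0.5$.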
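The trace-decay rate $\lambda$ controls how quickly eligibility traces fade. In tabular TD($\lambda$) with accumulating traces, every trace is multiplied by $\gamma\lambda$ at each step. The trace of the current state is then increased by 1, and every state's value moves by $\alpha\,\delta\,z(s)$, where $\delta = R_{t+1} + \gamma V(S_{t+1}) - V(S_t)$ is the TD error. The sketch below makes these assumptions:

- Values and traces are kept in dictionaries.
- The traces `z` are reset to an empty dict at the start of each episode.
- The step size `alpha` is supplied by the caller.

The function name `td_lambda_step` is hypothetical.

```python
from typing import Dict, Hashable, Optional


def td_lambda_step(V: Dict[Hashable, float], z: Dict[Hashable, float],
                   s: Hashable, r: float, s_next: Optional[Hashable],
                   alpha: float, gamma: float, lam: float,
                   terminal: bool = False) -> float:
    """Apply one tabular TD(lambda) update (accumulating traces) in place.

    Returns the TD error for this step.
    """
    v_next = 0.0 if terminal else V.get(s_next, 0.0)
    td_error = r + gamma * v_next - V.get(s, 0.0)
    # Decay every existing trace by gamma * lambda, then bump the current state's trace.
    for state in z:
        z[state] *= gamma * lam
    z[s] = z.get(s, 0.0) + 1.0
    # Every state with a nonzero trace shares in the TD error.
    for state, trace in z.items():
        V[state] = V.get(state, 0.0) + alpha * td_error * trace
    return td_error
```

With $\lambda = 0$ the trace of every earlier state drops to zero, so only the current state is updated (one-step TD). With $\lambda = 1$ and $\gamma = 1$, earlier states receive the full TD error, in the spirit of Monte Carlo updates.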

Copyright notice
This article was written by [Allenpandas]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/176/202206250713199080.html