[Morvan (莫烦) Reinforcement Learning] Video notes (I): 1. What is reinforcement learning?
2022-07-24 09:16:00 【Your sister Xuan】
[Morvan (莫烦) Reinforcement Learning videos] Notes
Section 1: What is reinforcement learning?
When we humans learn, we start out knowing nothing and, through repeated attempts and corrections, eventually arrive at the correct solution to a problem. This process can be seen as reinforcement learning.
In practice, there are many examples of reinforcement learning:
- AlphaGo, the program that defeated human masters at Go.
- Computers learning to play classic games, such as Atari games.
In each case the computer keeps trying and gradually learns a rule of behavior, whether the aim is to win at Go or to get a high score in a brick-breaking game.
How does it learn?
Imagine a virtual teacher teaching the computer how to learn, but one who can only score its behavior. How can the computer learn from scores alone? Simply by remembering which behaviors earned high scores and which earned low scores, then repeating the former and avoiding the latter. This property can be called being score-oriented.
Further, in supervised learning the data and labels are available from the start. In reinforcement learning there are no data or labels at first: the computer produces behavior by interacting with the environment again and again, receives the corresponding label each time, and then learns which data correspond to which labels. From that pattern it works out which behaviors earn high scores. For example:
- At the start there is only a blank table (a bit like the empty table in the Windows card game) with two columns, data and labels. Our goal is to make happy expressions in order to get a higher score.
- We keep making expressions (assume we do not yet know which expression is happy (high score) and which is sad (low score)), and the "virtual teacher" tells us whether each expression scored high or low (that is the label). In this way we collect a lot of data and labels.
- From the labels gathered over many expressions we extract a pattern: after some bitter lessons, we find that a happy expression gets a high score and a sad expression gets a low score.
- To get high scores, we then keep making the happy expression.
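The expression example above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the notes: the function names `teacher_score` and `learn_by_scores`, the three expression names, the score values of +1 and -1, and the small exploration rate `epsilon` are all hypothetical.

```python
import random

def teacher_score(expression):
    """Hypothetical virtual teacher: returns only a score, never the right answer."""
    return 1.0 if expression == "happy" else -1.0

def learn_by_scores(expressions, n_trials=200, epsilon=0.1, seed=0):
    """Try expressions, record the scores received, and favor high-scoring ones."""
    rng = random.Random(seed)
    totals = {e: 0.0 for e in expressions}  # sum of scores per expression
    counts = {e: 0 for e in expressions}    # number of times each was tried

    def average(e):
        # Untried expressions count as 0, so they still get tried at least
        # once after the tried ones turn out to score below 0.
        return totals[e] / counts[e] if counts[e] else 0.0

    for _ in range(n_trials):
        if rng.random() < epsilon:
            choice = rng.choice(expressions)         # occasionally try something else
        else:
            choice = max(expressions, key=average)   # otherwise repeat the best so far
        score = teacher_score(choice)                # the "label" from the teacher
        totals[choice] += score
        counts[choice] += 1

    return {e: average(e) for e in expressions}

averages = learn_by_scores(["sad", "neutral", "happy"])
best = max(averages, key=averages.get)
print(averages, best)
```

The table of `totals` and `counts` plays the role of the initially blank data/label table: it is filled in only through interaction, and the preferred behavior is read off from it.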
What algorithms are there for reinforcement learning?
There are many kinds of reinforcement learning algorithms, for example:
- Choosing behavior by value: Q-learning and Sarsa (both use tables, so the data must be discrete), and DQN (Deep Q Network, which uses a neural network)
- Choosing behavior directly: Policy Gradients
- Imagining the environment and learning from it (that is, without a real environment to act in): model-based reinforcement learning (Model-Based RL)
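As a rough illustration of the table-based, value-driven family, here is a minimal sketch of tabular Q-learning. The environment is an assumption made for this example and does not come from the notes: a hypothetical five-state corridor where the only reward is given for reaching the rightmost state. The names `step` and `q_learning` and all hyperparameter values are likewise illustrative.

```python
import random

N_STATES = 5        # hypothetical corridor: states 0..4, reward on reaching state 4
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right

def step(state, action):
    """Move within the corridor; the episode ends at the rightmost state."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def q_learning(episodes=200, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # The Q-table: one row per state, one column per action, all zeros at first.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy choice; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q_table = q_learning()
greedy_policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(greedy_policy)
```

Sarsa uses the same table but, in place of `max(q[next_state])`, bootstraps from the value of the action it actually takes next. DQN replaces the table with a neural network that estimates the same values.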