stable_baselines quick start
2022-07-25 13:20:00 【Hurry up and don't let it rot】
0 Brief introduction
baselines is a set of reinforcement learning algorithm components released by OpenAI. It is used to quickly configure reinforcement learning algorithms and is beginner-friendly.
1 Installation
pip install stable-baselines
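A quick sanity check after installing (a minimal sketch; note that stable-baselines 2.x runs on TensorFlow 1.x, so a compatible TensorFlow install is assumed here):

import stable_baselines
print(stable_baselines.__version__)  # e.g. '2.10.2'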
2 Parameter introduction
Base RL Class
common interface for all the RL algorithms
class stable_baselines.common.base_class.BaseRLModel(policy, env, verbose=0, *, requires_vec_env, policy_base, policy_kwargs=None, seed=None, n_cpu_tf_sess=None)
The base RL model
Parameters:
policy - (BasePolicy) Policy object
policy: selects the policy model, which maps states / state-action pairs to actions; under the hood it is a multilayer perceptron or a convolutional network.
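For DQN specifically, the ready-made policy classes live in stable_baselines.deepq.policies, and passing the string 'MlpPolicy' is shorthand for the class of the same name. A minimal sketch of the two equivalent ways to select a policy:

from stable_baselines import DQN
from stable_baselines.deepq.policies import MlpPolicy

# Select the multilayer-perceptron policy by registered name ...
model_a = DQN('MlpPolicy', 'CartPole-v0')
# ... or by passing the policy class directly
model_b = DQN(MlpPolicy, 'CartPole-v0')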
env - (Gym environment) The environment to learn from (if registered in Gym, can be a str; can be None when loading a trained model)
env:
Required methods: step(action), reset(), render()
Required attributes: action_space, observation_space
step(action): one simulation step; receives an action, advances the simulation by one step, and returns the new observation, reward, done flag, and info
reset(): resets the environment to its initial state and returns the initial observation
render(): displays the environment
action_space: continuous or discrete. A discrete example: moving east, west, south, or north. A continuous example: picking a number from an interval to use as the size of one motion step.
observation_space is analogous. For example, for a robot with 6 joints, the state can be expressed by 6 positions and 6 velocities; each position and velocity has an upper and lower bound, and that range can serve as its observation_space.
An environment that provides the above five elements can be passed to stable-baselines for training; a minimal custom-environment sketch follows.
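To make those five elements concrete, here is a hedged sketch of a minimal custom Gym environment (the class name MoveToTargetEnv and its reward scheme are invented for this illustration); any environment shaped like this can be handed to stable-baselines:

import gym
import numpy as np
from gym import spaces

class MoveToTargetEnv(gym.Env):
    """Toy 1-D environment: move a point toward the origin."""
    def __init__(self):
        super(MoveToTargetEnv, self).__init__()
        # Two discrete actions: 0 = step left, 1 = step right
        self.action_space = spaces.Discrete(2)
        # One observation: the current position, bounded in [-10, 10]
        self.observation_space = spaces.Box(low=-10.0, high=10.0, shape=(1,), dtype=np.float32)
        self.pos = 0.0

    def reset(self):
        # start away from the terminal region around the origin
        self.pos = float(np.random.choice([-1, 1]) * np.random.uniform(1, 5))
        return np.array([self.pos], dtype=np.float32)

    def step(self, action):
        self.pos += -1.0 if action == 0 else 1.0
        reward = -abs(self.pos)  # the closer to the origin, the better
        done = abs(self.pos) > 10 or abs(self.pos) < 0.5
        return np.array([self.pos], dtype=np.float32), reward, done, {}

    def render(self, mode='human'):
        print('position:', self.pos)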
3 Application
Build a DQN agent with stable_baselines, then train and run it on the inverted pendulum (CartPole-v0):
from stable_baselines import DQN
from stable_baselines.common.evaluation import evaluate_policy
import gym
import time

env = gym.make('CartPole-v0')  # create the inverted-pendulum environment

TRAIN = 0
if TRAIN:  # training part
    # MlpPolicy: a multilayer-perceptron policy, suited to this discrete task
    # env: the environment to train in
    # the remaining parameters are all documented in detail in the library
    model = DQN('MlpPolicy', env, learning_rate=1e-3, prioritized_replay=True, verbose=1)
    model.learn(total_timesteps=int(1e5))  # start training; learn() takes further optional parameters
    model.save("dqn_cartpole")  # save the trained model to disk
    del model  # the in-memory model is no longer needed once saved
else:  # demonstration part
    model = DQN.load("dqn_cartpole", env)  # load the trained model (restores the neural network)
    mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
    print("mean reward: {:.2f} +/- {:.2f}".format(mean_reward, std_reward))
    obs = env.reset()  # reset the environment state
    for i in range(1000):
        action, _states = model.predict(obs)  # given the current observation, choose an action
        obs, rewards, done, info = env.step(action)  # advance the simulation one step
        env.render()  # display the environment
        time.sleep(2)  # pause so the render window is visible
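One caveat about the demo loop above: once done becomes True the episode is over and the environment should be reset before stepping again. model.predict also accepts deterministic=True to take the greedy (non-exploratory) action. A hedged variant of the loop:

obs = env.reset()
for i in range(1000):
    action, _states = model.predict(obs, deterministic=True)  # greedy action, no exploration
    obs, rewards, done, info = env.step(action)
    env.render()
    if done:  # episode finished: start a new one
        obs = env.reset()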