PyTorch deep learning code skills
2022-06-26 14:58:00 【Eight dental shoes】
There are many pitfalls in building a deep learning network, and I have picked up a lot of tricks while reading other people's code. This post collects them in one place for easy reference.
Parameter configuration
The argparse library
argparse is a library that ships with Python. With argparse we can set parameters from the command line just as we would on a Linux system; the object returned by parse_args packages all the parameters, which makes it very convenient to pass the same (possibly modified) configuration around multiple files.
import argparse

parser = argparse.ArgumentParser(description='MODELname')  # give the argument parser a name
register_data_args(parser)  # helper from the DGL examples; remove this line if you are not using DGL
parser.add_argument("--dropout", type=float, default=0.5,
                    help="dropout probability")
parser.add_argument("--gpu", type=int, default=-1,
                    help="gpu")
parser.add_argument("--lr", type=float, default=1e-2,
                    help="learning rate")
parser.add_argument("--n-epochs", type=int, default=200,
                    help="number of training epochs")
parser.add_argument("--n-hidden", type=int, default=16,
                    help="number of hidden gcn units")
parser.add_argument("--n-layers", type=int, default=1,
                    help="number of hidden gcn layers")
parser.add_argument("--weight-decay", type=float, default=5e-4,
                    help="Weight for L2 loss")
parser.add_argument("--aggregator-type", type=str, default="gcn",
                    help="Aggregator type: mean/gcn/pool/lstm")
config = parser.parse_args()
# Afterwards each parameter is available as config.xxx (note that "--n-epochs" becomes config.n_epochs)
# To override values, run e.g.: python train.py --n-epochs 100 --lr 1e-3 ...
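A minimal sketch (not from the original post; MyModel and the optimizer choice are placeholders) of how the parsed config is then consumed in the training script:

import torch

device = torch.device('cuda', config.gpu) if config.gpu >= 0 else torch.device('cpu')

model = MyModel(n_hidden=config.n_hidden, n_layers=config.n_layers).to(device)  # MyModel is a stand-in
optimizer = torch.optim.Adam(model.parameters(), lr=config.lr, weight_decay=config.weight_decay)

for epoch in range(config.n_epochs):
    ...  # the training loop reads everything from config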
Model framework
DataLoader
- Put the heaviest part of data loading inside __getitem__; the class body (__init__) should only record the paths of the training data. This lets you adjust things during training and keeps the CPU memory footprint low, because samples are only loaded when they are needed.
- When overriding __len__, make sure the reported length matches the amount of data __getitem__ can index.
import numpy as np
from torch.utils.data import Dataset

class TrainDataset(Dataset):
    def __init__(self, listdir=list_dir):  # list_dir: list of sample paths prepared elsewhere
        super(TrainDataset, self).__init__()
        self.train_dirs = []
        for dir in listdir:                # only remember the paths, do not load the data yet
            self.train_dirs.append(dir)
        ...
    def __getitem__(self, index):
        path = self.train_dirs[index]
        data = np.load(path)               # the actual loading happens here, sample by sample
        ...
    def __len__(self):
        return len(self.train_dirs)        # must match what __getitem__ can index
- collate_fn
When building the DataLoader you can set the collate_fn parameter and pass in a data-processing function that you write yourself.
trainloader = DataLoader(dataset=train_dataset, shuffle=True, collate_fn=my_func)
The role of collate_fn is to customize how individual samples are assembled into a batch, as sketched below.
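my_func above is whatever you write; as a minimal sketch, assuming __getitem__ returns (sequence, label) pairs of varying length, a collate_fn that pads each batch to its longest sequence could look like this:

import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

def my_func(batch):
    # batch is a list of samples returned by __getitem__, assumed here to be (sequence, label) pairs
    seqs, labels = zip(*batch)
    lengths = torch.tensor([len(s) for s in seqs])
    padded = pad_sequence([torch.as_tensor(s) for s in seqs], batch_first=True)  # pad to the longest sequence
    return padded, lengths, torch.tensor(labels)

trainloader = DataLoader(dataset=train_dataset, batch_size=32, shuffle=True, collate_fn=my_func)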
Learning rate
- lr_scheduler
The torch.optim.lr_scheduler module provides several methods that adjust the learning rate according to the number of epochs trained. In general we gradually reduce the learning rate as training progresses, which usually gives better results.
Common learning-rate adjustment strategies are:
StepLR: adjusts the learning rate at equal intervals; each adjustment multiplies lr by gamma, and the interval is step_size epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1, verbose=False)
Parameters:
1. optimizer: the optimizer whose learning rate is being scheduled
2. step_size: the adjustment interval; the learning rate is updated once every step_size epochs
3. gamma: the multiplicative factor applied at each adjustment
4. last_epoch: the index of the last finished epoch, used to restore lr relative to initial_lr when resuming; if you continue training after many epochs, set it to the epoch of the loaded model. The default -1 means training from scratch, starting from epoch 1.
5. verbose: whether to print the lr value every time it changes
MultiStepLR: adjusts the learning rate whenever the current epoch reaches one of the preset values. This method suits later-stage tuning: look at the loss curve and set the adjustment epochs for each experiment accordingly.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1, verbose=False)
milestones: a list of epoch indices; each index marks an epoch at which the learning rate is adjusted, and the values must be increasing. For example [20, 50, 100] means the learning rate is adjusted at epochs 20, 50 and 100.
The other parameters are the same as above.
ExponentialLR: decays the learning rate exponentially every epoch; the formula is lr = initial_lr * gamma ** epoch.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1, verbose=False)
CosineAnnealingLR: cosine annealing strategy that adjusts the learning rate periodically.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1, verbose=False)
Parameters: 1. T_max: the period of the learning-rate adjustment; every cycle the learning rate climbs back towards the initial value and then decays again, which helps the optimizer escape saddle points. Often set to len(train_dataset).
2. eta_min: the minimum learning rate after decay; the default is 0.
3. last_epoch: the count of the previous epoch, used to decide whether the learning rate needs to be adjusted; when last_epoch hits the configured interval the learning rate is adjusted. When set to -1 the learning rate is reset to the initial value.
Update the learning rate at every training step:
scheduler.step()
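A minimal sketch (model, trainloader and criterion are placeholders, not from the original post) of how a scheduler is wired into the training loop; note that epoch-based schedulers such as StepLR are typically stepped once per epoch, while warm-up style schedulers are stepped per iteration:

import torch

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(200):
    for x, y in trainloader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()                       # adjust the learning rate once per epoch
    print(epoch, scheduler.get_last_lr())  # inspect the current learning rate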
- warmup
warmup is a learning-rate trick that first appeared in the ResNet paper: a small learning rate is used at the start of training, and after a warm-up period (e.g. 10 epochs or 10000 steps) training switches to the preset learning rate.
Why use warmup:
At the very beginning of training the weights are random and the model knows nothing about the data. During the first epochs the model adjusts its parameters rapidly in response to the inputs; with a large learning rate it is very likely to be thrown off course and then needs many more rounds to recover.
After some training the model has acquired prior knowledge of the data, so a larger learning rate is less likely to lead it astray and can be used to speed up training.
After training with the large learning rate for a while, the parameter distribution becomes relatively stable and little that is new is being learned from the data; keeping the learning rate large would only disturb that stability, so switching to a smaller learning rate is the better choice.
warm_up implementation
from typing import Union

import torch
from torch.optim.lr_scheduler import _LRScheduler
from typeguard import check_argument_types


class WarmupLR(_LRScheduler):
    """The WarmupLR scheduler

    This scheduler is almost the same as the NoamLR scheduler except for the following difference:
        NoamLR:   lr = optimizer.lr * model_size ** -0.5 * min(step ** -0.5, step * warmup_step ** -1.5)
        WarmupLR: lr = optimizer.lr * warmup_step ** 0.5 * min(step ** -0.5, step * warmup_step ** -1.5)
    Note that the maximum lr equals optimizer.lr in this scheduler.
    """
    def __init__(
        self,
        optimizer: torch.optim.Optimizer,
        warmup_steps: Union[int, float] = 25000,
        last_epoch: int = -1,
    ):
        assert check_argument_types()
        self.warmup_steps = warmup_steps
        # __init__() must be invoked before setting the field
        # because step() is also invoked in __init__()
        super().__init__(optimizer, last_epoch)

    def __repr__(self):
        return f"{self.__class__.__name__}(warmup_steps={self.warmup_steps})"

    def get_lr(self):
        step_num = self.last_epoch + 1
        return [
            lr
            * self.warmup_steps ** 0.5
            * min(step_num ** -0.5, step_num * self.warmup_steps ** -1.5)
            for lr in self.base_lrs
        ]

    def set_step(self, step: int):
        self.last_epoch = step
warm_up is used together with lr_scheduler: first the learning rate is increased linearly, then it is decayed, e.g. by only starting to step the decay scheduler once the warm-up phase is over:
if step_counter > 10:   # start decaying only after the warm-up period
    scheduler.step()
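The same warm-up-then-decay shape can also be obtained with the built-in LambdaLR; a minimal sketch (warmup_steps, total_steps and the optimizer choice are illustrative assumptions, not from the original post) that increases the learning rate linearly during warm-up and then decays it linearly:

import torch

warmup_steps = 1000
total_steps = 100000

def lr_lambda(step):
    # multiplier applied to the base lr: linear warm-up, then linear decay
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    return max(0.0, (total_steps - step) / (total_steps - warmup_steps))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_lambda)

for step in range(total_steps):
    ...                  # forward / backward / optimizer.step()
    scheduler.step()     # stepped per iteration in this setup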
Model training
Distributed
- nn.DataParallel
With torch.nn.DataParallel you get data-parallel training on a host with more than one GPU. During training each batch is split across the GPUs and every GPU runs the forward pass on its own slice in parallel (DataParallel stays in a single process and uses multiple threads, unlike DistributedDataParallel); the per-GPU outputs are then gathered, the loss is summed and averaged, and the gradients from back-propagation are synchronized across the GPU replicas.
model= MY_MODEL()
if torch.cuda.device_count() > 1:
model = nn.DataParallel(model)
Be careful: when this kind of parallelism is enabled, do not rely on next(model.parameters()) (e.g. to pick the device), and if the model contains external sub-models, do not pin those sub-models to an explicit device index such as cuda:0. Use torch.device('cuda') instead and let the program do the placement, and handle the data the same way; otherwise the tensors the model operates on can end up on different GPUs.
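A short sketch of that advice (MyModel is a placeholder): avoid a hard-coded 'cuda:0' and let DataParallel handle placement:

device = torch.device('cuda')              # no explicit index
model = MyModel()
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to(device)

for x, y in trainloader:
    x, y = x.to(device), y.to(device)      # move the data the same way, not to a fixed 'cuda:0'
    out = model(x)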
- Check which device the model and the data are on
# data is a torch.Tensor
print(next(model.parameters()).device)
print(data.device)
Check whether the data is on the GPU:
print(data.is_cuda)
Moving data between GPU and CPU:
# the simplest way; note that these calls return a new tensor rather than modifying data in place
data = data.cpu()
data = data.cuda()
# or set up a device explicitly
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
data = data.to(device)
- local rank
In PyTorch distributed training, local_rank is the GPU index of the process on its node; it is not an explicit parameter you set yourself but is assigned internally by torch.distributed.launch. The host GPU has local_rank 0. For example, rank = 3 with local_rank = 0 means the 1st GPU inside the 3rd process; the host process has local_rank = 0.
In each iteration every process has its own optimizer and completes all the optimization steps independently; within a process, training looks the same as ordinary training.
After each process has finished its gradient computation, the gradients are aggregated and averaged, and the rank-0 process broadcasts the result to all processes. Each process then uses this averaged gradient to update its parameters.
Set which GPU the current process works on (a minimal DistributedDataParallel sketch follows after this snippet):
torch.cuda.set_device(arg.local_rank)
device = torch.device('cuda',arg.local_rank)
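A minimal DistributedDataParallel sketch, assuming the script is launched with torch.distributed.launch / torchrun and that MyModel, train_dataset and num_epochs are defined elsewhere (they are placeholders here):

import argparse

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)   # filled in by torch.distributed.launch
args = parser.parse_args()

dist.init_process_group(backend="nccl")                    # one process per GPU
torch.cuda.set_device(args.local_rank)
device = torch.device("cuda", args.local_rank)

model = MyModel().to(device)
model = DDP(model, device_ids=[args.local_rank])

sampler = DistributedSampler(train_dataset)                # each process sees its own shard
trainloader = DataLoader(train_dataset, batch_size=32, sampler=sampler)

for epoch in range(num_epochs):
    sampler.set_epoch(epoch)                               # reshuffle differently every epoch
    for x, y in trainloader:
        ...                                                # forward / backward / optimizer.step()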
Debugging tips and data visualization
- Add a check that verifies the dataset has been downloaded, and raise a RuntimeError if something is wrong
import os

# e.g. inside the Dataset's __init__:
if not self.check_integrity():
    raise RuntimeError('Dataset not found or corrupted.' +
                       ' You need to download it from the official website.')

# check whether the path to the dataset exists
def check_integrity(self):
    if not os.path.exists(self.root_dir):
        return False
    else:
        return True
- Use enumerate together with a dict comprehension to build word2idx / label2idx mappings; this way the index does not need a separate counter variable incremented with += 1 inside a loop
self.label2index = {label: index for index, label in enumerate(sorted(set(labels)))}
- Record the total number of parameters of the model (a variant that counts only trainable parameters is sketched after this snippet)
# numel() in PyTorch counts the total number of elements in a tensor
num_params = sum(p.numel() for p in model.parameters())
print(num_params)
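If you only care about trainable parameters, a small variant (not from the original post) filters on requires_grad:

num_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"total: {num_params}, trainable: {num_trainable}")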
Training log
- logging module
Use the logging module to record the training log: it writes a log file in the background and can also print to the console, which makes it easy to monitor training information.
import logging
import os

# file location, file name and logging level
logging.basicConfig(filename=os.path.join(self.writer.log_dir, 'training.log'),
                    level=logging.DEBUG,
                    format='%(asctime)s - %(name)s - %(message)s')

logging.debug('this is debug message')      # this message is written to training.log
logging.info('this is info message')
logging.warning('this is warning message')

'''
result:
2017-08-23 14:22:25,713 - root - this is debug message
2017-08-23 14:22:25,713 - root - this is info message
2017-08-23 14:22:25,714 - root - this is warning message
'''
Parameters of logging.basicConfig:
filename: the name of the log file
filemode: the mode used to open the log file, 'w' or 'a' (same meaning as for the built-in open())
format: the format and content of the output; format can include many useful fields, as in the example above (see also the sketch after this list):
%(levelno)s: the numeric logging level
%(levelname)s: the logging level name
%(pathname)s: the path of the currently executing program, i.e. sys.argv[0]
%(filename)s: the file name of the currently executing program
%(funcName)s: the function that issued the log call
%(lineno)d: the line number of the log call
%(asctime)s: the time of the log record
%(thread)d: the thread ID
%(threadName)s: the thread name
%(process)d: the process ID
%(message)s: the log message
datefmt: the time format, same as for time.strftime()
level: the logging level; the default is logging.WARNING
stream: the output stream for the log, which can be sys.stderr, sys.stdout or a file; the default is sys.stderr. When both stream and filename are given, stream is ignored.
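A small illustration (not from the original post; the file name and message are made up) of a basicConfig call that combines several of the fields above:

import logging

logging.basicConfig(
    filename='training.log',
    filemode='a',
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(filename)s:%(lineno)d - %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
)
logging.info('epoch finished')  # produces lines like: 2022-06-26 14:58:00 - INFO - train.py:42 - epoch finished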
- Output the log to both a file and the screen:
# set the logger name, usually the name of the main file
logger1 = logging.getLogger(__name__)
logging.basicConfig(filename=os.path.join(self.writer.log_dir, 'training.log'), level=logging.DEBUG)
logging.info(f"Start SimCLR training for {self.args.epochs} epochs.")
logging.info(f"Training with gpu: {self.args.disable_cuda}.")
....
# handlers that write the log to files
fh1 = logging.FileHandler(filename='a1.log', encoding='utf-8')  # file a1
fh2 = logging.FileHandler(filename='a2.log', encoding='utf-8')  # file a2
sh = logging.StreamHandler()                                    # handler that writes the log to the terminal
# attach the handlers so the logger writes to both destinations
logger1.addHandler(fh1)
logger1.addHandler(fh2)
logger1.addHandler(sh)