PyTorch deep learning code skills
2022-06-26 14:58:00 【Eight dental shoes】
There are many pitfalls in building a deep learning network, and I have picked up a lot of tricks while reading other people's code. This post collects them in one place for easy reference.
Parameter configuration
The argparse library
argparse is a library that ships with Python. With argparse we can set parameters from the command line just as we would on a Linux system; the object returned by parse_args packages all the parameters, which makes it very convenient to pass the same (possibly modified) configuration around multiple files.
import argparse

parser = argparse.ArgumentParser(description='MODELname')  # give the argument parser a name
register_data_args(parser)  # helper from the DGL examples; remove this line if you are not using DGL
parser.add_argument("--dropout", type=float, default=0.5,
                    help="dropout probability")
parser.add_argument("--gpu", type=int, default=-1,
                    help="gpu")
parser.add_argument("--lr", type=float, default=1e-2,
                    help="learning rate")
parser.add_argument("--n-epochs", type=int, default=200,
                    help="number of training epochs")
parser.add_argument("--n-hidden", type=int, default=16,
                    help="number of hidden gcn units")
parser.add_argument("--n-layers", type=int, default=1,
                    help="number of hidden gcn layers")
parser.add_argument("--weight-decay", type=float, default=5e-4,
                    help="Weight for L2 loss")
parser.add_argument("--aggregator-type", type=str, default="gcn",
                    help="Aggregator type: mean/gcn/pool/lstm")
config = parser.parse_args()
# Afterwards each parameter is available as config.xxx (note that "--n-epochs" becomes config.n_epochs)
# To override values, run e.g.: python train.py --n-epochs 100 --lr 1e-3 ...
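A minimal sketch (not from the original post; MyModel and the optimizer choice are placeholders) of how the parsed config is then consumed in the training script:

import torch

device = torch.device('cuda', config.gpu) if config.gpu >= 0 else torch.device('cpu')

model = MyModel(n_hidden=config.n_hidden, n_layers=config.n_layers).to(device)  # MyModel is a stand-in
optimizer = torch.optim.Adam(model.parameters(), lr=config.lr, weight_decay=config.weight_decay)

for epoch in range(config.n_epochs):
    ...  # the training loop reads everything from config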
Model framework
DataLoader
- Put the heaviest part of data loading inside __getitem__; the class body (__init__) should only record the paths of the training data. This lets you adjust things during training and keeps the CPU memory footprint low, because samples are only loaded when they are needed.
- When overriding __len__, make sure the reported length matches the amount of data __getitem__ can index.
import numpy as np
from torch.utils.data import Dataset

class TrainDataset(Dataset):
    def __init__(self, listdir=list_dir):  # list_dir: list of sample paths prepared elsewhere
        super(TrainDataset, self).__init__()
        self.train_dirs = []
        for dir in listdir:                # only remember the paths, do not load the data yet
            self.train_dirs.append(dir)
        ...
    def __getitem__(self, index):
        path = self.train_dirs[index]
        data = np.load(path)               # the actual loading happens here, sample by sample
        ...
    def __len__(self):
        return len(self.train_dirs)        # must match what __getitem__ can index
- collate_fn
When building the DataLoader you can set the collate_fn parameter and pass in a data-processing function that you write yourself.
trainloader = DataLoader(dataset=train_dataset, shuffle=True, collate_fn=my_func)
The role of collate_fn is to customize how individual samples are assembled into a batch, as sketched below.
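my_func above is whatever you write; as a minimal sketch, assuming __getitem__ returns (sequence, label) pairs of varying length, a collate_fn that pads each batch to its longest sequence could look like this:

import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

def my_func(batch):
    # batch is a list of samples returned by __getitem__, assumed here to be (sequence, label) pairs
    seqs, labels = zip(*batch)
    lengths = torch.tensor([len(s) for s in seqs])
    padded = pad_sequence([torch.as_tensor(s) for s in seqs], batch_first=True)  # pad to the longest sequence
    return padded, lengths, torch.tensor(labels)

trainloader = DataLoader(dataset=train_dataset, batch_size=32, shuffle=True, collate_fn=my_func)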
Learning rate
- lr_scheduler
The torch.optim.lr_scheduler module provides several methods that adjust the learning rate according to the number of epochs trained. In general we gradually reduce the learning rate as training progresses, which usually gives better results.
Common learning-rate adjustment strategies are:
StepLR: adjusts the learning rate at equal intervals; each adjustment multiplies lr by gamma, and the interval is step_size epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1, verbose=False)
Parameters:
1. optimizer: the optimizer whose learning rate is being scheduled
2. step_size: the adjustment interval; the learning rate is updated once every step_size epochs
3. gamma: the multiplicative factor applied at each adjustment
4. last_epoch: the index of the last finished epoch, used to restore lr relative to initial_lr when resuming; if you continue training after many epochs, set it to the epoch of the loaded model. The default -1 means training from scratch, starting from epoch 1.
5. verbose: whether to print the lr value every time it changes
MultiStepLR: adjusts the learning rate whenever the current epoch reaches one of the preset values. This method suits later-stage tuning: look at the loss curve and set the adjustment epochs for each experiment accordingly.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1, verbose=False)
milestones: a list of epoch indices; each index marks an epoch at which the learning rate is adjusted, and the values must be increasing. For example [20, 50, 100] means the learning rate is adjusted at epochs 20, 50 and 100.
The other parameters are the same as above.
ExponentialLR: decays the learning rate exponentially every epoch; the formula is lr = initial_lr * gamma ** epoch.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1, verbose=False)
CosineAnnealingLR: cosine annealing strategy that adjusts the learning rate periodically.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1, verbose=False)
Parameters: 1. T_max: the period of the learning-rate adjustment; every cycle the learning rate climbs back towards the initial value and then decays again, which helps the optimizer escape saddle points. Often set to len(train_dataset).
2. eta_min: the minimum learning rate after decay; the default is 0.
3. last_epoch: the count of the previous epoch, used to decide whether the learning rate needs to be adjusted; when last_epoch hits the configured interval the learning rate is adjusted. When set to -1 the learning rate is reset to the initial value.
Update the learning rate at every training step:
scheduler.step()
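A minimal sketch (model, trainloader and criterion are placeholders, not from the original post) of how a scheduler is wired into the training loop; note that epoch-based schedulers such as StepLR are typically stepped once per epoch, while warm-up style schedulers are stepped per iteration:

import torch

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(200):
    for x, y in trainloader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()                       # adjust the learning rate once per epoch
    print(epoch, scheduler.get_last_lr())  # inspect the current learning rate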
- warmup
warmup is a learning-rate trick that first appeared in the ResNet paper: a small learning rate is used at the start of training, and after a warm-up period (e.g. 10 epochs or 10000 steps) training switches to the preset learning rate.
Why use warmup:
At the very beginning of training the weights are random and the model knows nothing about the data. During the first epochs the model adjusts its parameters rapidly in response to the inputs; with a large learning rate it is very likely to be thrown off course and then needs many more rounds to recover.
After some training the model has acquired prior knowledge of the data, so a larger learning rate is less likely to lead it astray and can be used to speed up training.
After training with the large learning rate for a while, the parameter distribution becomes relatively stable and little that is new is being learned from the data; keeping the learning rate large would only disturb that stability, so switching to a smaller learning rate is the better choice.
warm_up implementation
from typing import Union

import torch
from torch.optim.lr_scheduler import _LRScheduler
from typeguard import check_argument_types


class WarmupLR(_LRScheduler):
    """The WarmupLR scheduler

    This scheduler is almost the same as the NoamLR scheduler except for the following difference:
        NoamLR:   lr = optimizer.lr * model_size ** -0.5 * min(step ** -0.5, step * warmup_step ** -1.5)
        WarmupLR: lr = optimizer.lr * warmup_step ** 0.5 * min(step ** -0.5, step * warmup_step ** -1.5)
    Note that the maximum lr equals optimizer.lr in this scheduler.
    """
    def __init__(
        self,
        optimizer: torch.optim.Optimizer,
        warmup_steps: Union[int, float] = 25000,
        last_epoch: int = -1,
    ):
        assert check_argument_types()
        self.warmup_steps = warmup_steps
        # __init__() must be invoked before setting the field
        # because step() is also invoked in __init__()
        super().__init__(optimizer, last_epoch)

    def __repr__(self):
        return f"{self.__class__.__name__}(warmup_steps={self.warmup_steps})"

    def get_lr(self):
        step_num = self.last_epoch + 1
        return [
            lr
            * self.warmup_steps ** 0.5
            * min(step_num ** -0.5, step_num * self.warmup_steps ** -1.5)
            for lr in self.base_lrs
        ]

    def set_step(self, step: int):
        self.last_epoch = step
warm_up is used together with lr_scheduler: first the learning rate is increased linearly, then it is decayed, e.g. by only starting to step the decay scheduler once the warm-up phase is over:
if step_counter > 10:   # start decaying only after the warm-up period
    scheduler.step()
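The same warm-up-then-decay shape can also be obtained with the built-in LambdaLR; a minimal sketch (warmup_steps, total_steps and the optimizer choice are illustrative assumptions, not from the original post) that increases the learning rate linearly during warm-up and then decays it linearly:

import torch

warmup_steps = 1000
total_steps = 100000

def lr_lambda(step):
    # multiplier applied to the base lr: linear warm-up, then linear decay
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    return max(0.0, (total_steps - step) / (total_steps - warmup_steps))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_lambda)

for step in range(total_steps):
    ...                  # forward / backward / optimizer.step()
    scheduler.step()     # stepped per iteration in this setup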
Model training
Distributed
- nn.DataParallel
With torch.nn.DataParallel you get data-parallel training on a host with more than one GPU. During training each batch is split across the GPUs and every GPU runs the forward pass on its own slice in parallel (DataParallel stays in a single process and uses multiple threads, unlike DistributedDataParallel); the per-GPU outputs are then gathered, the loss is summed and averaged, and the gradients from back-propagation are synchronized across the GPU replicas.
model= MY_MODEL()
if torch.cuda.device_count() > 1:
model = nn.DataParallel(model)
Be careful: when this kind of parallelism is enabled, do not rely on next(model.parameters()) (e.g. to pick the device), and if the model contains external sub-models, do not pin those sub-models to an explicit device index such as cuda:0. Use torch.device('cuda') instead and let the program do the placement, and handle the data the same way; otherwise the tensors the model operates on can end up on different GPUs.
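A short sketch of that advice (MyModel is a placeholder): avoid a hard-coded 'cuda:0' and let DataParallel handle placement:

device = torch.device('cuda')              # no explicit index
model = MyModel()
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to(device)

for x, y in trainloader:
    x, y = x.to(device), y.to(device)      # move the data the same way, not to a fixed 'cuda:0'
    out = model(x)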
- Check which device the model and the data are on
# data is a torch.Tensor
print(next(model.parameters()).device)
print(data.device)
Check whether the data is on the GPU:
print(data.is_cuda)
Moving data between GPU and CPU:
# the simplest way; note that these calls return a new tensor rather than modifying data in place
data = data.cpu()
data = data.cuda()
# or set up a device explicitly
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
data = data.to(device)
- local rank
In PyTorch distributed training, local_rank is the GPU index of the process on its node; it is not an explicit parameter you set yourself but is assigned internally by torch.distributed.launch. The host GPU has local_rank 0. For example, rank = 3 with local_rank = 0 means the 1st GPU inside the 3rd process; the host process has local_rank = 0.
In each iteration every process has its own optimizer and completes all the optimization steps independently; within a process, training looks the same as ordinary training.
After each process has finished its gradient computation, the gradients are aggregated and averaged, and the rank-0 process broadcasts the result to all processes. Each process then uses this averaged gradient to update its parameters.
Set which GPU the current process works on (a minimal DistributedDataParallel sketch follows after this snippet):
torch.cuda.set_device(arg.local_rank)
device = torch.device('cuda',arg.local_rank)
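A minimal DistributedDataParallel sketch, assuming the script is launched with torch.distributed.launch / torchrun and that MyModel, train_dataset and num_epochs are defined elsewhere (they are placeholders here):

import argparse

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)   # filled in by torch.distributed.launch
args = parser.parse_args()

dist.init_process_group(backend="nccl")                    # one process per GPU
torch.cuda.set_device(args.local_rank)
device = torch.device("cuda", args.local_rank)

model = MyModel().to(device)
model = DDP(model, device_ids=[args.local_rank])

sampler = DistributedSampler(train_dataset)                # each process sees its own shard
trainloader = DataLoader(train_dataset, batch_size=32, sampler=sampler)

for epoch in range(num_epochs):
    sampler.set_epoch(epoch)                               # reshuffle differently every epoch
    for x, y in trainloader:
        ...                                                # forward / backward / optimizer.step()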
Debugging tips and data visualization
- Add a check that verifies the dataset has been downloaded, and raise a RuntimeError if something is wrong
import os

# e.g. inside the Dataset's __init__:
if not self.check_integrity():
    raise RuntimeError('Dataset not found or corrupted.' +
                       ' You need to download it from the official website.')

# check whether the path to the dataset exists
def check_integrity(self):
    if not os.path.exists(self.root_dir):
        return False
    else:
        return True
- Use enumerate together with a dict comprehension to build word2idx / label2idx mappings; this way the index does not need a separate counter variable incremented with += 1 inside a loop
self.label2index = {label: index for index, label in enumerate(sorted(set(labels)))}
- Record the total number of parameters of the model (a variant that counts only trainable parameters is sketched after this snippet)
# numel() in PyTorch counts the total number of elements in a tensor
num_params = sum(p.numel() for p in model.parameters())
print(num_params)
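If you only care about trainable parameters, a small variant (not from the original post) filters on requires_grad:

num_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"total: {num_params}, trainable: {num_trainable}")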
Training log
- logging module
Use the logging module to record the training log: it writes a log file in the background and can also print to the console, which makes it easy to monitor training information.
import logging
import os

# file location, file name and logging level
logging.basicConfig(filename=os.path.join(self.writer.log_dir, 'training.log'),
                    level=logging.DEBUG,
                    format='%(asctime)s - %(name)s - %(message)s')

logging.debug('this is debug message')      # this message is written to training.log
logging.info('this is info message')
logging.warning('this is warning message')

'''
result:
2017-08-23 14:22:25,713 - root - this is debug message
2017-08-23 14:22:25,713 - root - this is info message
2017-08-23 14:22:25,714 - root - this is warning message
'''
Parameters of logging.basicConfig:
filename: the name of the log file
filemode: the mode used to open the log file, 'w' or 'a' (same meaning as for the built-in open())
format: the format and content of the output; format can include many useful fields, as in the example above (see also the sketch after this list):
%(levelno)s: the numeric logging level
%(levelname)s: the logging level name
%(pathname)s: the path of the currently executing program, i.e. sys.argv[0]
%(filename)s: the file name of the currently executing program
%(funcName)s: the function that issued the log call
%(lineno)d: the line number of the log call
%(asctime)s: the time of the log record
%(thread)d: the thread ID
%(threadName)s: the thread name
%(process)d: the process ID
%(message)s: the log message
datefmt: the time format, same as for time.strftime()
level: the logging level; the default is logging.WARNING
stream: the output stream for the log, which can be sys.stderr, sys.stdout or a file; the default is sys.stderr. When both stream and filename are given, stream is ignored.
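A small illustration (not from the original post; the file name and message are made up) of a basicConfig call that combines several of the fields above:

import logging

logging.basicConfig(
    filename='training.log',
    filemode='a',
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(filename)s:%(lineno)d - %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
)
logging.info('epoch finished')  # produces lines like: 2022-06-26 14:58:00 - INFO - train.py:42 - epoch finished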
- Output the log to both a file and the screen:
# set the logger name, usually the name of the main file
logger1 = logging.getLogger(__name__)
logging.basicConfig(filename=os.path.join(self.writer.log_dir, 'training.log'), level=logging.DEBUG)
logging.info(f"Start SimCLR training for {self.args.epochs} epochs.")
logging.info(f"Training with gpu: {self.args.disable_cuda}.")
....
# handlers that write the log to files
fh1 = logging.FileHandler(filename='a1.log', encoding='utf-8')  # file a1
fh2 = logging.FileHandler(filename='a2.log', encoding='utf-8')  # file a2
sh = logging.StreamHandler()                                    # handler that writes the log to the terminal
# attach the handlers so the logger writes to both destinations
logger1.addHandler(fh1)
logger1.addHandler(fh2)
logger1.addHandler(sh)