Installing PyTorch, mmcv, and mmclassification on a bastion host, and training on your own dataset
2022-06-23 21:56:00 【一届书生#】
Create a conda environment on the bastion host and activate it
conda create -n mmclassification python=3.7
conda activate mmclassification
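Install pytorch, torchvision, and cudatoolkit on the bastion host
Download the torch and torchvision packages
From the pytorch page of the Tsinghua University Open Source Software Mirror (pytorch | 清华大学开源软件镜像站), download the versions you need, then upload them to the bastion host. For example, I downloaded the two packages named in the install commands below.
Upload and install
On the bastion host, activate your conda environment (replace mmclassification below with the name of your own environment), then install the packages.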
conda activate mmclassification
conda install pytorch-1.10.0-py3.7_cuda11.3_cudnn8.2.0_0.tar.bz2
conda install torchvision-0.11.0-py37_cu113.tar.bz2
# Also install cudatoolkit; its version should match the CUDA version of the PyTorch build (cuda11.3 here)
conda install cudatoolkit=11.3
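The file name of a conda PyTorch package encodes the Python, CUDA, and cuDNN versions the build targets, so it can be used to pick a matching cudatoolkit. The following is a minimal sketch that reads those fields from a name shaped like the one above; the helper names and the regular expression are illustrative assumptions, not part of conda.

```python
import re

# Assumed name layout, taken from the file used in this post:
# pytorch-1.10.0-py3.7_cuda11.3_cudnn8.2.0_0.tar.bz2
BUILD_PATTERN = re.compile(
    r"^pytorch-(?P<version>[\d.]+)"
    r"-py(?P<python>\d+\.\d+)"
    r"_cuda(?P<cuda>\d+\.\d+)"
    r"_cudnn(?P<cudnn>[\d.]+)"
    r"_\d+\.tar\.bz2$"
)


def parse_pytorch_build(filename):
    """Return the versions encoded in a conda PyTorch package file name."""
    match = BUILD_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized package name: {filename}")
    return match.groupdict()


def matches_build(filename, python_version, cudatoolkit_version):
    """Check that the package targets the given Python and CUDA versions."""
    info = parse_pytorch_build(filename)
    return info["python"] == python_version and info["cuda"] == cudatoolkit_version


package = "pytorch-1.10.0-py3.7_cuda11.3_cudnn8.2.0_0.tar.bz2"
print(parse_pytorch_build(package))
print("matches python 3.7 + cudatoolkit 11.3:", matches_build(package, "3.7", "11.3"))
```

If the check fails for the cudatoolkit version you plan to install, download a different PyTorch build or choose a different cudatoolkit version.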
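Install mmcv on the bastion host
Download the mmcv package
Likewise, go to the mmcv GitHub page, find the install command for the version you want, copy the URL that follows -f and open it in a browser, then pick the mmcv build you want and download it.
Upload it to the bastion host and install it.
Install mmcv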
# The file downloaded from the -f page is a pip wheel (.whl), so install it with pip
pip install ${MMCV_WHEEL_FILE}
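The URL after -f depends on the CUDA and PyTorch versions of your environment. As a rough sketch, assuming the mmcv 1.x download layout (a cu<digits>/torch<major.minor>.0 directory under download.openmmlab.com), it can be built as follows; the function name is hypothetical and the layout may differ for other mmcv releases.

```python
def mmcv_index_url(cuda_version, torch_version):
    """Build the wheel index URL that follows -f in the mmcv install command.

    Assumption: mmcv 1.x layout, where wheels are grouped by CUDA tag
    (for example cu113) and by PyTorch minor release (for example torch1.10.0).
    """
    cu_tag = "cu" + cuda_version.replace(".", "")
    major, minor = torch_version.split(".")[:2]
    return (
        "https://download.openmmlab.com/mmcv/dist/"
        f"{cu_tag}/torch{major}.{minor}.0/index.html"
    )


# Versions used in this post: CUDA 11.3 and PyTorch 1.10.0
print(mmcv_index_url("11.3", "1.10.0"))
```

Open the printed URL on a machine with internet access, download the wheel that matches your Python version, and upload it to the bastion host.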
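Install mmclassification on the bastion host
1️⃣ First download the latest release archive from the mmclassification GitHub page, upload it to the bastion host, and extract it there.
open-mmlab/mmclassification: OpenMMLab Image Classification Toolbox and Benchmark (github.com)
2️⃣ On the bastion host, change to the mmclassification root directory and run the install command: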
pip3 install -e .
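Because every package here was installed offline, it is worth confirming that all of them import inside the conda environment before going further. A small sketch, assuming the import names torch, torchvision, mmcv, and mmcls (the import name of mmclassification 0.x); the helper name is illustrative.

```python
import importlib


def report_versions(names=("torch", "torchvision", "mmcv", "mmcls")):
    """Return {package: version}, None if missing, or the import error text."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        except Exception as err:  # e.g. a version-compatibility assertion
            versions[name] = f"import error: {err}"
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version if version else 'NOT INSTALLED'}")
    try:
        import torch
        # torch.version.cuda is the CUDA version the build was compiled for
        print("CUDA build:", torch.version.cuda, "| GPU visible:", torch.cuda.is_available())
    except ImportError:
        pass
```

A package reported as missing, or a GPU that is not visible, usually points back to a mismatch between the PyTorch build, cudatoolkit, and the mmcv wheel.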
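Run the mmclassification demo
Download the pretrained weights
Create a checkpoints folder in the mmclassification root directory, then download the pretrained weights locally and upload them to the bastion host. I use resnet18_8xb32_in1k_20210831-fbbb1da6.pth as the example.
All ResNet pretrained weights: mmclassification/resnet
Download link for resnet18_8xb32_in1k_20210831-fbbb1da6.pth: resnet18_8xb32_in1k_20210831-fbbb1da6.pth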
mkdir checkpoints
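A file that passes through a local download and then an upload can arrive truncated. The sketch below assumes that the 8-character suffix in OpenMMLab checkpoint names (fbbb1da6 in the example above) is the beginning of the file's SHA-256 digest; that convention is an assumption here, so treat a mismatch as a hint to re-download rather than as proof of corruption. The function names are illustrative.

```python
import hashlib
import os
import re


def sha256_prefix(path, length=8, chunk_size=1 << 20):
    """Hash the file in chunks and return the first `length` hex characters."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()[:length]


def checkpoint_is_intact(path):
    """Compare the hash suffix in the file name with the file's SHA-256."""
    match = re.search(r"-([0-9a-f]{8})\.pth$", os.path.basename(path))
    if match is None:
        raise ValueError("file name carries no 8-character hash suffix")
    return sha256_prefix(path) == match.group(1)


# Usage on the bastion host (path taken from this post):
# checkpoint_is_intact("checkpoints/resnet18_8xb32_in1k_20210831-fbbb1da6.pth")
```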
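Run the demo
Remember to change the file name after checkpoints/ in the command to match the file you downloaded, since mmclassification may update its weights later.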
python demo/image_demo.py demo/demo.JPEG configs/resnet/resnet18_8xb32_in1k.py checkpoints/resnet18_8xb32_in1k_20210831-fbbb1da6.pth
Demo output:
load checkpoint from local path: checkpoints/resnet18_8xb32_in1k_20210831-fbbb1da6.pth
{
"pred_label": 58,
"pred_score": 0.3810223340988159,
"pred_class": "water snake"
}
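Build your own dataset
Split the dataset
Before splitting, the dataset looks like this. I have two classes, so there are two folders. Arrange your own data in the same layout.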
root_path
├── benign
│ ├── B_zsvinno_1.jpg
│ ├── B_zsvinno_2.jpg
│ └── ...
│
├── malignant
│ ├── M_zsvinno_1.jpg
│ ├── M_zsvinno_2.jpg
│ └── ...
└── split_data.py
The script that splits the dataset (train:val:test = 7:2:1). Read the comments carefully for the places you need to change.
import glob
import os
import random
import shutil

################################## Split into train, val, test #########################################
# Parameters
train_prop = 0.7  # fraction of each class used for training
val_prop = 2 / 3  # fraction of the remaining (val + test) images used for validation
class_num = 2  ################# change to your own number of classes
class_name = ["benign", "malignant"]  ################# change to your own class names
root_dir = os.getcwd()
work_dir = os.path.join(root_dir, 'data')


def split():
    class_data_dir = []
    for i in range(class_num):
        class_data_dir.append(os.path.join(root_dir, class_name[i]))
    print(class_data_dir)

    images_data = []
    for i in class_data_dir:
        images_data.append(sorted(os.listdir(i)))

    # Pick the image indices for the training, validation, and test sets
    train_index, val_index, test_index = [], [], []
    for i in range(class_num):
        train_index.append(random.sample(range(len(images_data[i])), int(len(images_data[i]) * train_prop)))
        valtest_index = list(set(range(len(images_data[i]))) - set(train_index[i]))
        val_index.append(random.sample(valtest_index, int(len(valtest_index) * val_prop)))
        test_index.append(list(set(valtest_index) - set(val_index[i])))

    # Create the train, val, and test folders.
    # os.makedirs raises FileExistsError if they already exist,
    # so delete the old data folder before splitting again.
    os.makedirs(os.path.join(work_dir, "train"))
    os.makedirs(os.path.join(work_dir, "val"))
    os.makedirs(os.path.join(work_dir, "test"))

    # Create one folder per class inside each of them
    for i in range(class_num):
        os.makedirs(os.path.join(work_dir, "train", class_name[i]))
        os.makedirs(os.path.join(work_dir, "val", class_name[i]))
        os.makedirs(os.path.join(work_dir, "test", class_name[i]))

    # Copy the images into the train, val, and test folders
    for i in range(class_num):
        for j in train_index[i]:
            shutil.copy(os.path.join(class_data_dir[i], images_data[i][j]),
                        os.path.join(work_dir, "train", class_name[i]))
        for j in val_index[i]:
            shutil.copy(os.path.join(class_data_dir[i], images_data[i][j]),
                        os.path.join(work_dir, "val", class_name[i]))
        for j in test_index[i]:
            shutil.copy(os.path.join(class_data_dir[i], images_data[i][j]),
                        os.path.join(work_dir, "test", class_name[i]))

    # Print a summary
    print('-' * 50)
    for i in range(class_num):
        print('|' + class_name[i] + ' train num' + ': ' + str(len(train_index[i])))
        print('|' + class_name[i] + ' val num' + ': ' + str(len(val_index[i])))
        print('|' + class_name[i] + ' test num' + ': ' + str(len(test_index[i])))
        print()
    print('-' * 50)


################################## Create classes.txt #########################################
def create_classes_txt():
    # One class name per line, in label order
    with open(os.path.join(work_dir, 'classes.txt'), 'w') as f:
        for i in range(class_num):
            f.write(f'{class_name[i]}\n')
    print('| classes.txt created')
    print('| classes.txt path: ' + os.path.join(work_dir, 'classes.txt'))
    print('-' * 50)


################################## Create train.txt, val.txt, test.txt ############################
def create_txt():
    def generate_txt(images_dir, map_dict):
        # Collect all image paths (only .jpg files are listed)
        imgs_dirs = sorted(glob.glob(os.path.join(images_dir, "*", "*.jpg")))
        # The annotation file is named after the subset: train.txt, val.txt, test.txt
        typename = os.path.basename(images_dir)
        target_txt_path = os.path.join(work_dir, typename + ".txt")
        with open(target_txt_path, "w") as f:
            for img_dir in imgs_dirs:
                # The parent folder name is the class name
                filename = os.path.basename(os.path.dirname(img_dir))
                num = map_dict[filename]
                # Write "<class folder>/<image name> <label>"
                relate_name = os.path.relpath(img_dir, images_dir).replace(os.sep, "/")
                f.write(relate_name + " " + num + "\n")

    train_dir = os.path.join(work_dir, "train")
    val_dir = os.path.join(work_dir, "val")
    test_dir = os.path.join(work_dir, "test")

    # Map each class name to its label index
    class_map_dict = {}
    for i in range(class_num):
        class_map_dict[class_name[i]] = str(i)

    generate_txt(images_dir=train_dir, map_dict=class_map_dict)
    generate_txt(images_dir=val_dir, map_dict=class_map_dict)
    generate_txt(images_dir=test_dir, map_dict=class_map_dict)
    print('| train.txt, val.txt, test.txt created')
    print('| train dir', train_dir)
    print('| val dir', val_dir)
    print('| test dir', test_dir)
    print('-' * 50)


if __name__ == '__main__':
    split()  # split the dataset into train, val, and test folders
    create_classes_txt()  # create classes.txt
    create_txt()  # create train.txt, val.txt, and test.txt
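The 7:2:1 ratio comes from two steps: 70% of each class goes to train, then 2/3 of the remaining 30% goes to val (0.3 × 2/3 = 0.2), which leaves 0.1 for test. The sketch below repeats that arithmetic with the same rounding the script uses (int() truncates); the function name is illustrative.

```python
def split_sizes(n_images, train_prop=0.7, val_prop=2 / 3):
    """Return (train, val, test) counts for one class of n_images images."""
    n_train = int(n_images * train_prop)  # truncated, like the script
    n_rest = n_images - n_train           # images left for val + test
    n_val = int(n_rest * val_prop)
    n_test = n_rest - n_val               # test receives whatever remains
    return n_train, n_val, n_test


for n in (100, 37, 5):
    train, val, test = split_sizes(n)
    print(f"{n} images -> train={train}, val={val}, test={test}")
```

With 100 images in a class the result is 70/20/10. For small classes the truncation shifts images toward the test set, so check the printed counts before training.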
After splitting:
root_path
├── benign
├── data
│ ├── train
│ │ ├── benign
│ │ └── malignant
│ ├── train.txt
│ ├── val
│ │ ├── benign
│ │ └── malignant
│ ├── val.txt
│ ├── test
│ │ ├── benign
│ │ └── malignant
│ ├── test.txt
│ └── classes.txt
├── malignant
└── split_data.py
The dataset is now ready.
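Before training, it can help to confirm that every line of train.txt, val.txt, and test.txt points at an existing image and carries a valid label. A minimal sketch, assuming each line has the form "<class folder>/<image name> <label>" as written by the split script; the function name is hypothetical.

```python
import os


def check_annotations(ann_file, data_prefix, num_classes):
    """Return a list of problems found in one annotation file."""
    problems = []
    with open(ann_file) as f:
        for line_no, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line:
                continue
            # Split on the last space so file names containing spaces still work
            parts = line.rsplit(" ", 1)
            if len(parts) != 2 or not parts[1].isdigit():
                problems.append(f"line {line_no}: malformed entry")
                continue
            rel_path, label = parts[0], int(parts[1])
            if label >= num_classes:
                problems.append(f"line {line_no}: label {label} out of range")
            if not os.path.isfile(os.path.join(data_prefix, rel_path)):
                problems.append(f"line {line_no}: missing file {rel_path}")
    return problems


# Usage from root_path, after running split_data.py:
# for subset in ("train", "val", "test"):
#     print(subset, check_annotations(f"data/{subset}.txt", f"data/{subset}", 2))
```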
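Create the training config file
Create the config file
First find the model you want to use and copy its config file name; you will use it to edit the code below. I use convnext-large_64xb64_in1k.py as the example.
Under the mmclassification/ directory, create a work_dirs directory, then create a create_config.py file in it; this script generates the mmclassification config file.
The contents of create_config.py are as follows. Read the comments carefully for the places you need to change.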
import os
from mmcv import Config

########################### Hyperparameters below; adjust as needed #############################
# Model settings
num_classes = 2  # change to your own number of classes
topk = (1,)  # change to your own topk

# Dataset settings
root_path = os.getcwd()
model_name = 'convnext-large_64xb64_in1k'  # name of the model you want to use
work_dir = os.path.join(root_path, "work_dirs", 'convnext-large_64xb64_in1k_job1')  # where training output is saved; rename job1, job2, ... per run
baseline_cfg_path = os.path.join('configs', 'convnext', 'convnext-large_64xb64_in1k.py')  # path to the config of the model you want to use
save_cfg_path = os.path.join(work_dir, 'config.py')  # where the generated config file is saved
train_data_prefix = os.path.join(root_path, 'data', 'thyroid_cls', 'data', 'train')  # your training image directory
val_data_prefix = os.path.join(root_path, 'data', 'thyroid_cls', 'data', 'val')  # your validation image directory
test_data_prefix = os.path.join(root_path, 'data', 'thyroid_cls', 'data', 'test')  # your test image directory; without a test set, use the validation set
train_ann_file = os.path.join(root_path, 'data', 'thyroid_cls', 'data', 'train.txt')  # your training set txt file
val_ann_file = os.path.join(root_path, 'data', 'thyroid_cls', 'data', 'val.txt')  # your validation set txt file
test_ann_file = os.path.join(root_path, 'data', 'thyroid_cls', 'data', 'test.txt')  # your test set txt file; without a test set, use the validation set
classes = os.path.join(root_path, 'data', 'thyroid_cls', 'data', 'classes.txt')  # classes.txt in your dataset directory, one class per line

# Find the weights for your model at: https://mmclassification.readthedocs.io/en/latest/model_zoo.html
# After downloading, put the file under work_dir and rename it to checkpoint.pth.
load_from = os.path.join(work_dir, 'checkpoint.pth')

# Training hyperparameters; adjust as needed
gpu_num = 8  # number of GPUs you have
total_epochs = 100  # total number of epochs to train
batch_size = 2 ** 4  # per-GPU batch size; pick a value that fits your GPU memory, preferably a power of 2
num_worker = 8  # data-loading workers per GPU; keep it below batch_size and adjust to your CPU core count
log_interval = 5  # interval between log prints
checkpoint_interval = 15  # interval (in epochs) between saved checkpoints
# lr = 0.02  # learning rate
########################### Hyperparameters above; adjust as needed #############################


def create_config():
    cfg = Config.fromfile(baseline_cfg_path)
    os.makedirs(work_dir, exist_ok=True)
    cfg.work_dir = work_dir

    # Model settings
    cfg.model.head.num_classes = num_classes
    cfg.model.head.topk = topk
    if num_classes < 5:
        # top-5 accuracy is undefined with fewer than 5 classes
        cfg.evaluation = dict(metric_options={'topk': (1,)})

    # Dataset settings
    cfg.data.train.data_prefix = train_data_prefix
    # cfg.data.train.ann_file = train_ann_file
    cfg.data.train.classes = classes
    cfg.data.val.data_prefix = val_data_prefix
    cfg.data.val.ann_file = val_ann_file
    cfg.data.val.classes = classes
    cfg.data.test.data_prefix = test_data_prefix
    cfg.data.test.ann_file = test_ann_file
    cfg.data.test.classes = classes
    cfg.data.samples_per_gpu = batch_size  # batch size per GPU
    cfg.data.workers_per_gpu = num_worker  # workers that pre-fetch data for each GPU

    # Training settings
    cfg.log_config.interval = log_interval
    cfg.load_from = load_from
    cfg.runner.max_epochs = total_epochs
    cfg.total_epochs = total_epochs
    # cfg.optimizer.lr = lr
    cfg.checkpoint_config.interval = checkpoint_interval

    # Save the config file
    cfg.dump(save_cfg_path)
    print("-" * 80)
    print(f'CONFIG:\n{cfg.pretty_text}')
    print("-" * 80)
    print("| Save config path:", save_cfg_path)
    print("-" * 80)
    print("| Load pretrain model path:", load_from)
    print("-" * 80)
    print('Please download the pre-trained weights, rename the file to "checkpoint.pth", '
          'and put it in the following directory:', work_dir)
    print("-" * 80)


if __name__ == '__main__':
    create_config()
From the mmclassification root directory, run create_config.py on the command line:
python work_dirs/create_config.py
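Download the pretrained weights
Download the pretrained weights for the model in the config file, put them in the config's work directory (work_dirs/convnext-large_64xb64_in1k_job1/), and rename the file to checkpoint.pth.
Download link for the convnext-large_64xb64_in1k pretrained weights (for other models, look on the mmclassification GitHub page): https://download.openmmlab.com/mmclassification/v0/convnext/convnext-large_in21k-pre-3rdparty_64xb64_in1k_20220124-2412403d.pth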
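Training fails late if config.py, checkpoint.pth, or a dataset path is missing, so a quick pre-flight check can save a queue slot. The sketch below only tests that the expected paths exist; the function name is hypothetical, and the expected layout (train, val, test folders plus val.txt, test.txt, and classes.txt under one data root) is assumed from the steps above.

```python
import os


def missing_training_inputs(work_dir, data_root):
    """Return {description: path} for every expected input that is absent."""
    required = {
        "config file": os.path.join(work_dir, "config.py"),
        "pretrained weights": os.path.join(work_dir, "checkpoint.pth"),
        "train images": os.path.join(data_root, "train"),
        "val images": os.path.join(data_root, "val"),
        "test images": os.path.join(data_root, "test"),
        "val annotations": os.path.join(data_root, "val.txt"),
        "test annotations": os.path.join(data_root, "test.txt"),
        "class list": os.path.join(data_root, "classes.txt"),
    }
    return {name: path for name, path in required.items() if not os.path.exists(path)}


# Usage from the mmclassification root directory (paths taken from this post):
# missing = missing_training_inputs(
#     "work_dirs/convnext-large_64xb64_in1k_job1", "data/thyroid_cls/data")
# for name, path in missing.items():
#     print(f"missing {name}: {path}")
```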
Start training
Single machine, single GPU:
python tools/train.py work_dirs/convnext-large_64xb64_in1k_job1/config.py
Single machine, multiple GPUs:
bash tools/dist_train.sh work_dirs/convnext-large_64xb64_in1k_job1/config.py 8
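The 64xb64 in the config name means the reference run used 64 GPUs with 64 samples each, an effective batch of 4096, while the settings above give 8 × 16 = 128. The sketch below applies the linear scaling heuristic to the learning rate. This is a common rule of thumb rather than something the config does for you, and the base learning rate of 4e-3 in the example is a hypothetical value; read the real one from the generated config.py.

```python
import re


def parse_batch_setup(config_name):
    """Extract (gpus, samples_per_gpu) from a name like 'resnet18_8xb32_in1k'."""
    match = re.search(r"_(\d+)xb(\d+)_", config_name)
    if match is None:
        raise ValueError(f"no '<gpus>xb<batch>' field in {config_name}")
    return int(match.group(1)), int(match.group(2))


def scaled_lr(base_lr, config_name, gpu_num, samples_per_gpu):
    """Scale base_lr by the ratio of your effective batch to the reference one."""
    ref_gpus, ref_batch = parse_batch_setup(config_name)
    return base_lr * (gpu_num * samples_per_gpu) / (ref_gpus * ref_batch)


name = "convnext-large_64xb64_in1k"
print("reference setup:", parse_batch_setup(name))
# 4e-3 is a placeholder base learning rate, not a value taken from the config
print("8 GPUs x 16:", scaled_lr(4e-3, name, gpu_num=8, samples_per_gpu=16))
print("1 GPU  x 16:", scaled_lr(4e-3, name, gpu_num=1, samples_per_gpu=16))
```

To apply a value like this, uncomment the lr lines in create_config.py and regenerate the config.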
Image inference
When running tools/test.py, at least one of --out and --metrics must be specified.
The available values for --metrics are:
evaluation metrics, which depends on the dataset, e.g.,
"accuracy", "precision", "recall", "f1_score", "support" for single label dataset,
"mAP", "CP", "CR", "CF1", "OP", "OR", "OF1" for multi-label dataset
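To make the single-label metric names concrete, here is a small pure-Python sketch of how precision, recall, f1_score, and support are defined per class, followed by a macro average. It illustrates the definitions only and is not mmclassification's implementation; values are fractions between 0 and 1, and the labels 0 and 1 stand for benign and malignant as in the dataset above.

```python
def per_class_metrics(y_true, y_pred, num_classes):
    """Return one dict of precision, recall, f1_score, support per class."""
    rows = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows.append({"precision": precision, "recall": recall,
                     "f1_score": f1, "support": tp + fn})
    return rows


def macro_average(rows, key):
    """Unweighted mean of one metric over all classes."""
    return sum(row[key] for row in rows) / len(rows)


y_true = [0, 0, 1, 1]  # ground-truth labels
y_pred = [0, 1, 1, 1]  # predicted labels
rows = per_class_metrics(y_true, y_pred, num_classes=2)
for label, row in enumerate(rows):
    print(label, row)
print("macro precision:", macro_average(rows, "precision"))
```

Here support is simply the number of ground-truth samples in each class, which is why it is reported alongside the other three.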
Inference on a single image
python demo/image_demo.py ${IMAGE_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE}
Test-set inference on a single machine with a single GPU
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--metrics ${METRICS}] [--out ${RESULT_FILE}]
Example 1: python tools/test.py work_dirs/convnext-large_64xb64_in1k_job1/config.py work_dirs/convnext-large_64xb64_in1k_job1/checkpoint.pth --out result.pkl
Example 2: python tools/test.py work_dirs/convnext-large_64xb64_in1k_job1/config.py work_dirs/convnext-large_64xb64_in1k_job1/checkpoint.pth --metrics precision
Test-set inference on a single machine with multiple GPUs
bash tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--metrics ${METRICS}] [--out ${RESULT_FILE}]
Example 1: bash tools/dist_test.sh work_dirs/convnext-large_64xb64_in1k_job1/config.py work_dirs/convnext-large_64xb64_in1k_job1/checkpoint.pth 8 --out result.pkl
Example 2: bash tools/dist_test.sh work_dirs/convnext-large_64xb64_in1k_job1/config.py work_dirs/convnext-large_64xb64_in1k_job1/checkpoint.pth 8 --metrics precision