YOLOv3 Object Detection
2022-06-22 10:40:00 【liu-Mr】
Preface
I have always been interested in OpenCV and find it remarkably capable, and YOLO's reputation precedes it. I have been meaning to learn it for a long time; today I happened to have a suitable video on hand, so I took the opportunity to learn it.
References: the blogs "Python and C++" and "Maple 333".
I. Preparation
Download the YOLOv3-related files: download link
yolov3.weights contains the pre-trained network weights;
yolov3.weights download
yolov3.cfg contains the network configuration;
yolov3.cfg download
coco.names contains the names of the 80 different classes in the COCO dataset.
coco.names download
Note: place these three files in the same directory as the script; otherwise, change the paths used to load them.
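Since readNetFromDarknet fails with a fairly opaque error when a path is wrong, a quick existence check can help. This is my own addition, not part of the original tutorial:

import os
# Optional sanity check: confirm the three files sit next to the script before loading the network
for name in ("yolov3.weights", "yolov3.cfg", "coco.names"):
    if not os.path.isfile(name):
        raise FileNotFoundError(f"{name} not found; download it or adjust the path")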
II. Usage steps
1. Import the libraries
import cv2
import numpy as np
Install the OpenCV library:
pip install opencv-python -i https://pypi.tuna.tsinghua.edu.cn/simple
2. Initialize configuration
coco.names contains the object class names used during model training; this file is read first.
The network is loaded from two files:
yolov3.weights - the pre-trained model weights
yolov3.cfg - the network configuration file
The DNN backend is set to OpenCV and the target to CPU. The target can also be set to cv2.dnn.DNN_TARGET_OPENCL to run on a GPU; note that the current OpenCV version has only been tested with Intel GPUs.
# Open the camera or video file
cap = cv2.VideoCapture(r"./data/test.mp4")
# coco.names stores the names of the 80 trained classes; they correspond
# one-to-one with the 80 categories YOLO was trained on
classesFile = r"coco.names"
# List of class names
classNames = []
with open(classesFile, "rt") as f:
    # Read the file line by line
    classNames = f.read().splitlines()
# Print all class names
print(classNames)
# Configure YOLOv3
modelConfiguration = "yolov3.cfg"  # network configuration file
modelWeights = "yolov3.weights"    # pre-trained weights file
net = cv2.dnn.readNetFromDarknet(modelConfiguration, modelWeights)  # load the model into the DNN module
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)  # set the DNN backend to OpenCV
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)       # run inference on the CPU
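If an OpenCL-capable GPU is available, the target can be switched instead. This is a sketch following the note above (and, as mentioned, only Intel GPUs are currently tested):

net.setPreferableTarget(cv2.dnn.DNN_TARGET_OPENCL)  # run on the GPU instead of the CPU (assumes OpenCL support)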
3. Set up the network
The neural network's input image must be organized in a specific format called a blob.
After a frame is read, it is passed through the blobFromImage function, which converts it into the network's input blob. In the process, pixel values are scaled by a factor of 1/255 into the [0, 1] range, and the image is resized to (inpWidth, inpHeight) without cropping.
Note: no mean subtraction is performed, so the mean parameter of the function is [0, 0, 0]. The network input size is set through inpWidth and inpHeight; 416 is a common choice, 320 (used in the complete code below) is faster, and 608 is more accurate.
The blob produced from the input image is fed into the network, and a forward pass is run to obtain a list of predicted bounding boxes. The network's predictions are then post-processed to filter out low-confidence boxes.
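As a quick aside (a sketch of mine, using a dummy frame), the blob uses NCHW layout:

import cv2
import numpy as np
# Sketch: inspect the blob layout using a dummy BGR frame
frame = np.zeros((480, 640, 3), dtype=np.uint8)
blob = cv2.dnn.blobFromImage(frame, 1 / 255, (416, 416), [0, 0, 0], True, False)
print(blob.shape)  # (1, 3, 416, 416): batch, channels, height, width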
The forward function of OpenCV's Net class needs to know the network's final output layers.
Because we want to run the entire network, we need to identify its last layers. The getUnconnectedOutLayers() function returns the unconnected layers, which are generally the network's output layers.
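For illustration (a sketch; the printed values are typical for yolov3.cfg but depend on the OpenCV version):

# Illustration: YOLOv3 has three unconnected output layers (its detection heads).
# With OpenCV 4.5.4+, getUnconnectedOutLayers() returns a flat array of 1-based indices.
layerNames = net.getLayerNames()
print(net.getUnconnectedOutLayers())  # e.g. [200 227 254]
print([layerNames[i - 1] for i in net.getUnconnectedOutLayers()])  # e.g. ['yolo_82', 'yolo_94', 'yolo_106']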
# Note: inpWidth, inpHeight, and findObjects are defined in the complete code below
while True:
    # Read one frame
    success, frame = cap.read()
    if not success:  # stop when the video ends or the camera fails
        break
    # The DNN module expects its input in a specific format called a blob
    blob = cv2.dnn.blobFromImage(frame, 1 / 255, (inpWidth, inpHeight), [0, 0, 0], True, False)
    # Feed the blob into the network
    net.setInput(blob)
    # Get the names of all layers in the network
    layerNames = net.getLayerNames()
    # Get the output layer names so the forward pass runs the whole network
    outputNames = [layerNames[i - 1] for i in net.getUnconnectedOutLayers()]
    outputs = net.forward(outputNames)
    findObjects(outputs, frame)
    # Show the annotated frame
    cv2.imshow("img", frame)
    # Exit on the ESC key
    if cv2.waitKey(10) == 27:
        break
# Release resources
cap.release()
cv2.destroyAllWindows()
4. Frame processing
Each bounding box output by the network is represented as a vector of (number of classes + 5) elements.
The first 4 elements are center_x, center_y, width, and height.
The 5th element is the confidence that the bounding box contains an object.
The remaining elements are the confidences (probabilities) associated with each class. The box is assigned to the class with the highest score, and that highest score is also called the box's confidence. If a box's confidence is below the given threshold, the bounding box is discarded and not processed further.
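To make this layout concrete, here is a small decoding sketch with made-up values (not output from a real network):

import numpy as np
# Sketch: decode one 85-element detection row (4 box values + objectness + 80 class scores)
det = np.zeros(85, dtype=np.float32)  # dummy row for illustration
det[:4] = [0.5, 0.5, 0.2, 0.3]        # center_x, center_y, width, height (relative to the input)
det[4] = 0.9                          # objectness: confidence that the box contains an object
det[5 + 16] = 0.8                     # pretend class index 16 ("dog" in the standard coco.names) scores highest
classId = int(np.argmax(det[5:]))     # -> 16
confidence = det[5 + classId]         # -> 0.8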
Boxes whose confidence is greater than or equal to the confidence threshold are then passed through non-maximum suppression (NMS), which reduces the number of overlapping boxes.
If the nmsThreshold parameter is too small, e.g. 0.1, overlapping objects of the same or different classes may not all be detected.
If it is too large, e.g. 1, multiple boxes may be kept for the same object.
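A toy example (made-up boxes, my own addition) shows the effect of NMS:

import cv2
# Toy NMS example: boxes 0 and 1 overlap heavily (IoU of about 0.78), so the lower-scoring one is dropped
boxes = [[100, 100, 50, 80], [105, 102, 50, 80], [300, 200, 40, 40]]
scores = [0.9, 0.75, 0.6]
print(cv2.dnn.NMSBoxes(boxes, scores, 0.5, 0.4))  # keeps indices 0 and 2; box 1 is suppressed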
# YOLOv3 detection post-processing
def findObjects(outputs, img):
    hT, wT, cT = img.shape  # height, width, and channels of the original frame
    bbox = []      # bounding-box coordinates of candidate detections
    classIds = []  # class index of each detection
    confs = []     # confidence value of each detection
    for output in outputs:  # iterate over the three output layers
        for det in output:  # iterate over the detections in each layer
            scores = det[5:]  # the scores for all 80 classes
            classId = np.argmax(scores)  # index of the highest-scoring class
            confidence = scores[classId]  # the highest class score
            if confidence > confThreshold:  # keep only sufficiently confident detections
                # Convert the relative box to pixel coordinates (top-left corner, width, height)
                w, h = int(det[2] * wT), int(det[3] * hT)
                x, y = int((det[0] * wT) - w / 2), int((det[1] * hT) - h / 2)
                bbox.append([x, y, w, h])  # store the box so NMS can process all detections in the frame
                classIds.append(classId)  # class index (0-79), used to look up the name in coco.names
                confs.append(float(confidence))
    # Apply non-maximum suppression to remove overlapping boxes
    indices = cv2.dnn.NMSBoxes(bbox, confs, confThreshold, nmsThreshold)
    for i in indices:  # with OpenCV 4.5.4+, NMSBoxes returns a flat array of indices
        box = bbox[i]  # a box that survived NMS
        x, y, w, h = box[0], box[1], box[2], box[3]
        # Draw a rectangle around each detected object
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 255), 2)
        # Label the box with the class name from coco.names and the confidence
        cv2.putText(img, f'{classNames[classIds[i]].capitalize()} {int(confs[i] * 100)}%',
                    (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 0, 255), 2)
Complete code:
import cv2
import numpy as np

# Open the camera or video file
cap = cv2.VideoCapture(r"./data/test.mp4")
# Confidence threshold for keeping a detection
confThreshold = 0.5
# Non-maximum suppression threshold
nmsThreshold = 0.2
# Width and height of the network input image
inpWidth = 320
inpHeight = 320

# coco.names stores the names of the 80 trained classes; they correspond
# one-to-one with the 80 categories YOLO was trained on
classesFile = r"coco.names"
# List of class names
classNames = []
with open(classesFile, "rt") as f:
    # Read the file line by line
    classNames = f.read().splitlines()
# Print all class names
print(classNames)

# Configure YOLOv3
modelConfiguration = "yolov3.cfg"  # network configuration file
modelWeights = "yolov3.weights"    # pre-trained weights file
net = cv2.dnn.readNetFromDarknet(modelConfiguration, modelWeights)  # load the model into the DNN module
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)  # set the DNN backend to OpenCV
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)       # run inference on the CPU

# YOLOv3 detection post-processing
def findObjects(outputs, img):
    hT, wT, cT = img.shape  # height, width, and channels of the original frame
    bbox = []      # bounding-box coordinates of candidate detections
    classIds = []  # class index of each detection
    confs = []     # confidence value of each detection
    for output in outputs:  # iterate over the three output layers
        for det in output:  # iterate over the detections in each layer
            scores = det[5:]  # the scores for all 80 classes
            classId = np.argmax(scores)  # index of the highest-scoring class
            confidence = scores[classId]  # the highest class score
            if confidence > confThreshold:  # keep only sufficiently confident detections
                # Convert the relative box to pixel coordinates (top-left corner, width, height)
                w, h = int(det[2] * wT), int(det[3] * hT)
                x, y = int((det[0] * wT) - w / 2), int((det[1] * hT) - h / 2)
                bbox.append([x, y, w, h])  # store the box so NMS can process all detections in the frame
                classIds.append(classId)  # class index (0-79), used to look up the name in coco.names
                confs.append(float(confidence))
    # Apply non-maximum suppression to remove overlapping boxes
    indices = cv2.dnn.NMSBoxes(bbox, confs, confThreshold, nmsThreshold)
    for i in indices:  # with OpenCV 4.5.4+, NMSBoxes returns a flat array of indices
        box = bbox[i]  # a box that survived NMS
        x, y, w, h = box[0], box[1], box[2], box[3]
        # Draw a rectangle around each detected object
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 255), 2)
        # Label the box with the class name from coco.names and the confidence
        cv2.putText(img, f'{classNames[classIds[i]].capitalize()} {int(confs[i] * 100)}%',
                    (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 0, 255), 2)

while True:
    # Read one frame
    success, frame = cap.read()
    if not success:  # stop when the video ends or the camera fails
        break
    # The DNN module expects its input in a specific format called a blob
    blob = cv2.dnn.blobFromImage(frame, 1 / 255, (inpWidth, inpHeight), [0, 0, 0], True, False)
    # Feed the blob into the network
    net.setInput(blob)
    # Get the names of all layers in the network
    layerNames = net.getLayerNames()
    # Get the output layer names so the forward pass runs the whole network
    outputNames = [layerNames[i - 1] for i in net.getUnconnectedOutLayers()]
    outputs = net.forward(outputNames)
    findObjects(outputs, frame)
    # Show the annotated frame
    cv2.imshow("img", frame)
    # Exit on the ESC key
    if cv2.waitKey(10) == 27:
        break

# Release resources
cap.release()
cv2.destroyAllWindows()
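To try the detector on a single image instead of a video stream, a minimal variation could look like the sketch below; it reuses net, inpWidth, inpHeight, and findObjects from above, and the image path is an assumption:

# Single-image variant (sketch; reuses net, the thresholds, and findObjects defined above)
img = cv2.imread("test.jpg")  # the path is an assumption; point it at any test image
blob = cv2.dnn.blobFromImage(img, 1 / 255, (inpWidth, inpHeight), [0, 0, 0], True, False)
net.setInput(blob)
layerNames = net.getLayerNames()
outputNames = [layerNames[i - 1] for i in net.getUnconnectedOutLayers()]
findObjects(net.forward(outputNames), img)
cv2.imshow("img", img)
cv2.waitKey(0)  # wait for any key instead of polling, since there is only one frame
cv2.destroyAllWindows()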
Result:
Summary
YOLOv3 object detection has many more capabilities, and I will share more interesting code in the future. This is a beginner's article, and the official code is used essentially unmodified; please bear with me.