OpenPose Basic Philosophy
2022-08-02 16:02:00 【zhangyu】
Introduction
OpenPose is an open-source human pose estimation library developed at Carnegie Mellon University (CMU). It is built on convolutional neural networks and supervised learning, and uses Caffe as its framework. It can estimate body motion, facial expressions, finger movement, and similar poses, and it is robust for both single-person and multi-person scenes. It is described as the world's first real-time multi-person 2D pose estimation application based on deep learning, and many projects built on it have since appeared. Human pose estimation has broad application prospects in sports and fitness, motion capture, 3D virtual fitting, public opinion monitoring, and similar fields. An application many people will recognize is the dance machine feature in Douyin.
Highlights
OpenPose proposes Part Affinity Fields (PAFs), in which each pixel holds a 2D vector that encodes the position and orientation of a limb. Given the detected joint points and the regions connecting them, a greedy inference algorithm can quickly assign the joint points to the individuals they belong to.
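As a rough illustration of how a PAF can score a candidate limb, the sketch below samples the field along the segment between two candidate joints and averages the dot product with the segment's unit direction. The names `paf_score` and `horizontal_limb_field`, the callable field interface, and the sample count are assumptions made for this example; they are not OpenPose's API.

```python
import math

def paf_score(paf, p1, p2, num_samples=10):
    """Average alignment between a vector field and the segment p1 -> p2.

    paf is a hypothetical callable (x, y) -> (vx, vy), standing in for a
    lookup into a predicted Part Affinity Field.
    """
    if num_samples < 2:
        raise ValueError("num_samples must be at least 2")
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    ux, uy = dx / length, dy / length  # unit direction of the candidate limb
    total = 0.0
    for i in range(num_samples):
        t = i / (num_samples - 1)      # t runs from 0 to 1 along the segment
        vx, vy = paf(p1[0] + t * dx, p1[1] + t * dy)
        total += vx * ux + vy * uy     # dot product: 1 when fully aligned
    return total / num_samples

def horizontal_limb_field(x, y):
    """Toy field: a limb lying along the x axis, one pixel thick on each side."""
    return (1.0, 0.0) if abs(y) <= 1.0 else (0.0, 0.0)

aligned = paf_score(horizontal_limb_field, (0, 0), (10, 0))
crossing = paf_score(horizontal_limb_field, (5, -5), (5, 5))
```

A segment that follows the toy limb scores close to 1, while a segment that crosses it at a right angle scores close to 0, which is what makes the score usable as an edge weight between joint candidates.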
OpenPose is an open-source library built on convolutional neural networks and supervised learning and written on the Caffe framework. It can track facial expressions, the torso, the limbs, and even the fingers, for a single person or for several people at once, with good robustness. It has been called the world's first real-time multi-person 2D pose estimation system based on deep learning. It is a milestone in human-computer interaction and gives machines a high-quality source of information for understanding people.
Process
- Input an image and extract features with a convolutional network to obtain a set of feature maps. These are then fed into two branches, each a CNN, which predict the Part Confidence Maps and the Part Affinity Fields respectively;
- With these two outputs, bipartite matching from graph theory is used to find the part associations and connect the joint points that belong to the same person. Because PAFs are vector fields, the resulting matches are highly accurate, and the matched limbs are finally merged into the full skeleton of each person;
- Finally, multi-person parsing is performed based on the PAFs: the multi-person parsing problem is converted into a graph problem and solved with the Hungarian algorithm.
(The Hungarian algorithm is a bipartite graph matching algorithm. Its core idea is to search for augmenting paths, which it uses to find a maximum matching of a bipartite graph.)
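The augmenting-path idea described above can be sketched as follows. This is a minimal, unweighted maximum bipartite matching written for illustration; OpenPose's actual association step works with PAF-weighted edges, and the function name `max_bipartite_matching` and the toy adjacency list are assumptions of this example.

```python
def max_bipartite_matching(adjacency, num_right):
    """Maximum matching of a bipartite graph via augmenting paths.

    adjacency[u] lists the right-side vertices compatible with left vertex u.
    Returns (matching size, match_right), where match_right[v] is the left
    vertex matched to right vertex v, or -1 if v is unmatched.
    """
    match_right = [-1] * num_right

    def try_augment(u, visited):
        # Look for a free right vertex reachable from u, re-routing
        # earlier matches along the way if needed.
        for v in adjacency[u]:
            if v in visited:
                continue
            visited.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], visited):
                match_right[v] = u
                return True
        return False

    size = 0
    for u in range(len(adjacency)):
        if try_augment(u, set()):
            size += 1
    return size, match_right

# Toy example: three left candidates (e.g. elbows) and three right
# candidates (e.g. wrists); an edge means the pair is a plausible limb.
adjacency = [[0, 1], [0], [1, 2]]
size, match_right = max_bipartite_matching(adjacency, num_right=3)
```

In the toy graph, left vertex 1 can only take right vertex 0, so the search re-routes left vertex 0 to right vertex 1; that re-routing is the augmenting path.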
Convolutional Neural Networks

The network is split into two branches, upper and lower, which predict the keypoint heatmaps and the PAF maps respectively. Each branch has t stages, with each stage refining the prediction further, and each stage fuses the feature maps with the outputs of the previous stage. Here ρ and φ denote the networks of the two branches. During training, a loss is computed at every stage to avoid vanishing gradients; at prediction time only the output of the last stage is used.
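The per-stage loss described above (intermediate supervision) can be sketched as a sum of the losses of both branches over all stages. Plain squared error on flattened lists is assumed here for brevity, and the names `l2_loss` and `total_loss` are hypothetical.

```python
def l2_loss(pred, target):
    """Sum of squared differences between two flattened maps."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def total_loss(stage_outputs, heatmap_target, paf_target):
    """Add up the heatmap loss and the PAF loss of every stage.

    stage_outputs is a list with one (heatmap_pred, paf_pred) pair per stage.
    Every stage contributes to the total, so every stage receives gradient.
    """
    per_stage = [
        l2_loss(heatmap_pred, heatmap_target) + l2_loss(paf_pred, paf_target)
        for heatmap_pred, paf_pred in stage_outputs
    ]
    return sum(per_stage), per_stage

# Toy targets and two stages: a rough first stage and a perfect last stage.
heatmap_target = [0.0, 1.0]
paf_target = [1.0, 0.0]
stages = [
    ([0.5, 0.5], [0.5, 0.5]),
    ([0.0, 1.0], [1.0, 0.0]),
]
loss, per_stage = total_loss(stages, heatmap_target, paf_target)
prediction = stages[-1]  # at inference only the last stage's output is kept
```

Because each stage has its own loss term, the early stages are trained directly rather than only through gradients that flow back from the final output.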
- Highlight 1: PAF-Part Affinity Fields
PAFs (Part Affinity Fields) describe the affinity between body parts. They encode the position and orientation of each limb as 2D vectors in the image domain. At the same time, Part Detection Confidence Maps (the so-called "heat maps") mark the confidence of each key point. Through the two branches, the keypoint locations and the connections between them are learned jointly. These bottom-up detections and associations are inferred together by a greedy parsing algorithm, which encodes enough global context to obtain high-quality results at a fraction of the computational cost. Run in parallel, it is essentially real-time, and the running time depends only weakly on the number of people in the image.
- Highlight 2: High Robustness
CMU's data capture rig is an enclosed dome that can record people from any angle. The dome is fitted with 480 VGA cameras, 31 HD cameras, 10 Kinect Ⅱ sensors, and 5 DLP projectors, all synchronized in hardware. This large volume of high-quality data makes robust human pose detection possible from 2D images alone.
- Highlight 3: Unifying three kinds of landmarks
Originally, human skeleton joints were handled by teams working on action recognition and behavior analysis, facial landmark extraction by teams developing face recognition or beautification algorithms, and hand joints by teams working on gesture recognition and human-computer interaction. These were separate subfields. Having achieved good results on human skeleton joints, the CMU team integrated the face and hands into one unified graph, again with good results. Face alignment and pose alignment are linked together, and based on the rigid nature of the human head and the non-rigid nature of the limbs, a set of Caffe-based point estimation and diffusion models was designed, a tree-structured decision acceleration was built, and 3D background segmentation technology was added on top of it.
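The greedy parsing mentioned under Highlight 1 can be illustrated for a single limb type: sort the candidate connections by score and accept each one whose two endpoints are still free. This is a simplified sketch, not OpenPose's implementation; `greedy_connect` and the scores below are made up for the example.

```python
def greedy_connect(candidates):
    """Greedily pick limb connections for one limb type.

    candidates is a list of (score, a_index, b_index) tuples, where a_index
    and b_index identify joint candidates of the two part types (for
    example necks and right shoulders) and score is their PAF-based affinity.
    """
    used_a, used_b = set(), set()
    connections = []
    # Highest score first; a joint candidate can belong to only one limb.
    for score, a, b in sorted(candidates, key=lambda c: c[0], reverse=True):
        if a in used_a or b in used_b:
            continue
        connections.append((a, b, score))
        used_a.add(a)
        used_b.add(b)
    return connections

# Two people in the image: two neck candidates and two shoulder candidates.
candidates = [
    (0.9, 0, 0),  # neck 0 with shoulder 0: strong affinity
    (0.2, 0, 1),
    (0.3, 1, 0),
    (0.8, 1, 1),  # neck 1 with shoulder 1: strong affinity
]
connections = greedy_connect(candidates)
```

The cost is dominated by sorting the candidates, which is one reason a greedy scheme stays cheap as the number of people grows.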
Single-Person Pose Estimation (The Idea Behind CPM)
The CPM (Convolutional Pose Machines) model uses large convolution kernels to obtain a large receptive field, which is very effective for inferring occluded joints. The network structure is as follows:
The flow of the entire algorithm is:
a) First, regress over all the people appearing in the image to obtain the joint points of each person
b) Then use the center map to suppress the responses that belong to other people
c) Finally, refine the predicted heatmaps through repeated stages to obtain the final result. During refinement, losses on the intermediate layers must be introduced so that the deeper network can still be trained without vanishing or exploding gradients. The regression accuracy improves gradually, from coarse to fine.
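To make the point about large kernels and receptive fields concrete, the standard receptive-field recurrence can be evaluated for a stack of layers. The layer stacks below are illustrative only and are not the actual CPM architecture; `receptive_field` is a hypothetical helper.

```python
def receptive_field(layers):
    """Receptive field, in input pixels, of a stack of conv/pool layers.

    layers is a list of (kernel_size, stride) pairs, applied in order.
    """
    rf = 1    # receptive field of a single input pixel
    jump = 1  # distance in input pixels between neighboring output units
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

small_kernels = receptive_field([(3, 1), (3, 1), (3, 1)])
large_kernels = receptive_field([(9, 1), (9, 1), (9, 1)])
with_pooling = receptive_field([(9, 1), (2, 2), (9, 1)])
```

With the same depth, three 9x9 layers cover a 25-pixel window where three 3x3 layers cover only 7 pixels, and a stride-2 pooling layer in between widens the window further. A wider window lets the network use surrounding body parts as context when a joint is occluded.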
Shortcomings
Memory consumption
The computational cost is very high. To reach real-time speed, a highly parallel strategy based on CUDA acceleration is used, which is very memory-intensive and essentially rules out machines with less than 4 GB of video memory (GTX 980 Ti+).
Poor detection in special scenes
Results are not ideal with low image resolution, motion blur, low brightness, densely packed targets, severe occlusion, incomplete targets, and similar conditions.
COCO model dataset key points


