Literature reading: GoPose, 3D human pose estimation using WiFi

2022-07-24 19:12:00 【Gone forever communication er】

Motivation: why do the authors want to solve this problem?

Previous WiFi-based 3D human pose estimation has the following shortcomings:
- It only works for poses at a fixed position [1]
- It only supports predefined activities [2]

Contribution: what have the authors done in this paper (innovation points)?

Challenges
- Unlike USRP or FMCW radar, the channel state information (CSI) exported by off-the-shelf WiFi devices does not provide any spatial information about the human body (how should "spatial information" be understood? AoA, AoD, and so on.)
- How can the pose estimation system be made independent of its operating environment?
- How can the complex relationship between the 2D AoA spectrum and the 3D human skeleton be modeled?
Solutions
- Derive the 2D AoA spectrum from nonlinearly arranged antennas and combine it with the spatial diversity of the transmitter and the frequency diversity of the WiFi OFDM subcarriers. This improves the spatial resolution of the 2D AoA spectrum so that signals reflected from different parts of the body can be distinguished
- Subtract the 2D AoA spectrum of the static environment from the spectrum extracted while one or more users perform activities
- Take the 2D AoA spectrum as input and infer the 3D human pose with a CNN and an LSTM. The CNN extracts spatial features and the LSTM extracts temporal features
Accuracy
- GoPose achieves an accuracy of about 4.5 cm across a variety of scenarios, including tracking activities in the dark and NLoS scenarios (is the accuracy MPJPE?? It should be)

Planning: how do they get the job done?

Overall architecture

WiFi Probing: collect the data and denoise it with linear fitting
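The notes do not say what the linear fit is applied to. A minimal sketch, assuming it is the common CSI sanitization step of fitting a straight line to the unwrapped phase across subcarrier indices and subtracting it; the function name `sanitize_phase` and the toy signal are hypothetical:

```python
import numpy as np

def sanitize_phase(csi):
    """Remove the best-fit linear phase trend across subcarriers.

    csi: 1-D complex array, one CSI value per subcarrier.
    Assumption: the fit runs over the subcarrier index (not stated in the notes).
    """
    k = np.arange(csi.shape[-1])
    phase = np.unwrap(np.angle(csi))            # undo 2*pi jumps before fitting
    slope, intercept = np.polyfit(k, phase, 1)  # least-squares straight line
    cleaned = phase - (slope * k + intercept)   # keep only the non-linear part
    return np.abs(csi) * np.exp(1j * cleaned)   # amplitude is left untouched

# Toy example: a small phase ripple buried under a linear offset.
k = np.arange(30)
ripple = 0.3 * np.sin(2 * np.pi * k / 30)
csi = np.exp(1j * (ripple + 0.2 * k + 1.0))
residual = np.angle(sanitize_phase(csi))
```

After sanitization the residual phase has no remaining linear trend, while the amplitude of each subcarrier is unchanged.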
Data Processing: first, spatial diversity and frequency diversity (introduced in detail later) are combined to improve the resolution of the 2D AoA spectrum, so that signals reflected from different parts of the body can be distinguished; then, static environment removal filters out the static signals reflected from the indoor environment; finally, the 2D AoA spectra of multiple packets are combined as the input of the network
3D Pose Construction: the CNN captures the spatial features of body parts, and the LSTM estimates the temporal features of the motion
Improving the resolution of the 2D AoA spectrum: spatial diversity and frequency diversity
1D AoA estimation: not elaborated much; it uses the MUSIC algorithm
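Since the notes only name MUSIC, here is a minimal sketch of textbook 1D MUSIC for illustration. It assumes a uniform linear array with half-wavelength spacing, a single simulated source, and a known number of sources; none of these settings come from the paper, and the helper names are hypothetical:

```python
import numpy as np

def steering(theta_deg, n_ant):
    """Steering vector of a half-wavelength ULA; angle measured from the array axis."""
    k = np.arange(n_ant)
    return np.exp(-1j * np.pi * k * np.cos(np.deg2rad(theta_deg)))

def music_spectrum(x, n_sources, grid_deg):
    """x: (n_ant, n_snapshots) complex samples. Returns the pseudo-spectrum on grid_deg."""
    n_ant = x.shape[0]
    cov = x @ x.conj().T / x.shape[1]        # sample covariance matrix
    _, vecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    noise = vecs[:, : n_ant - n_sources]     # noise subspace E_N
    spec = []
    for th in grid_deg:
        a = steering(th, n_ant)
        spec.append(1.0 / np.linalg.norm(noise.conj().T @ a) ** 2)
    return np.array(spec)

# Simulated data: one source at 60 degrees, 3 antennas, light noise.
rng = np.random.default_rng(0)
n_ant, n_snap, true_aoa = 3, 200, 60.0
s = rng.standard_normal(n_snap) + 1j * rng.standard_normal(n_snap)
w = 0.01 * (rng.standard_normal((n_ant, n_snap)) + 1j * rng.standard_normal((n_ant, n_snap)))
x = np.outer(steering(true_aoa, n_ant), s) + w
grid = np.arange(0, 181)
est = grid[np.argmax(music_spectrum(x, 1, grid))]
```

The peak of the pseudo-spectrum marks the angle whose steering vector is closest to orthogonal to the noise subspace, which is the estimated AoA.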
2D AoA estimation:
An L-shaped antenna array is used to derive the azimuth $\varphi$ and the elevation $\theta$ of the incident signal; see 3.3 in the paper for the details of the formulas

Although the 2D AoA can provide the approximate location of the human body in 2D space, it cannot distinguish signals reflected from different parts of the body, for example signals from the torso (signal $k_2$) or from the legs (signal $k_3$). This is because the hardware limitations of commodity WiFi make the resolution of the 2D AoA spectrum very low. To overcome this limitation, we further combine the spatial diversity of the transmitter (2D AoA, AoD) with the frequency diversity of the WiFi OFDM subcarriers (ToF) to improve the resolution of the 2D AoA spectrum
The spatial diversity across the three transmit antennas introduces a phase shift that depends on the angle of departure (AoD), and the frequency diversity of the OFDM subcarriers introduces a phase shift that depends on the relative time of flight (ToF). We can therefore use spatial and frequency diversity to jointly estimate the 2D AoA, AoD, and ToF, which significantly improves the resolution of the 2D AoA spectrum:
$\mathbf{a}^{\prime}(\varphi, \theta, \tau)=\left[1, \ldots, \Omega_{\tau}^{V-1}, \Phi_{(\varphi, \theta)}, \ldots, \Omega_{\tau}^{V-1} \Phi_{(\varphi, \theta)}, \ldots, \Phi_{(\varphi, \theta)}^{R-1}, \ldots, \Omega_{\tau}^{V-1} \Phi_{(\varphi, \theta)}^{R-1}\right]^{T}$
$\mathbf{a}(\varphi, \theta, \omega, \tau)=\left[\mathbf{a}^{\prime}(\varphi, \theta, \tau), \Gamma_{\omega} \mathbf{a}^{\prime}(\varphi, \theta, \tau), \ldots, \Gamma_{\omega}^{S-1} \mathbf{a}^{\prime}(\varphi, \theta, \tau)\right]^{T}$
$P(\varphi, \theta, \omega, \tau)_{\text{Improve}}=\frac{1}{\mathbf{a}^{H}(\varphi, \theta, \omega, \tau) \mathbf{E}_{N} \mathbf{E}_{N}^{H} \mathbf{a}(\varphi, \theta, \omega, \tau)}$
Here $\varphi$ is the azimuth, $\theta$ the elevation, $\omega$ the AoD, and $\tau$ the ToF; $\Phi_{(\varphi, \theta)}$, $\Gamma_{\omega}$, and $\Omega_{\tau}$ are the corresponding phase-shift factors, and $\mathbf{E}_{N}$ is the noise subspace used by MUSIC. The exponents run over $V$ subcarriers, $R$ receive antennas, and $S$ transmit antennas
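The nested structure of the steering vector is easier to see in code. The following sketch builds $\mathbf{a}$ with Kronecker products and evaluates the pseudo-spectrum; the three phase factors are passed in as plain complex numbers because their closed forms are given in the paper and not in these notes, and the sizes and function names are hypothetical:

```python
import numpy as np

def joint_steering(omega_tau, phi, gamma, V, R, S):
    """Build a(phi, theta, omega, tau) from the three scalar phase factors.

    a' = [Phi^r * Omega^v], v fastest, r = 0..R-1
    a  = [Gamma^s * a'],   s = 0..S-1
    """
    a_prime = np.kron(phi ** np.arange(R), omega_tau ** np.arange(V))
    return np.kron(gamma ** np.arange(S), a_prime)

def pseudo_spectrum(a, noise_subspace):
    """P = 1 / (a^H E_N E_N^H a) for one candidate steering vector."""
    proj = noise_subspace.conj().T @ a
    return 1.0 / np.real(np.vdot(proj, proj))

# Illustrative sizes only: 4 subcarriers, 3 receive and 3 transmit antennas.
V, R, S = 4, 3, 3
a_true = joint_steering(np.exp(-0.3j), np.exp(-0.7j), np.exp(-1.1j), V, R, S)
a_other = joint_steering(np.exp(-0.5j), np.exp(-0.2j), np.exp(-0.9j), V, R, S)

# Toy noise subspace: an orthonormal basis of everything orthogonal to a_true.
rng = np.random.default_rng(0)
filler = rng.standard_normal((V * R * S, V * R * S - 1)) + 0j
q, _ = np.linalg.qr(np.column_stack([a_true, filler]))
E_N = q[:, 1:]

p_true = pseudo_spectrum(a_true, E_N)
p_other = pseudo_spectrum(a_other, E_N)
```

Because `a_true` is orthogonal to `E_N` by construction, its pseudo-spectrum value is far larger than that of the mismatched candidate, which is what produces a sharp peak in the joint spectrum.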
Static environment removal
Because the 2D AoA spectrum provides spatial information about the multipath signals, we can use it to remove the LoS signal and the signals reflected from the static environment, which makes the 3D pose estimation environment independent. Concretely, the 2D AoA spectrum of the static environment is subtracted from the 2D AoA spectrum captured during human activity.
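The subtraction can be sketched in a few lines. The spectra below are synthetic: a strong LoS peak present in both captures and a weaker body reflection present only during activity. The peak positions and the name `remove_static` are made up for illustration:

```python
import numpy as np

def remove_static(activity_spec, static_spec):
    """Subtract the static-environment 2D AoA spectrum from the activity spectrum."""
    return activity_spec - static_spec

# Synthetic 180 x 180 spectra (azimuth x elevation).
static_spec = np.zeros((180, 180))
static_spec[90, 90] = 10.0                # LoS path, present in both captures

activity_spec = static_spec.copy()
activity_spec[60, 120] = 3.0              # reflection from the moving body

body_only = remove_static(activity_spec, static_spec)
peak = np.unravel_index(np.argmax(body_only), body_only.shape)
```

Before subtraction the strongest peak is the LoS path; afterwards the strongest remaining peak is the body reflection at `(60, 120)`.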
Combining multiple packets:
The 2D AoA spectrum derived from a single WiFi packet captures only a small part of the body motion, so a series of packets (100 packets) is used as the input of the neural network to estimate the human pose.
Neural network
The azimuth and elevation ranges are set to [0, 180] degrees with a resolution of 1 degree, giving a spectrum of size 180 × 180. The system uses 4 receivers to capture the user's movements from different angles; concatenating the spectra of the four receivers gives a tensor of size 180 × 180 × 4. In addition, multiple spectra need to be combined to capture whole-body motion. We therefore concatenate 100 packets from each receiver to form a 180 × 180 × 400 tensor as the input
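The shape bookkeeping above can be checked with a short sketch. The sizes come from the text; the channel ordering (all packets of receiver 0, then receiver 1, and so on) is an assumption, as is the name `build_input`:

```python
import numpy as np

N_RX, N_PKT, N_AZ, N_EL = 4, 100, 180, 180

def build_input(spectra):
    """Stack per-receiver, per-packet spectra along the channel axis.

    spectra: (N_RX, N_PKT, N_AZ, N_EL)  ->  (N_AZ, N_EL, N_RX * N_PKT)
    Channel index = receiver * N_PKT + packet (assumed ordering).
    """
    stacked = spectra.reshape(N_RX * N_PKT, N_AZ, N_EL)
    return np.moveaxis(stacked, 0, -1)

# Mark one spectrum (receiver 2, packet 5) so its channel can be located.
spectra = np.zeros((N_RX, N_PKT, N_AZ, N_EL), dtype=np.float32)
spectra[2, 5] = 1.0
x = build_input(spectra)
```

With this ordering the marked spectrum lands in channel `2 * 100 + 5 = 205` of the 180 × 180 × 400 input.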
In the neural network, the CNN captures the spatial features of body parts, and the LSTM estimates the temporal features of the motion

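Loss function:
$L_{P}=\frac{1}{T} \sum_{t=1}^{T} \frac{1}{N} \sum_{i=1}^{N}\left\|\bar{p}_{t}^{i}-p_{t}^{i}\right\|_{2}$
$L_{H}=\frac{1}{T} \sum_{t=1}^{T} \frac{1}{N} \sum_{i=1}^{N}\left\|\bar{p}_{t}^{i}-p_{t}^{i}\right\|_{H}$
$L=Q_{P} \cdot L_{P}+Q_{H} \cdot L_{H}$
Here $\bar{p}_{t}^{i}$ and $p_{t}^{i}$ are the two positions (predicted and ground truth) of joint $i$ at time step $t$, $T$ is the number of time steps, $N$ is the number of joints, $\|\cdot\|_{H}$ is the Huber norm, and $Q_{P}$ and $Q_{H}$ weight the two terms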
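A small numeric sketch of the combined loss follows. The notes do not define the Huber norm precisely, so the code assumes the standard element-wise Huber function with threshold `delta = 1`, summed over the three coordinates; the weights `q_p` and `q_h` are placeholders rather than the paper's values:

```python
import numpy as np

def position_loss(pred, target):
    """L_P: mean Euclidean distance over time steps and joints."""
    return np.linalg.norm(pred - target, axis=-1).mean()

def huber_loss(pred, target, delta=1.0):
    """L_H: element-wise Huber, summed over x, y, z, then averaged (assumed form)."""
    err = np.abs(pred - target)
    per_elem = np.where(err <= delta, 0.5 * err ** 2, delta * (err - 0.5 * delta))
    return per_elem.sum(axis=-1).mean()

def total_loss(pred, target, q_p=1.0, q_h=1.0, delta=1.0):
    """L = Q_P * L_P + Q_H * L_H."""
    return q_p * position_loss(pred, target) + q_h * huber_loss(pred, target, delta)

# T = 5 time steps, N = 14 joints, every joint offset by (3, 4, 0).
T, N = 5, 14
target = np.zeros((T, N, 3))
pred = target + np.array([3.0, 4.0, 0.0])

l_p = position_loss(pred, target)   # Euclidean distance of (3, 4, 0) is 5
l_h = huber_loss(pred, target)      # (3 - 0.5) + (4 - 0.5) + 0 = 6
l = total_loss(pred, target)
```

For this constant offset the position term is 5 and the Huber term is 6, so the unweighted sum is 11. The Huber term grows only linearly for large errors, which makes it less sensitive to outlier joints than a squared error.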

Reasoning: what experiments are used to verify the results?

Experimental configuration
One transmitter and four receivers; the transmitter has 3 antennas and each receiver has 3 antennas (placed in an L shape)
Packet rate: 1000 Hz
Kinect 2.0 records the ground truth (can it record absolute pose??)
Data from 10 people
Experiment sites
A living room (4 × 4), a dining room (3.6 × 3.6), and a bedroom (4 × 3.8)
Default transmitter-receiver distance: 2.5 m
Evaluation metric
The joint localization error is used as the evaluation metric, defined as the Euclidean distance between the predicted joint position and the ground truth. Note that 14 keypoints/joints are evaluated (are they aligned or not?)
Overall performance
① NLoS conditions: shows that the deep learning model trained under LoS conditions can be applied to NLoS scenes without retraining
② Impact of environment changes: the system is trained in one environment (such as the living room or dining room) and then evaluated at run time in a different environment (for example, the bedroom)
③ Effect of the distance between the transceivers
④ Effect of the packet rate
⑤ Different users: 7 people for training, 1 for validation, 2 for testing
⑥ Multi-user impact: the verification experiment collected data from 2 people, but it did not help