Eye gaze estimation using webcam
2022-06-24 23:52:00 【woshicver】
Picture this: you are sitting in the library, and you have just spotted the most beautiful woman sitting across the room. Oh no, she has caught you staring at her. She knew your eyes were on her, and by reading her gaze pointed back at you, you realize you have been caught.
Eye gaze: the point a person's eyes are focused on
Although our amazing brains handle this task effortlessly, it is a very hard problem to "teach" a computer, because it involves several difficult sub-tasks:
Face recognition
Eye recognition and pupil location
Estimating the 3D position of the head and eyes
Commercial gaze trackers come in a variety of shapes and sizes, from glasses to screen-mounted devices. But although these products are highly accurate, they rely on proprietary software and hardware, and they are very expensive.
Let's start building our gaze tracker
To keep this post a reasonable length, we will build a basic form of gaze tracking that makes a few rough estimates: we will not determine exactly what the subject is looking at, but rather the direction of their gaze.
Face recognition and pupil location
For this task we will use MediaPipe (https://google.github.io/mediapipe/solutions/face_mesh.html), an amazing deep learning framework developed by Google. It gives us 468 2D facial landmarks in real time while using very few resources.
Let's look at some code:
import mediapipe as mp
import cv2
import gaze

mp_face_mesh = mp.solutions.face_mesh  # initialize the face mesh model

# camera stream:
cap = cv2.VideoCapture(1)  # index 1; use 0 for the default webcam

with mp_face_mesh.FaceMesh(
        max_num_faces=1,  # number of faces to track in each frame
        refine_landmarks=True,  # includes iris landmarks in the face mesh model
        min_detection_confidence=0.5,
        min_tracking_confidence=0.5) as face_mesh:
    while cap.isOpened():
        success, image = cap.read()
        if not success:  # no frame input
            print("Ignoring empty camera frame.")
            continue
        # To improve performance, optionally mark the image as not writeable to
        # pass by reference.
        image.flags.writeable = False
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # frame to RGB for the face-mesh model
        results = face_mesh.process(image)
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)  # frame back to BGR for OpenCV
        if results.multi_face_landmarks:
            gaze.gaze(image, results.multi_face_landmarks[0])  # gaze estimation
        cv2.imshow('output window', image)
        if cv2.waitKey(2) & 0xFF == 27:  # exit on the ESC key
            break
cap.release()
There is nothing special here. At the end of the loop we pass the current frame and the facial landmarks obtained from MediaPipe to our gaze function, and that is where all the fun happens.
From 2D to 3D?
Gaze tracking is a 3D problem, yet the title says we only use a simple webcam. How can that be?
We will pull it off with a little magic (linear algebra).
First, let's look at how our camera "sees" the world.

(Figure: the camera and world coordinate systems; image from the OpenCV documentation)
The 2D image you see on your screen is shown in blue, while the 3D world is represented by the world coordinate system. What is the connection between them? How do we map the 2D image to the 3D world, or at least get a rough estimate of it?
Let's make it clear!
We are all the same
We humans are more similar than we think, so we can use a universal 3D face model that gives a good estimate of 3D facial proportions for most of the population.
Let's use this model to define a 3D coordinate system: we set the tip of the nose as the origin, and relative to it we define five more points, as shown below:
import cv2
import numpy as np

def gaze(frame, points):
    '''
    2D image points.
    relative takes mediapipe points that are normalized to [0, 1] and returns
    image points in (x, y) format
    '''
    image_points = np.array([
        relative(points.landmark[4], frame.shape),    # Nose tip
        relative(points.landmark[152], frame.shape),  # Chin
        relative(points.landmark[263], frame.shape),  # Left eye left corner
        relative(points.landmark[33], frame.shape),   # Right eye right corner
        relative(points.landmark[287], frame.shape),  # Left Mouth corner
        relative(points.landmark[57], frame.shape)    # Right mouth corner
    ], dtype="double")

    # 3D model points.
    model_points = np.array([
        (0.0, 0.0, 0.0),        # Nose tip
        (0, -63.6, -12.5),      # Chin
        (-43.3, 32.7, -26),     # Left eye left corner
        (43.3, 32.7, -26),      # Right eye right corner
        (-28.9, -28.9, -24.1),  # Left Mouth corner
        (28.9, -28.9, -24.1)    # Right mouth corner
    ])

    '''
    3D model eye points
    The center of the eye ball
    '''
    Eye_ball_center_right = np.array([[-29.05], [32.7], [-39.5]])
    Eye_ball_center_left = np.array([[29.05], [32.7], [-39.5]])
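The relative helper used above is never shown in the snippet. Here is a minimal sketch of what it, and the pupil extraction, might look like; the iris-center indices (468 and 473, exposed when refine_landmarks=True) and the left/right assignment are my assumptions, not taken from the original post:

# A minimal sketch, assuming MediaPipe landmarks are normalized to [0, 1]:
# scale a normalized landmark by the frame width/height to get pixel coordinates.
def relative(landmark, shape):
    return (int(landmark.x * shape[1]), int(landmark.y * shape[0]))

    # Inside gaze(): with refine_landmarks=True, MediaPipe appends ten iris
    # landmarks (indices 468-477). Assumption: 468 and 473 are the two iris
    # centers, used here as the pupils; the left/right naming is a guess.
    left_pupil = relative(points.landmark[468], frame.shape)
    right_pupil = relative(points.landmark[473], frame.shape)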
Now we have six 2D points obtained from MediaPipe and the corresponding 3D points in the world coordinate system we defined. Our goal is to recover the 3D pose of these points, using nothing but our 2D image. How do we do that?
The pinhole camera model to the rescue
The pinhole camera model is a mathematical model that describes the relationship between points in the 3D world and their projections on the 2D image plane. From this model we get the following equation:

s · [u, v, 1]^T = A · [R | t] · [X, Y, Z, 1]^T

where (u, v) are the 2D image coordinates, A is the camera (intrinsic) matrix, [R | t] is the rotation-translation (extrinsic) matrix, (X, Y, Z) are the 3D world coordinates, and s is a scale factor.
Using this equation we can project a 3D point onto the 2D image plane. But can we solve for the transformation? Well, not with simple algebraic tools, but don't worry: this is exactly what OpenCV's solvePnP function is for. See the link for a more in-depth explanation:
https://docs.opencv.org/4.5.4/d9/d0c/group__calib3d.html#ga549c2075fac14829ff4a58bc931c033d
We take our six image points and the corresponding 3D model points and pass them to the solvePnP function. In return we get a rotation vector and a translation vector, i.e. a transformation that lets us project a point from the 3D world onto the 2D image plane.
Learn how to estimate the camera matrix here:
https://learnopencv.com/approximate-focal-length-for-webcams-and-cell-phone-cameras/
Or learn how to calibrate your own camera here:
https://docs.opencv.org/3.4/dc/dbb/tutorial_py_calibration.html
    '''
    camera matrix estimation
    '''
    focal_length = frame.shape[1]
    center = (frame.shape[1] / 2, frame.shape[0] / 2)
    camera_matrix = np.array(
        [[focal_length, 0, center[0]],
         [0, focal_length, center[1]],
         [0, 0, 1]], dtype="double"
    )
    dist_coeffs = np.zeros((4, 1))  # Assuming no lens distortion
    (success, rotation_vector, translation_vector) = cv2.solvePnP(
        model_points, image_points, camera_matrix, dist_coeffs,
        flags=cv2.SOLVEPNP_ITERATIVE)
With our new transformation we can take a point in 3D space and project it onto the 2D image plane, so we can see where any 3D point lands in the image. Here is what projecting the point (0, 0, 150) looks like.
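As a minimal sketch (not part of the original snippets), projecting that point inside gaze with the pose recovered by solvePnP could look like this; drawing a line from the nose tip to the projection visualizes where the head is pointing:

    # Sketch: project the 3D model point (0, 0, 150) onto the image plane
    # using the rotation and translation recovered by solvePnP.
    (nose_end_2D, _) = cv2.projectPoints(
        np.array([(0.0, 0.0, 150.0)]), rotation_vector,
        translation_vector, camera_matrix, dist_coeffs)
    p1 = (int(image_points[0][0]), int(image_points[0][1]))  # nose tip in the image
    p2 = (int(nose_end_2D[0][0][0]), int(nose_end_2D[0][0][1]))
    cv2.line(frame, p1, p2, (255, 0, 0), 2)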
From 2D back to 3D
Now we take the pupil's 2D image coordinates and project them into our 3D model coordinates, which is exactly the opposite of what we did for the head pose estimation.
    # project image points to world points:
    # pad the 2D image points with a zero z-coordinate, i.e. (x, y) -> (x, y, 0)
    image_points1 = np.hstack((image_points, np.zeros((6, 1))))
    # image coordinates to world coordinates transformation
    _, transformation, _ = cv2.estimateAffine3D(image_points1, model_points)
    # Transformation * pupil image point vector
    pupil_world_cord = transformation @ np.array(
        [[left_pupil[0], left_pupil[1], 0, 1]]).T
As the code snippet shows, we use OpenCV's estimateAffine3D function, which builds on the same pinhole camera principle we discussed. It takes two sets of 3D points and returns the transformation between the first set and the second. But wait, our image points are two-dimensional. How can that be?
Well, we take the image points (x, y) and pass them as (x, y, 0), which yields a transformation between image coordinates and model coordinates. This way we can obtain the pupil's 3D model point from the 2D image point acquired from MediaPipe.
Note: this is not a very accurate estimate.
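One way to see how rough it is: feed one of the six padded image points through the recovered transform and compare the result with its model point. A minimal sketch, reusing the names from the snippets above:

    # Sketch: map the padded nose-tip image point through the affine transform.
    # It should land near the model nose tip (0, 0, 0); the residual shows how
    # rough the fit is.
    nose_world = transformation @ np.array(
        [[image_points1[0, 0], image_points1[0, 1], 0, 1]]).T
    print('nose residual (model units):', nose_world.ravel())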
(Figure: the gaze ray from the eye-ball center through the pupil, and its intersection point S)
I haven't pointed it out yet, but if you look back at the second code snippet above, you can see that we already have the eye-ball-center model points (3D), and we just obtained the pupil's 3D model point using estimateAffine3D.
Now, to find the gaze direction, we need to solve the line-plane intersection problem shown above: what we are looking for is the point S. Let's project that point onto the 2D plane.
    # project pupil image point into world point
    pupil_world_cord = transformation @ np.array(
        [[left_pupil[0], left_pupil[1], 0, 1]]).T

    # 3D gaze point (10 is an arbitrary value denoting gaze distance)
    S = Eye_ball_center_left + (pupil_world_cord - Eye_ball_center_left) * 10

    # Project the 3D gaze point onto the image plane.
    (eye_pupil2D, jacobian) = cv2.projectPoints(
        (int(S[0]), int(S[1]), int(S[2])), rotation_vector,
        translation_vector, camera_matrix, dist_coeffs)

    # Draw the gaze line on the frame
    p1 = (int(left_pupil[0]), int(left_pupil[1]))
    p2 = (int(eye_pupil2D[0][0][0]), int(eye_pupil2D[0][0][1]))
    cv2.line(frame, p1, p2, (0, 0, 255), 2)
Note: in the line that computes S we use the "magic" number 10. This is because the distance t between the pupil and the camera is unknown, so we cannot know how far away the subject is looking.
Are we done?
Not quite. We still need to account for head movement, so that our gaze tracker adapts when the head moves. Let's use the head pose estimate we computed at the beginning.
(Figure: correcting the gaze projection for head movement)
In the figure, the pupil's 2D position is denoted by point p, point g is the projection of gaze plus head rotation, and point h is the head pose projection. To get a clean gaze signal, we construct vector B out of vector A.
    # Project the 3D gaze direction onto the image plane.
    (eye_pupil2D, _) = cv2.projectPoints(
        (int(S[0]), int(S[1]), int(S[2])), rotation_vector,
        translation_vector, camera_matrix, dist_coeffs)
    # project the 3D head pose onto the image plane
    (head_pose, _) = cv2.projectPoints(
        (int(pupil_world_cord[0]), int(pupil_world_cord[1]), int(40)),
        rotation_vector, translation_vector, camera_matrix, dist_coeffs)
    # correct gaze for head rotation
    gaze = left_pupil + (eye_pupil2D[0][0] - left_pupil) - (head_pose[0][0] - left_pupil)
In the line that projects the head pose we use the "magic" number 40, for the same reason we used 10 in the snippet above.
The end
We're done, at least for now. You can find the complete code on the GitHub page and run it on your own machine:
https://github.com/amitt1236/Gaze_estimation
But are we really done?
There are a few things we could change to improve accuracy:
Calibrate the camera properly instead of relying on estimates.
Use both eyes and average the two gaze positions (we only used the left eye); see the sketch after this list.
We used estimateAffine3D to project the 2D pupil position into 3D space, but this is not an accurate estimate. We could instead use the structure of the eye and the pupil's position within the eye socket to infer its 3D location.
We completely ignored the distance between the subject and the camera. Because of that, we only get a gaze direction, not a gaze point. This is probably the most important improvement, and also the most complicated.
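As a rough illustration of the second point, assuming we ran the same pipeline once per eye and obtained two 2D gaze points (gaze_left and gaze_right are hypothetical names, not from the original code), averaging them could look like:

    # Hypothetical sketch: average the per-eye 2D gaze points to smooth the estimate.
    gaze_avg = ((gaze_left[0] + gaze_right[0]) / 2,
                (gaze_left[1] + gaze_right[1]) / 2)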
With some work, you can implement your own version and adapt it to your specific needs.