Eye gaze estimation using a webcam
2022-06-24 23:52:00 【woshicver】
Picture the following: you are sitting in a library and you have just spotted the most beautiful woman sitting on the other side of the room. Oh no, she has caught you staring at her. By reading where your eyes are pointing, she knows your gaze is on her, and you realize you have been caught.
Eye gaze: the point a person's eyes are focused on.
Our amazing brains do this effortlessly, but it is a very difficult problem to "teach" to a computer, because it involves several hard sub-tasks:
Face detection
Eye detection and pupil localization
Estimating the 3D position of the head and eyes
Commercial gaze trackers come in a variety of shapes and sizes, from glasses-based solutions to screen-mounted devices. Although these products are highly accurate, they rely on proprietary software and hardware, and they are very expensive.
Let's start building our gaze tracker
To keep this post at a reasonable length, we will build a basic form of gaze tracking that relies on a few rough approximations. We will not determine exactly what is being looked at; instead, we will estimate the direction of gaze.

Face detection and pupil localization
For this task we will use MediaPipe (https://google.github.io/mediapipe/solutions/face_mesh.html), an excellent deep-learning framework developed by Google. It gives us 468 2D facial landmarks in real time while using very few resources.
Let's look at some code:
import mediapipe as mp
import cv2
import gaze
mp_face_mesh = mp.solutions.face_mesh # initialize the face mesh model
# camera stream:
cap = cv2.VideoCapture(1)
with mp_face_mesh.FaceMesh(
        max_num_faces=1,  # number of faces to track in each frame
        refine_landmarks=True,  # includes iris landmarks in the face mesh model
        min_detection_confidence=0.5,
        min_tracking_confidence=0.5) as face_mesh:
    while cap.isOpened():
        success, image = cap.read()
        if not success:  # no frame input
            print("Ignoring empty camera frame.")
            continue
        # To improve performance, optionally mark the image as not writeable to
        # pass by reference.
        image.flags.writeable = False
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # frame to RGB for the face-mesh model
        results = face_mesh.process(image)
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)  # frame back to BGR for OpenCV
        if results.multi_face_landmarks:
            gaze.gaze(image, results.multi_face_landmarks[0])  # gaze estimation
        cv2.imshow('output window', image)
        if cv2.waitKey(2) & 0xFF == 27:  # press Esc to quit
            break
cap.release()

There is nothing special here. Near the end of the loop we pass the current frame and the facial landmarks obtained from MediaPipe to our gaze function, and that is where all the fun happens.
2D to 3D?
Gaze tracking is a 3D problem, but the title says we only use a simple webcam. How can that be?
We will use a bit of magic (linear algebra) to pull it off.
First, let's see how our camera "sees" the world.

(Image from the OpenCV documentation)
The 2D image you see on the screen is shown in blue, while the 3D world is represented by the world coordinate system. What is the connection between them? How do we map from the 2D image to the 3D world, or at least get a rough estimate?
Let's figure it out!
We are all the same
We humans are more alike than we think, so we can use a generic 3D face model that gives a good estimate of the 3D facial proportions for most of the population.
Let's use this model to define a 3D coordinate system. We place the tip of the nose at the origin and define five more points relative to it, as shown below:
# gaze.py
import cv2
import numpy as np

def gaze(frame, points):
    '''
    2D image points.
    relative takes MediaPipe points that are normalized to [0, 1] and returns image points
    in (x, y) format
    '''
    image_points = np.array([
        relative(points.landmark[4], frame.shape),    # Nose tip
        relative(points.landmark[152], frame.shape),  # Chin
        relative(points.landmark[263], frame.shape),  # Left eye left corner
        relative(points.landmark[33], frame.shape),   # Right eye right corner
        relative(points.landmark[287], frame.shape),  # Left mouth corner
        relative(points.landmark[57], frame.shape)    # Right mouth corner
    ], dtype="double")

    # 3D model points.
    model_points = np.array([
        (0.0, 0.0, 0.0),        # Nose tip
        (0, -63.6, -12.5),      # Chin
        (-43.3, 32.7, -26),     # Left eye left corner
        (43.3, 32.7, -26),      # Right eye right corner
        (-28.9, -28.9, -24.1),  # Left mouth corner
        (28.9, -28.9, -24.1)    # Right mouth corner
    ])

    '''
    3D model eye points
    The center of the eye ball
    '''
    Eye_ball_center_right = np.array([[-29.05], [32.7], [-39.5]])
    Eye_ball_center_left = np.array([[29.05], [32.7], [-39.5]])

Now we have six 2D points obtained from MediaPipe and the corresponding 3D points in the world coordinate system we just defined. Our goal is to recover the 3D pose of these points, using only our 2D image. How do we do that?
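One detail before we continue: the snippet above calls a small helper named relative that is not shown in the post. A minimal sketch of what it could look like, assuming MediaPipe landmarks normalized to [0, 1] relative to the frame size (this is an assumption, not the author's exact helper):

def relative(landmark, shape):
    # shape is the frame shape (height, width, channels);
    # convert a normalized landmark to pixel coordinates (x, y)
    return (int(landmark.x * shape[1]), int(landmark.y * shape[0]))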
The pinhole camera model to the rescue
The pinhole camera model is a mathematical model that describes the relationship between points in the 3D world and their projections on the 2D image plane. From this model we get the following equation:
s [u, v, 1]^T = K [R | t] [X, Y, Z, 1]^T, where (u, v) is the image point, K is the camera intrinsic matrix, [R | t] is the camera rotation and translation, (X, Y, Z) is the world point, and s is a scale factor.
With this equation we can obtain the transformation that projects a 3D point onto the 2D image plane. But can we solve for it? Well, not with simple algebraic tools, but don't worry: this is exactly where OpenCV's solvePnP function comes in. See the link for a more in-depth explanation:
https://docs.opencv.org/4.5.4/d9/d0c/group__calib3d.html#ga549c2075fac14829ff4a58bc931c033d
We take our six image points and the corresponding 3D model points and pass them to the solvePnP function. In return we get a rotation vector and a translation vector, giving us a transformation that lets us project a point from the 3D world onto the 2D image plane.
Learn how to estimate the camera matrix here:
https://learnopencv.com/approximate-focal-length-for-webcams-and-cell-phone-cameras/
Or learn how to calibrate your own camera here:
https://docs.opencv.org/3.4/dc/dbb/tutorial_py_calibration.html
    '''
    camera matrix estimation
    '''
    focal_length = frame.shape[1]
    center = (frame.shape[1] / 2, frame.shape[0] / 2)
    camera_matrix = np.array(
        [[focal_length, 0, center[0]],
         [0, focal_length, center[1]],
         [0, 0, 1]], dtype="double"
    )

    dist_coeffs = np.zeros((4, 1))  # assuming no lens distortion
    (success, rotation_vector, translation_vector) = cv2.solvePnP(model_points, image_points, camera_matrix,
                                                                  dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)

With our new transformation we can take a point in 3D space and project it onto the 2D image plane, so we can see where that 3D point lands in the image. This is what projecting the point (0, 0, 150) looks like.

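As an illustration, here is a minimal sketch of that projection, reusing the rotation_vector, translation_vector, camera_matrix and dist_coeffs computed above (the drawing code is an assumption added here, not part of the original post):

# Sketch: project the 3D model point (0, 0, 150) onto the image plane and draw a line
# from the nose tip towards it to visualize where the head is pointing.
(nose_end_2D, _) = cv2.projectPoints(np.array([(0.0, 0.0, 150.0)]), rotation_vector,
                                     translation_vector, camera_matrix, dist_coeffs)
p1 = (int(image_points[0][0]), int(image_points[0][1]))      # nose tip in the image
p2 = (int(nose_end_2D[0][0][0]), int(nose_end_2D[0][0][1]))  # projected 3D point
cv2.line(frame, p1, p2, (255, 0, 0), 2)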
3D from 2D
Now we take the pupil's 2D image coordinates and project them into our 3D model coordinates. This is the opposite of what we did for the head pose estimation.
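The snippets below use a variable called left_pupil that is not defined in the excerpts. One way to obtain it, assuming the iris landmarks that refine_landmarks=True adds to the face mesh (the exact landmark index used here is an assumption):

# Sketch: pupil position in image coordinates, taken from one of the iris landmarks
# added by refine_landmarks=True (index 468 is assumed to be an iris center).
left_pupil = relative(points.landmark[468], frame.shape)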
    # project image point to world point
    _, transformation, _ = cv2.estimateAffine3D(image_points1, model_points)  # image cord to world cord transformation
    pupil_world_cord = transformation @ np.array([[left_pupil[0], left_pupil[1], 0, 1]]).T  # Transformation * pupil image point vector

As the snippet shows, we use OpenCV's estimateAffine3D function. It works along the same lines as the pinhole camera model we discussed: it takes two sets of 3D points and returns the transformation between the first set and the second. But wait, our image points are two-dimensional. How can that be?
Well, we take the image points (x, y) and pass them as (x, y, 0), which gives us a transformation between image coordinates and model coordinates. Using it, we can turn the 2D pupil image point obtained from MediaPipe into a 3D pupil model point.
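The estimateAffine3D call above takes image_points1, which is just the 2D image points padded with a zero z coordinate as described here. A minimal sketch of that padding (the variable name follows the snippet above and is an assumption):

# Sketch: turn the (x, y) image points into (x, y, 0) so estimateAffine3D accepts them.
image_points1 = np.hstack((image_points, np.zeros((image_points.shape[0], 1))))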
Note: this is not a very accurate estimate.

I didn't mention it earlier, but if you look back at the second code snippet above, you will see that we already have the eye-ball center model points (3D), and we just obtained the pupil's 3D model point using estimateAffine3D.
Now, to find the gaze direction, we need to solve the line-plane intersection problem shown above. The point we are looking for is denoted S. Then we project that point onto the 2D image plane.
    # project pupil image point into world point
    pupil_world_cord = transformation @ np.array([[left_pupil[0], left_pupil[1], 0, 1]]).T

    # 3D gaze point (10 is an arbitrary value denoting gaze distance)
    S = Eye_ball_center_left + (pupil_world_cord - Eye_ball_center_left) * 10

    # Project the 3D gaze point onto the image plane.
    (eye_pupil2D, jacobian) = cv2.projectPoints((int(S[0]), int(S[1]), int(S[2])), rotation_vector,
                                                translation_vector, camera_matrix, dist_coeffs)

    # Draw the gaze line on the frame
    p1 = (int(left_pupil[0]), int(left_pupil[1]))
    p2 = (int(eye_pupil2D[0][0][0]), int(eye_pupil2D[0][0][1]))
    cv2.line(frame, p1, p2, (0, 0, 255), 2)

Note: when computing S we use the "magic" number 10 because we do not know the distance between the subject and the camera, so the distance t between the pupil and the camera in the figure is unknown.
Are we done?
Not yet. Now we need to account for head movement, so that our gaze tracker adapts when the head moves. Let's use the head pose estimate from the beginning.

The pupil's 2D position is denoted by point p, point g is the projection of gaze plus head rotation, and point h is the projection of the head pose alone. Now, to get clean gaze information, we use vector A to construct vector B.
    # Project the 3D gaze direction onto the image plane.
    (eye_pupil2D, _) = cv2.projectPoints((int(S[0]), int(S[1]), int(S[2])), rotation_vector,
                                         translation_vector, camera_matrix, dist_coeffs)
    # Project the 3D head pose onto the image plane
    (head_pose, _) = cv2.projectPoints((int(pupil_world_cord[0]), int(pupil_world_cord[1]), int(40)), rotation_vector,
                                       translation_vector, camera_matrix, dist_coeffs)
    # Correct gaze for head rotation
    gaze = left_pupil + (eye_pupil2D[0][0] - left_pupil) - (head_pose[0][0] - left_pupil)

Note: in the head pose projection we use the "magic" number 40, for the same reason we used 10 in the previous snippet.
Conclusion
We're done, at least for now. You can see the complete code on the GitHub page and run it on your own machine:
https://github.com/amitt1236/Gaze_estimation
But are we really done?
There are several things we could change to improve accuracy:
Calibrate the camera properly instead of using an estimated camera matrix.
Use both eyes and average the two estimates (we only used the left eye); see the sketch after this list.
We are using estimateAffine3D to project the 2D pupil position into 3D space, but this is not an accurate estimate. We could instead use the structure of the eye and the position of the pupil within the eye socket to infer the pupil's 3D location.
We completely ignore the distance between the subject and the camera, which is why we only get a gaze direction rather than a gaze point. This is probably the most important improvement, but also the most complicated one.
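For the second item, a rough sketch of averaging both eyes, assuming right_pupil and S_right were obtained for the right eye exactly as left_pupil and S were for the left eye (these names mirror the snippets above and are assumptions, not part of the original code):

# Sketch: repeat the gaze computation for the right eye, then average the two 2D gaze points.
(right_eye_pupil2D, _) = cv2.projectPoints((int(S_right[0]), int(S_right[1]), int(S_right[2])),
                                           rotation_vector, translation_vector, camera_matrix, dist_coeffs)
gaze_left = left_pupil + (eye_pupil2D[0][0] - left_pupil) - (head_pose[0][0] - left_pupil)
gaze_right = right_pupil + (right_eye_pupil2D[0][0] - right_pupil) - (head_pose[0][0] - right_pupil)
gaze_avg = (gaze_left + gaze_right) / 2  # averaged gaze estimate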
With some extra work you can adapt this solution to your own specific needs.