CN-116434314-B - Space gaze tracking method and device based on human eyeball model
Abstract
The invention provides a space gaze tracking method and device based on a human eyeball model. The method comprises: acquiring a target image to be processed; generating an eye key point detection result and an eye region segmentation result according to the target image to be processed; inputting the eye key point detection result and the eye region segmentation result into a pre-constructed ellipse fitting model to obtain an ellipse fitting result; obtaining the rotation center of the eyeball from the ellipse fitting result by using a first preset rule based on a first preset formula, so as to determine optical axis parameters; and determining the visual axis from the optical axis parameters by using a second preset rule based on a second preset formula, so as to determine the spatial position of the line of sight. On the basis of a new visual axis model, the visual axis is obtained through four steps (eye key point detection, eye region segmentation, ellipse fitting and optical axis calculation), so that the spatial position of the line of sight is determined and spatial gaze tracking with higher accuracy, lower error, lower cost and lighter weight is realized.
Inventors
- HE YUAN
- YANG SONGZHOU
Assignees
- Tsinghua University (清华大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20230310
Claims (8)
- 1. A spatial gaze tracking method based on a human eyeball model, comprising: acquiring a target image to be processed; generating an eye key point detection result and an eye region segmentation result according to the target image to be processed, wherein the eye key point detection result is generated by inputting the target image to be processed into a pre-constructed eye key point detection model, the eye key point detection model is obtained by training a pre-constructed neural network with a pre-constructed data set, the eye region segmentation result is generated by inputting a classification result into a pre-constructed eye region segmentation model, and the classification result is obtained by classifying the target image to be processed pixel by pixel; inputting the eye key point detection result and the eye region segmentation result into a pre-constructed ellipse fitting model to obtain an ellipse fitting result, wherein the ellipse fitting model is obtained by training a pre-constructed neural network with a pre-constructed data set; obtaining the rotation center of the eyeball according to the ellipse fitting result by using a first preset rule based on a first preset formula, so as to determine optical axis parameters; and determining a visual axis according to the optical axis parameters by using a second preset rule based on a second preset formula, so as to determine the spatial position of the line of sight; wherein obtaining the rotation center of the eyeball according to the ellipse fitting result by using the first preset rule based on the first preset formula, so as to determine the optical axis parameters, specifically comprises: determining the directions of the pupil and the iris in three-dimensional space by using the target parameters obtained by ellipse fitting and the effective focal length of the camera, so as to obtain the first preset formula; calculating with the first preset formula to establish the rotation center of the eyeball; and connecting the center of the pupil and the rotation center of the eyeball to obtain the optical axis; the first preset formula includes: (formula not reproduced), where the symbols denote the coordinates of the plane in which the pupil and the iris are located; the second preset formula includes: (formula not reproduced), where the symbols respectively represent a unit vector along the optical axis, a unit vector along the visual axis, and a conversion vector; a further formula (not reproduced) relates the distance between the rotation centers of the two eyes, a first intermediate variable, a second intermediate variable, the user's viewing depth, and the visual axis; and a final formula (not reproduced) gives, together, the position of the user's line of sight in space, the direction of the user's line of sight, and the rotation angle.
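The geometric step of claim 1 can be sketched as follows. The patent's preset formulas are not reproduced in this text, so the pinhole back-projection helper below is an illustrative assumption; the optical-axis construction follows the claim's wording of connecting the pupil center to the eyeball rotation center:

```python
import numpy as np

def back_project(u, v, f):
    """Back-project an image point (u, v) into a unit viewing ray under a
    simple pinhole model with effective focal length f (in pixels).
    Hypothetical helper -- the patent's exact first preset formula for the
    pupil/iris direction in 3D space is not reproduced in the source."""
    ray = np.array([u, v, f], dtype=float)
    return ray / np.linalg.norm(ray)

def optical_axis(pupil_center_3d, rotation_center_3d):
    """Unit vector along the optical axis: the line connecting the eyeball
    rotation center to the pupil center, as described in claim 1."""
    d = np.asarray(pupil_center_3d, float) - np.asarray(rotation_center_3d, float)
    return d / np.linalg.norm(d)
```

Under this sketch, the visual axis would then be obtained from the optical axis by the second preset rule (a conversion vector and rotation angle), which the source does not spell out.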
- 2. The human eyeball model-based spatial gaze tracking method of claim 1, wherein the eye region segmentation model specifically comprises: an encoder for learning identifiable features of the image; and a decoder for mapping the encoder's features to a high-resolution pixel space to obtain a dense classification.
- 3. The human eyeball model-based spatial gaze tracking method of claim 1, wherein, after the eye keypoint detection model is trained from a pre-constructed data set based on a pre-constructed neural network, the method further comprises: calculating a first loss by using a third preset formula; the third preset formula includes: (formula not reproduced), where loss_lm represents the first loss function, x_max represents the maximum value of the abscissa of the eye key point detection result, y_max represents the maximum value of the ordinate of the eye key point detection result, x_min represents the minimum value of the abscissa of the eye key point detection result, y_min represents the minimum value of the ordinate of the eye key point detection result, and o_1 represents the difference between the predicted value and the ground truth value.
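Since the third preset formula itself is not reproduced, the following is only a hedged sketch of a keypoint loss in the spirit of claim 3: the per-point prediction error (the "difference between the predicted value and the ground truth value") normalized by the extent of the detected eye region, computed from the keypoints' coordinate extrema. The normalized-mean-error form used here is an assumption, not the patent's formula:

```python
import numpy as np

def keypoint_loss(pred, gt):
    """Hypothetical first loss: mean per-keypoint error normalized by the
    size of the bounding box spanned by the ground-truth keypoints
    (x_min..x_max, y_min..y_max). The exact formula in the patent is
    not reproduced in the source text."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    x_min, y_min = gt.min(axis=0)
    x_max, y_max = gt.max(axis=0)
    scale = np.sqrt((x_max - x_min) * (y_max - y_min))  # bbox size
    err = np.linalg.norm(pred - gt, axis=1)             # o_1-style differences
    return float(err.mean() / scale)
```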
- 4. The human eyeball model-based spatial gaze tracking method of claim 2, wherein, after the eye region segmentation model is trained from a pre-constructed data set based on a pre-constructed neural network, the method further comprises: performing a weight calculation on each value in the output probability distribution vector by using a fourth preset formula; and calculating a second loss by using a fifth preset formula; the fourth preset formula includes: (formula not reproduced), where the symbols respectively represent the weight of a pixel, the Euclidean distance from that pixel to the nearby boundaries, and the standard deviation of a normal distribution; the fifth preset formula includes: (formula not reproduced), where the symbols respectively represent the second loss function, the predicted label of a pixel, and the ground-truth label of that pixel.
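Claim 4 describes a boundary-distance-based pixel weighting (a Gaussian of the Euclidean distance to nearby boundaries, with a normal-distribution standard deviation) feeding a weighted per-pixel loss. Since the fourth and fifth preset formulas are not reproduced, the sketch below assumes the classic boundary-weighting scheme and a weighted binary cross-entropy; both function bodies are illustrative, not the patent's exact formulas:

```python
import numpy as np

def pixel_weights(dists, sigma):
    """Claim-4-style weights: each pixel is weighted by a Gaussian of its
    Euclidean distance to nearby region boundaries. The Gaussian form
    is an assumed reconstruction of the fourth preset formula."""
    dists = np.asarray(dists, float)
    return np.exp(-(dists ** 2) / (2.0 * sigma ** 2))

def weighted_cross_entropy(probs, labels, weights, eps=1e-12):
    """Assumed second loss: per-pixel cross-entropy between the predicted
    probability and the ground-truth label, scaled by boundary weights."""
    probs = np.clip(np.asarray(probs, float), eps, 1.0)
    # probability assigned to the true class of each pixel
    p_true = np.where(np.asarray(labels) == 1, probs, 1.0 - probs)
    return float(-(np.asarray(weights) * np.log(p_true)).mean())
```

This design emphasizes pixels near iris/pupil boundaries, which is where segmentation accuracy matters most for the subsequent ellipse fitting.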
- 5. The human eyeball model-based spatial gaze tracking method of claim 1, wherein inputting the eye key point detection result and the eye region segmentation result into a pre-constructed ellipse fitting model to obtain an ellipse fitting result specifically comprises: fitting the ellipses of the iris and the pupil, with the eye key point detection result and the eye region segmentation result as constraint conditions, to obtain target parameters; and generating the iris and pupil ellipses from the target parameters based on the ellipse fitting model; wherein the target parameters are the abscissa of the center coordinate, the ordinate of the center coordinate, the length of the semi-major axis, the length of the semi-minor axis, and the rotation angle.
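The five target parameters of claim 5 fully determine an ellipse. As a minimal sketch (parameter names are ours; the patent's symbols are not reproduced), generating points on the ellipse from those parameters looks like:

```python
import numpy as np

def ellipse_points(cx, cy, a, b, theta, n=100):
    """Sample n points on the ellipse described by claim 5's target
    parameters: center (cx, cy), semi-major axis length a, semi-minor
    axis length b, and rotation angle theta (radians)."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    x = a * np.cos(t)                      # axis-aligned ellipse
    y = b * np.sin(t)
    c, s = np.cos(theta), np.sin(theta)    # rotate, then translate
    return np.column_stack((cx + c * x - s * y, cy + s * x + c * y))
```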
- 6. The human eyeball model-based spatial gaze tracking method of claim 1, wherein, after the ellipse fitting model is trained from a pre-constructed data set based on a pre-constructed neural network, the method further comprises: calculating a third loss by using a sixth preset formula; and calculating the total loss by using a seventh preset formula; the sixth preset formula includes: (formula not reproduced), where the symbols respectively represent the third loss function, the ground truth value, and the predicted value; the seventh preset formula includes: (formula not reproduced), where the total loss combines the first loss function, the second loss function, and the third loss function, each with its own weight.
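Claim 6 describes the total training loss as a combination of the three losses with per-loss weights. The seventh preset formula is not reproduced, so the sketch below assumes the usual linear weighted sum:

```python
def total_loss(loss_lm, loss_seg, loss_fit, w1, w2, w3):
    """Assumed seventh-formula form: total loss as a weighted sum of the
    first (keypoint), second (segmentation) and third (ellipse-fitting)
    loss functions, with weights w1, w2, w3 respectively."""
    return w1 * loss_lm + w2 * loss_seg + w3 * loss_fit
```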
- 7. A spatial gaze tracking device based on a human eyeball model, comprising: an image acquisition unit for acquiring a target image to be processed; an image processing unit for generating an eye key point detection result and an eye region segmentation result according to the target image to be processed, wherein the eye key point detection result is generated by inputting the target image to be processed into a pre-constructed eye key point detection model, the eye key point detection model is obtained by training a pre-constructed neural network with a pre-constructed data set, the eye region segmentation result is generated by inputting a classification result into a pre-constructed eye region segmentation model, and the classification result is obtained by classifying the target image to be processed pixel by pixel; an ellipse fitting unit for inputting the eye key point detection result and the eye region segmentation result into a pre-constructed ellipse fitting model to obtain an ellipse fitting result, wherein the ellipse fitting model is obtained by training a pre-constructed neural network with a pre-constructed data set; a first calculation unit for obtaining the rotation center of the eyeball according to the ellipse fitting result by using a first preset rule based on a first preset formula, so as to determine optical axis parameters; and a second calculation unit for determining a visual axis according to the optical axis parameters by using a second preset rule based on a second preset formula, so as to determine the spatial position of the line of sight; wherein the first calculation unit is specifically configured to: determine the directions of the pupil and the iris in three-dimensional space by using the target parameters obtained by ellipse fitting and the effective focal length of the camera, so as to obtain the first preset formula; calculate with the first preset formula to establish the rotation center of the eyeball; and connect the center of the pupil and the rotation center of the eyeball to obtain the optical axis; the first preset formula includes: (formula not reproduced), where the symbols denote the coordinates of the plane in which the pupil and the iris are located; the second preset formula includes: (formula not reproduced), where the symbols respectively represent a unit vector along the optical axis, a unit vector along the visual axis, and a conversion vector; a further formula (not reproduced) relates the distance between the rotation centers of the two eyes, a first intermediate variable, a second intermediate variable, the user's viewing depth, and the visual axis; and a final formula (not reproduced) gives, together, the position of the user's line of sight in space, the direction of the user's line of sight, and the rotation angle.
- 8. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the human eyeball model-based spatial gaze tracking method of any one of claims 1 to 6.
Description
Space gaze tracking method and device based on human eyeball model

Technical Field

The invention relates to the technical field of human-computer interaction, and in particular to a space gaze tracking method and device based on a human eyeball model.

Background

The metaverse is attracting attention as a model of the next-generation Internet for its popularity and immersive experience. The metaverse contains various elements, such as 5G, AI, blockchain, and content production, and its core is virtual experience through XR (Extended Reality). XR, which provides fusion and interaction between the physical world and the virtual world, is therefore expected to become the junction connecting the metaverse and the physical world through 3D display and interaction. Although XR is common in various areas, such as health, industry, and entertainment, 3D display and interaction remain the last-mile problem in practice. The main problem behind 3D display and interaction is spatial gaze tracking, a key enabling technology of 3D display and interaction that is not yet mature. Since spatial gaze is defined by the convergence of the binocular visual axes, as shown in fig. 1 (a) -1 (c), existing methods typically employ an approximate model, such as approximating the visual axis with other, more easily perceived axes (e.g., the pupil axis). However, this approach degrades accuracy. As shown in fig. 2, different approximations produce different errors for the same line-of-sight angle, particularly at the depths required for 3D displays; some experiments have demonstrated depth perception errors as high as 303.90 cm. In summary, the existing spatial gaze tracking techniques suffer from low accuracy.
Disclosure of Invention

The invention provides a space gaze tracking method and device based on a human eyeball model, which are used for overcoming the defect of low spatial gaze tracking accuracy in the prior art and realizing spatial gaze tracking with higher accuracy and lower error. The invention provides a space gaze tracking method based on a human eyeball model, which comprises the following steps: acquiring a target image to be processed; generating an eye key point detection result and an eye region segmentation result according to the target image to be processed, wherein the eye key point detection result is generated by inputting the target image to be processed into a pre-constructed eye key point detection model, the eye key point detection model is obtained by training a pre-constructed neural network with a pre-constructed data set, the eye region segmentation result is generated by inputting a classification result into a pre-constructed eye region segmentation model, and the classification result is obtained by classifying the target image to be processed pixel by pixel; inputting the eye key point detection result and the eye region segmentation result into a pre-constructed ellipse fitting model to obtain an ellipse fitting result, wherein the ellipse fitting model is obtained by training a pre-constructed neural network with a pre-constructed data set; obtaining the rotation center of the eyeball according to the ellipse fitting result by using a first preset rule based on a first preset formula, so as to determine optical axis parameters; and determining a visual axis according to the optical axis parameters by using a second preset rule based on a second preset formula, so as to determine the spatial position of the line of sight.
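The four-step method described above can be outlined as follows. All callables and their interfaces are placeholders introduced for illustration; the source does not disclose the models' implementations or the preset formulas:

```python
def track_gaze(image, keypoint_model, segmentation_model, ellipse_model,
               solve_optical_axis, optical_to_visual):
    """Sketch of the claimed pipeline: eye key point detection ->
    eye region segmentation -> ellipse fitting -> optical axis ->
    visual axis (spatial position of the line of sight)."""
    keypoints = keypoint_model(image)          # eye key point detection result
    mask = segmentation_model(image)           # per-pixel eye region labels
    ellipse = ellipse_model(keypoints, mask)   # (cx, cy, a, b, theta)
    optical = solve_optical_axis(ellipse)      # first preset rule/formula
    return optical_to_visual(optical)          # second preset rule/formula
```

Passing the solvers in as parameters keeps the sketch honest: each stage is a model or rule the patent names but whose internals are not reproduced here.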
According to the spatial gaze tracking method based on the human eyeball model provided by the invention, the eye region segmentation model specifically comprises: an encoder for learning identifiable features of the image; and a decoder for mapping the encoder's features to a high-resolution pixel space to obtain a dense classification. According to the spatial gaze tracking method based on the human eyeball model provided by the invention, after the eye key point detection model is obtained by training a pre-constructed neural network with a pre-constructed data set, the method further comprises: calculating a first loss by using a third preset formula; the third preset formula includes: (formula not reproduced), where loss_lm represents the first loss function, x_max represents the maximum value of the abscissa of the eye key point detection result, y_max represents the maximum value of the ordinate of the eye key point detection result, x_min represents the minimum value of the abscissa of the eye key point detection result, y_min represents the minimum value of the ordinate of the eye key point detection result, and o_1 represents the difference between the predicted value and the ground truth value. According to the spatial gaze tracking method based on the human eyeball model provided by the invention, the eye region segmentation model is obtained by training a pre-constructed d