EP-4250247-B1 - METHOD AND APPARATUS FOR DETERMINING A GAZE DIRECTION OF A USER
Inventors
- Köhler, Thomas
- Zarei, Shahrooz
- Deitsch, Sergiu
Dates
- Publication Date: 2026-05-13
- Application Date: 2023-02-23
Claims (14)
- A computer-implemented method for determining a gaze direction of a user (5), the method comprising: capturing (110) an image of a face of the user (5); providing (130) image data of the captured image to a neural network (220), wherein the neural network (220) is configured to detect an ellipse (20) representing an outer border of an iris (17) of the user's eye (15) and to output a first vector (31), a second vector (32) and a respective first and second probability (p1, p2), wherein the first and second vector (31, 32) respectively correspond to one of two possible normals through a centre (21) of the ellipse (20), and wherein the first probability (p1) represents the probability that the first vector (31) is the gaze direction and the second probability (p2) represents the probability that the second vector (32) is the gaze direction; and determining (140) the gaze direction of the user (5) as the first vector (31), if the first probability (p1) is greater than the second probability (p2), and as the second vector (32), if the second probability (p2) is greater than the first probability (p1).
- The method according to claim 1, further comprising: determining (120) an eye region (10) in the captured image, wherein providing (130) image data comprises providing (130) image data representing the determined eye region (10) to the neural network (220).
- The method according to claim 1 or 2, wherein the neural network (220) is configured to determine geometric information (223) describing the ellipse (20) including at least one of a position or centre of the ellipse, a rotation (α) of the ellipse and a size of the semi-axes (Sa, Sb) of the ellipse (20).
- The method according to claim 3, wherein the neural network (220) comprises a plurality of basic layers (221) trained to output features forming a mathematical representation of the image data, and the neural network (220) further comprises at least one intermediate layer (222) configured to convert the mathematical representation of the image data into numerical values (223) defining the ellipse (20), wherein the numerical values include the at least one of a position or centre, a rotation (α) and a size of the semi-axes (Sa, Sb) of the ellipse (20).
- The method according to claim 4, wherein the neural network (220) is formed by adding the at least one intermediate layer (222) to a pre-trained neural network or by replacing at least one layer of a pre-trained neural network with the at least one intermediate layer (222).
- The method according to one of claims 1 to 5, wherein each of the first and second vector (31, 32) is a three-dimensional vector having its origin in the centre (21) of the detected ellipse (20) and being perpendicular to a plane defined by the ellipse (20) in a three-dimensional space.
- The method according to claim 6, wherein the neural network (220) is configured to calculate a roll and pitch of each of the first and second normal (31, 32) with respect to a coordinate system defined by an image plane (210a) of the captured image.
- The method according to one of claims 1 to 7, wherein the neural network (220) further comprises a first neural network (221, 222) configured to receive the image data of the captured image, to detect the ellipse (20) representing an outer border of the iris (17), and to output geometric information (223) describing the ellipse (20), and a second neural network (224) configured to receive the geometric information describing the ellipse (20), and to output the first vector (31), the second vector (32), and the respective first and second probability (p1, p2).
- The method according to claim 8, wherein the second neural network (224) is configured to classify the geometric information (223) into the first and second vector (31, 32) and the respective first and second probability (p1, p2), wherein, preferably, the first and second vector (31, 32) is directly calculated and the respective first and second probability (p1, p2) are trained.
- The method according to one of claims 1 to 9, further comprising: outputting (135) geometric information (223) describing the ellipse (20).
- The method according to one of claims 1 to 10, wherein the neural network (220) is further configured to detect another ellipse (20) representing an outer border of an iris (17) of a second eye of the user (5) and to determine the first and second probability (p1, p2) at least partly based on spatial information of the other ellipse (20).
- The method according to claim 11, wherein the neural network (220) comprises a plurality of basic layers (221) trained to output features forming a mathematical representation of the image data, and the neural network (220) further comprises at least one intermediate layer (222) configured to convert the mathematical representation of the image data into numerical values (223) defining the ellipse (20) and/or the other ellipse (20), wherein the numerical values include at least one of a position or centre, a rotation (α) and a size of the semi-axes (Sa, Sb) of the ellipse (20) and/or the other ellipse (20).
- An apparatus (200) comprising: a camera (210); a memory (205) configured to store computer-executable instructions for performing the method according to one of claims 1 to 12; and a processor (208, 240) configured to execute the instructions stored in the memory (205).
- A vehicle (1) comprising an apparatus (200) according to claim 13.
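As an illustration only (not part of the claims), the determination step of claim 1 reduces to selecting whichever candidate vector the network assigns the higher probability. A minimal Python sketch, with all names hypothetical:

```python
def select_gaze(v1, p1, v2, p2):
    """Return the candidate gaze vector with the higher probability,
    mirroring the determination step of claim 1.  The claim leaves
    the tie case p1 == p2 unspecified; here the first vector wins."""
    return v1 if p1 >= p2 else v2
```

In practice the two probabilities would come from the network's output head together with the two candidate normals, so this selection is a single comparison per frame.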
Description
The present invention relates to a method and apparatus for determining a gaze direction of a user. Particularly, the present invention relates to a method and apparatus for determining a respective probability of a first and second normal of an ellipse representing an iris in image data of a face of the user.

The estimation of a gaze direction is becoming increasingly important in various fields. For instance, human-machine interfaces are under development that allow user input by "selecting" an item on a display simply by looking at it. This enables user interaction without a hand-operated input device, for example when operating machinery or assisting people with disabilities. Estimating the gaze direction is of particular interest in the automotive industry, for instance when the gaze direction of a driver's eyes is of interest. A human-machine interface may be controlled by looking at a displayed item, a vehicle component may be controlled by looking at the component or a control device thereof, and a driver assistance system may operate based on information about the gaze direction, particularly whether the driver looks at an object outside of the vehicle, in a rear-view mirror or the like. For instance, an automotive head-up display may show important information in the area of the windshield through which the driver looks, or the driver assistance system may warn the driver about a situation outside of the driver's field of view.

Current systems for estimating a gaze direction of a user require information about the user. For example, such systems require knowledge about the relationship between the camera and the head of the user (such as distance, head pose with respect to an image plane of the camera, etc.), or they are user dependent, i.e. calibrated for a particular user who has to log into the system. However, the determination of a gaze direction may fail.
Particularly, if the required knowledge cannot be derived due to missing or erroneous information (such as missing or erroneous prior images of the eye), or if an underlying module (head tracker, user authentication, etc.) provides a false output, the determined gaze direction will most likely be incorrect. Thus, there are multiple sources of error, each of which can lead to a false gaze direction estimate.

The following prior art documents are relevant: JIAN-GANG WANG ET AL: "Study on Eye Gaze Estimation", IEEE Transactions on Systems, Man, and Cybernetics, Part B, IEEE Service Center, Piscataway, NJ, US, vol. 32, no. 3, 1 June 2002 (2002-06-01), XP011079908, ISSN: 1083-4419; and WO 2020/110121 A1 (BLINK O G LTD [IL]; DROZDOV GILAD [IL]; WOLLNER URI [IL]), 4 June 2020 (2020-06-04).

Therefore, it is an object of the present invention to provide a reliable and efficient method and apparatus for determining a gaze direction of a user. This object is achieved by a method comprising the features of claim 1, an apparatus comprising the features of claim 13, and a vehicle comprising the features of claim 14. Preferred embodiments are defined by the dependent claims.

According to a first aspect of the present disclosure, a method for determining a gaze direction of a user comprises capturing an image of a face of the user, and providing image data of the captured image to a neural network. The neural network can be a convolutional neural network, such as a deep convolutional neural network. It can be configured to receive the image data, for example pixel values of the captured image, and to perform (convolutional) operations on the pixel values. Particularly, the neural network is configured to detect an ellipse representing an outer border or contour of an iris of the user's eye. The outer border of the iris can be detected due to the change of colour and/or contrast between the iris and the sclera.
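The two candidate normals exist because, under a (near-)orthographic projection, a circle tilted towards the camera and one tilted away by the same angle project onto the same ellipse. A sketch of this geometry under that simplifying assumption (the function and its closed-form computation are illustrative only; the patent's network regresses the vectors and probabilities rather than computing them analytically):

```python
import math

def candidate_normals(alpha, s_a, s_b):
    """Two candidate unit normals of a circle whose orthographic
    projection is an ellipse with major-axis angle alpha (radians)
    and semi-axes s_a >= s_b.  The tilt angle t between the circle's
    normal and the optical axis satisfies cos(t) = s_b / s_a, and the
    two solutions tilt in opposite directions about the major axis."""
    cos_t = s_b / s_a
    sin_t = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))
    # minor-axis direction in the image plane (perpendicular to the major axis)
    mx, my = -math.sin(alpha), math.cos(alpha)
    n1 = (sin_t * mx, sin_t * my, cos_t)
    n2 = (-sin_t * mx, -sin_t * my, cos_t)
    return n1, n2
```

For a circular projection (s_a == s_b) both candidates collapse onto the optical axis; the more elongated the ellipse, the further the two candidates diverge, which is precisely the ambiguity the claimed probabilities (p1, p2) resolve.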
The outer border of the iris in the real three-dimensional coordinate system is (almost) circular. In the majority of cases, however, the two-dimensional image of the iris slightly deviates from a circle and forms an ellipse due to perspective distortion, because the iris plane is usually not parallel to the image plane of the camera (e.g., the sensor plane). Thus, the outer border of the iris projects (maps) onto pixels of the camera sensor that are arranged along an ellipse.

The neural network is further configured to output a first vector, a second vector and a respective first and second probability, wherein the first and second vector respectively correspond to one of two possible normals through a centre of the ellipse, and wherein the first probability represents the probability that the first vector is the gaze direction, and the second probability represents the probability that the second vector is the gaze direction. Since only a two-dimensional image of the (almost) circular iris (arranged arbitrarily in the three-dime