EP-3685313-B1 - PERSONALIZED NEURAL NETWORK FOR EYE TRACKING


Inventors

  • KAEHLER, ADRIAN
  • LEE, DOUGLAS
  • BADRINARAYANAN, VIJAY

Dates

Publication Date
2026-05-06
Application Date
2018-09-18

Claims (13)

  1. A wearable augmented reality display system (1100) comprising: an image capture device (1352) configured to capture a plurality of retraining eye images of an eye (200) of a user; a display (1108); a non-transitory computer-readable storage medium configured to store: the plurality of retraining eye images, and a neural network (108) for eye tracking, wherein the neural network was trained using a training set comprising training input data and corresponding training target output data, wherein the training input data comprises a plurality of training eye images of a plurality of users, and wherein the corresponding training target output data comprises eye poses of eyes of the plurality of users in the plurality of training eye images; and a hardware processor in communication with the image capture device, the display, and the non-transitory computer-readable storage medium, the hardware processor programmed by executable instructions to: receive the plurality of retraining eye images captured by the image capture device, wherein a retraining eye image of the plurality of retraining eye images is captured by the image capture device when a user interface, UI, event occurs with respect to a virtual UI device shown to the user as a three-dimensional virtual UI device at a display location of the display, wherein the hardware processor is configured to determine occurrences of UI events being user interactions with virtual UI devices; generate a retraining set comprising retraining input data and corresponding retraining target output data, wherein the retraining input data comprises the retraining eye images, wherein the corresponding retraining target output data comprises an eye pose of the eye of the user in the retraining eye image related to the display location, wherein the eye pose indicates a plurality of angular parameters relative to a natural resting direction of the eye, and wherein the angular parameters indicate an azimuthal deflection and a zenithal deflection; train the neural network for eye tracking; and obtain a retrained neural network that is retrained from the neural network (108) for eye tracking using the retraining set.
  2. The wearable display system (1100) of claim 1, wherein to obtain the retrained neural network, the hardware processor is programmed to at least: retrain the neural network for eye tracking using the retraining set to generate the retrained neural network.
  3. The wearable display system (1100) of claim 1, wherein to obtain the retrained neural network, the hardware processor is programmed to at least: transmit the retraining set to a remote system; and receive the retrained neural network from the remote system, optionally wherein the remote system comprises a cloud computing system.
  4. The wearable display system (1100) of claim 1, wherein the hardware processor is further programmed by the executable instructions to: determine the eye pose of the eye in the retraining eye image using the display location.
  5. The wearable display system (1100) of claim 4, wherein the eye pose of the eye in the retraining image comprises the display location.
  6. The wearable display system (1100) of claim 1, wherein to receive the plurality of retraining eye images of the user, the hardware processor is programmed by the executable instructions to at least: generate a second plurality of second retraining eye images based on the retraining eye image; and determine an eye pose of the eye in a second retraining eye image of the second plurality of second retraining eye images using the display location and a probability distribution function.
  7. The wearable display system (1100) of claim 1, wherein to receive the plurality of retraining eye images of the user, the hardware processor is programmed by the executable instructions to at least: receive a plurality of eye images of the eye of the user from the image capture device, wherein a first eye image of the plurality of eye images is captured by the user device when the UI event, with respect to the virtual UI device shown to the user at the display location of the display, occurs; determine a projected display location of the virtual UI device from the display location, backward along a motion of the user prior to the UI event, to a beginning of the motion; determine that the projected display location and a second display location of the virtual UI device in a second eye image of the plurality of eye images captured at the beginning of the motion are within a threshold distance; and generate the retraining input data comprising eye images of the plurality of eye images from the second eye image to the first eye image, wherein the corresponding retraining target output data comprises an eye pose of the eye of the user in each eye image of the eye images related to a display location of the virtual UI device in the eye image.
  8. The wearable display system (1100) of claim 1, wherein the hardware processor is further programmed by the executable instructions to at least: determine the eye pose of the eye using the display location of the virtual UI device.
  9. The wearable display system (1100) of claim 1, wherein to generate the retraining set, the hardware processor is programmed by the executable instructions to at least: determine the eye pose of the eye in the retraining eye image is in a first eye pose region of a plurality of eye pose regions; determine a distribution probability of the virtual UI device being in the first eye pose region; and generate the retraining input data comprising the retraining eye image at an inclusion probability related to the distribution probability.
  10. The wearable display system (1100) of claim 1, wherein the retraining input data of the retraining set comprises at least one training eye image of the plurality of training eye images.
  11. The wearable display system (1100) of claim 1, wherein the retraining input data of the retraining set comprises no training eye image of the plurality of training eye images.
  12. The wearable display system (1100) of claim 1, wherein to retrain the neural network for eye tracking, the hardware processor is programmed by the executable instructions to at least: initialize weights of the retrained neural network with weights of the neural network.
  13. The wearable display system (1100) of claim 1, wherein the hardware processor is programmed by the executable instructions to cause the user device to: receive an eye image of the user from the image capture device; and determine an eye pose of the user in the eye image using the retrained neural network.
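Claims 1, 4 and 5 describe labeling an eye image, captured at the moment of a UI event, with an eye pose derived from the display location of the virtual UI device, the pose being expressed as azimuthal and zenithal deflections from the eye's natural resting direction. The following is a minimal sketch of that labeling step, assuming an eye-centered coordinate frame with the resting direction along the negative z axis (the claims do not fix a particular frame, and the function and parameter names here are hypothetical, not from the patent):

```python
import math
from dataclasses import dataclass

@dataclass
class EyePose:
    azimuth: float  # radians of horizontal deflection from the resting direction
    zenith: float   # radians of vertical deflection from the resting direction

def pose_from_display_location(eye_center, ui_location):
    """Approximate the eye pose implied by gazing at a virtual UI device.

    Assumes the natural resting direction is -z in an eye-centered frame
    (a simplification chosen for this sketch).
    """
    dx = ui_location[0] - eye_center[0]
    dy = ui_location[1] - eye_center[1]
    dz = ui_location[2] - eye_center[2]
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    azimuth = math.atan2(dx, -dz)  # horizontal (azimuthal) deflection
    zenith = math.asin(dy / r)     # vertical (zenithal) deflection
    return EyePose(azimuth, zenith)

def on_ui_event(eye_image, eye_center, ui_location, retraining_set):
    """On a UI event (e.g. a virtual button press), pair the captured eye
    image with the pose derived from the device's display location."""
    pose = pose_from_display_location(eye_center, ui_location)
    retraining_set.append((eye_image, pose))
```

A gaze straight along the resting direction yields zero deflection in both angles; per claim 12, the network retrained on such pairs would be initialized with the weights of the original network.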

Description

FIELD

The present disclosure relates to virtual reality and augmented reality imaging and visualization systems, and in particular to a personalized neural network for eye tracking.

BACKGROUND

A deep neural network (DNN) is a computational machine learning method. DNNs belong to a class of artificial neural networks (NNs). With NNs, a computational graph is constructed that imitates the features of a biological neural network. The biological neural network includes features salient for computation and responsible for many of the capabilities of a biological system that may otherwise be difficult to capture through other methods. In some implementations, such networks are arranged into a sequential layered structure in which connections are unidirectional. For example, outputs of artificial neurons of a particular layer can be connected to inputs of artificial neurons of a subsequent layer. A DNN can be a NN with a large number of layers (e.g., tens, hundreds, or more). NNs differ from one another in various respects. For example, the topologies or architectures (e.g., the number of layers and how the layers are interconnected) and the weights of different NNs can be different. A weight can be approximately analogous to the synaptic strength of a neural connection in a biological system. Weights affect the strength of effects propagated from one layer to another. The output of an artificial neuron can be a nonlinear function of the weighted sum of its inputs; the weights of a NN are the weights that appear in these summations.

Anaelis et al., in "Adaptive eye-gaze tracking using neural-network-based user profiles to assist people with motor disability", Journal of Rehabilitation Research & Development, vol. 45, no. 6, pp. 801-818, disclose an adaptive real-time human-computer interface (HCI) that serves as an assistive technology tool for people with severe motor disability.
The HCI uses eye gaze as the primary computer input device and adapts to each specific user's characteristics through the training of an artificial neural network that is structured to reduce mouse jitter. The artificial neural network finds the relationship between recorded eye-gaze coordinates and mouse cursor position. Baluja et al., in "Non-intrusive Gaze Tracking Using Artificial Neural Networks", CMU-CS-94-102 (available at https://apps.dtic.mil/sti/pdfs/ADA275186.pdf), disclose an artificial-neural-network-based gaze tracking system which can be customized to individual users. A three-layer feed-forward network, trained with standard error back-propagation, is used to determine the position of a user's gaze from the appearance of the user's eye.

SUMMARY

The invention is defined in claim 1. Further aspects and preferred embodiments are defined in the dependent claims. Any aspects, embodiments and examples of the present disclosure which do not fall under the scope of the appended claims do not form part of the invention and are merely provided for illustrative purposes. Details of one or more implementations of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages will become apparent from the description, the drawings, and the claims. Neither this summary nor the following detailed description purports to define or limit the scope of the subject matter of the disclosure. The embodiments relating to figures 1-9 and 11 and the corresponding description passages are covered by the claims. However, the embodiments relating to the remaining figures and corresponding description passages are not covered by the claims but are useful for understanding the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 schematically illustrates one embodiment of capturing eye images and using the eye images for retraining a neural network for eye tracking.
FIG. 2 schematically illustrates an example of an eye.
FIG. 2A schematically illustrates an example coordinate system for measuring an eye pose of an eye.
FIG. 3 shows a flow diagram of an illustrative method of collecting eye images and retraining a neural network using the collected eye images.
FIG. 4 illustrates an example of generating eye images with different eye poses for retraining a neural network for eye tracking.
FIG. 5 illustrates an example of computing a probability distribution for generating eye images with different pointing directions for a virtual UI device displayed with a text description.
FIG. 6 illustrates an example display of an augmented reality device with a number of regions of the display corresponding to different eye pose regions. A virtual UI device can be displayed in different regions of the display corresponding to different eye pose regions with different probabilities.
FIG. 7 shows a flow diagram of an illustrative method of performing density normalization of UI events observed when collecting eye images for retraining a neural network.
FIG.
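Claim 9 and FIG. 7 concern density normalization: because virtual UI devices tend to occur more often in some eye pose regions than in others, a retraining eye image is included at an inclusion probability related to the distribution probability of UI devices in its region. The sketch below assumes one plausible reading, that the inclusion probability is inversely proportional to the observed density of UI events in the image's region, so over-represented regions are down-sampled; the function and variable names are illustrative, not from the patent:

```python
import random
from collections import Counter

def inclusion_probability(region, region_counts):
    """Inclusion probability inversely related to how often UI events fall
    in the given eye pose region: 1.0 for the sparsest region, <1 elsewhere."""
    total = sum(region_counts.values())
    density = region_counts[region] / total
    min_density = min(region_counts.values()) / total
    return min_density / density

def normalize_density(samples, rng=random.random):
    """Down-sample retraining samples so eye pose regions are represented
    more evenly. `samples` is a list of (eye_image, eye_pose_region) pairs."""
    counts = Counter(region for _, region in samples)
    return [(img, reg) for img, reg in samples
            if rng() < inclusion_probability(reg, counts)]
```

For example, if two-thirds of the UI events fall in one region and one-third in another, images from the dense region are kept with probability 0.5 and images from the sparse region with probability 1.0.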