EP-3936981-B1 - DATA PROCESSING APPARATUS AND METHOD

EP 3936981 B1

Inventors

  • CAPPELLO, FABIO
  • SMITH, Alexei Ashton Derek
  • MONTI, Maria Chiara

Dates

Publication Date
2026-05-13
Application Date
2021-06-10

Claims (14)

  1. A data processing apparatus (1200), comprising: processing circuitry (1210) to generate at least one of video content and audio content for a virtual reality environment; input circuitry (1220) to receive gaze data for two or more users indicative of a gaze point for each user with respect to the virtual reality environment; and selection circuitry (1230) to select at least one object in the virtual reality environment in dependence upon two or more of the gaze points corresponding to the object, in which the processing circuitry is configured to adapt the audio content in response to the selection of the object, wherein the processing circuitry is configured to generate the audio content for the virtual reality environment for a first user corresponding to a first avatar in the virtual reality environment in dependence upon a position of the first avatar, a position of the selected object and a position of one or more other objects not selected by the selection circuitry in the virtual reality environment.
  2. The data processing apparatus according to claim 1, in which the processing circuitry is configured to increase a volume of an audio signal associated with the object in response to the selection of the object.
  3. The data processing apparatus according to any preceding claim, in which the processing circuitry is configured to decrease a volume for one or more audio signals associated with one or more other objects in the virtual reality environment not selected by the selection circuitry, in response to the selection of the object.
  4. The data processing apparatus according to claim 1, in which the processing circuitry is configured to calculate a weighting parameter for an audio signal associated with at least one of the other objects in the virtual reality environment not selected by the selection circuitry in dependence upon a distance between the position of the other object and the position of the first avatar.
  5. The data processing apparatus according to claim 1 or claim 4, in which the processing circuitry is configured to select a first predetermined weighting parameter for one or more audio signals associated with one or more of the other objects in the virtual reality environment not selected by the selection circuitry and not within a predetermined distance of the position of the first avatar.
  6. The data processing apparatus according to claim 1 or claim 4, in which the processing circuitry is configured to cull one or more audio signals associated with one or more of the other objects in the virtual reality environment not selected by the selection circuitry and not within a predetermined distance of the position of the first avatar.
  7. The data processing apparatus according to any one of claims 1, 4 to 6 in which the processing circuitry is configured to increase a volume of an audio signal associated with the selected object by selecting a second predetermined weighting parameter for the selected object.
  8. The data processing apparatus according to claim 7, in which the processing circuitry is configured to select the second predetermined weighting parameter from a plurality of second predetermined weighting parameters in dependence upon the number of the gaze points corresponding to the selected object.
  9. The data processing apparatus according to any preceding claim, in which the selection circuitry is configured to select the object in dependence upon whether the number of the gaze points corresponding to the object is greater than or equal to a threshold number of gaze points for a predetermined period of time.
  10. The data processing apparatus according to claim 9, in which the threshold number of gaze points is dependent upon a number of respective users for the virtual environment, in which the input circuitry is configured to receive the gaze data for each of the respective users.
  11. A system, comprising: the data processing apparatus (1200) according to any preceding claim; and a head-mountable display, HMD, (1250) configured to be worn by a user and to output at least one of the video content and the audio content generated by the processing circuitry.
  12. The system according to claim 11, in which the input circuitry is configured to receive the gaze data from the HMD indicative of the gaze point for the user wearing the HMD, and comprising: another HMD configured to be worn by another user, in which the input circuitry is configured to receive the gaze data from the another HMD indicative of the gaze point for the another user wearing the another HMD.
  13. A data processing method, comprising: generating (1410) at least one of video content and audio content for a virtual reality environment; receiving (1420) gaze data for two or more users indicative of gaze points with respect to the virtual reality environment; selecting (1430) at least one object in the virtual reality environment in dependence upon two or more of the gaze points corresponding to the object; and adapting (1440) the audio content in response to selecting the object, generating the audio content for the virtual reality environment for a first user corresponding to a first avatar in the virtual reality environment in dependence upon a position of the first avatar, a position of the selected object and a position of one or more other objects not selected by the selection circuitry in the virtual reality environment.
  14. Computer software which, when executed by a computer, causes the computer to perform the method of claim 13.
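To make the claimed selection and audio-adaptation logic concrete, the following is a minimal sketch (not part of the claims) of the behaviour of claims 2, 3 and 9: an object becomes selected once at least a threshold number of users' gaze points rest on it for a predetermined period, after which its audio gain is raised and the gains of non-selected objects are lowered. All function names, gain values and data structures are illustrative assumptions, not taken from the patent.

```python
from collections import defaultdict

BOOST_GAIN = 1.5      # illustrative gain for a selected object's audio signal
ATTENUATE_GAIN = 0.5  # illustrative gain for non-selected objects

def update_selection(gaze_points, dwell, threshold, dwell_seconds, dt):
    """Advance the selection state by one tick of dt seconds.

    gaze_points: dict user_id -> object_id currently gazed at (or None)
    dwell: dict object_id -> accumulated seconds at/above the threshold
    Returns the set of currently selected object ids (cf. claim 9).
    """
    # Count how many users' gaze points correspond to each object.
    counts = defaultdict(int)
    for obj in gaze_points.values():
        if obj is not None:
            counts[obj] += 1

    selected = set()
    for obj, n in counts.items():
        if n >= threshold:
            # Enough gaze points: accumulate dwell time towards selection.
            dwell[obj] = dwell.get(obj, 0.0) + dt
            if dwell[obj] >= dwell_seconds:
                selected.add(obj)
        else:
            dwell[obj] = 0.0
    # Reset dwell for objects nobody is currently looking at.
    for obj in list(dwell):
        if obj not in counts:
            dwell[obj] = 0.0
    return selected

def audio_gains(object_ids, selected):
    """Boost selected objects and attenuate the rest (cf. claims 2 and 3)."""
    return {o: (BOOST_GAIN if o in selected else ATTENUATE_GAIN)
            for o in object_ids}
```

A per-avatar mixer could further scale each non-selected object's gain by its distance from the first avatar, in the manner of claims 4 to 6, before summing the signals.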

Description

The present disclosure relates to apparatus and methods. In particular, the present disclosure relates to data processing apparatus and methods that use gaze data from gaze tracking systems to generate audio and/or video content.

Gaze tracking systems are used to identify a location of a subject's gaze within an environment; in many cases, this location may be a position on a display screen that is being viewed by the subject. In a number of existing arrangements, this is performed using one or more inward-facing cameras directed towards the subject's eye (or eyes) in order to determine a direction in which the eyes are oriented at any given time. Having identified the orientation of each eye, a gaze direction can be determined, and a focal region may be determined as the intersection of the gaze directions of the two eyes.

One application for which gaze tracking is considered of particular use is in head-mountable display units (HMDs). The use in HMDs may be of particular benefit owing to the close proximity of inward-facing cameras to the user's eyes, allowing the tracking to be performed much more accurately and precisely than in arrangements in which it is not possible to provide the cameras with such proximity.

By utilising gaze detection techniques, it may be possible to provide a more efficient and/or effective processing method for generating content or interacting with devices. For example, gaze tracking may be used to provide user inputs or to assist with such inputs - a continued gaze at a location may act as a selection, or a gaze towards a particular object accompanied by another input (such as a button press) may be considered as a suitable input. This may be more effective as an input method in some embodiments, particularly in those in which a controller is not provided or when a user has limited mobility.
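The focal-region step described above, estimating a fixation point from the intersection of the two eyes' gaze directions, can be sketched as follows. Since the two gaze rays rarely intersect exactly, the sketch takes the midpoint of the shortest segment between them, using the standard closest-points formula for two lines; all names and the tolerance are illustrative assumptions.

```python
import numpy as np

def gaze_convergence(o_l, d_l, o_r, d_r):
    """Approximate the fixation point as the midpoint of the shortest
    segment between the left and right gaze rays.

    o_l, o_r: eye (ray origin) positions; d_l, d_r: unit gaze directions.
    Returns None when the rays are near-parallel (no convergence).
    """
    o_l, d_l = np.asarray(o_l, float), np.asarray(d_l, float)
    o_r, d_r = np.asarray(o_r, float), np.asarray(d_r, float)
    w = o_l - o_r
    a, b, c = d_l @ d_l, d_l @ d_r, d_r @ d_r
    d, e = d_l @ w, d_r @ w
    denom = a * c - b * b          # zero iff the rays are parallel
    if abs(denom) < 1e-9:
        return None
    t = (b * e - c * d) / denom    # parameter of closest point on left ray
    s = (a * e - b * d) / denom    # parameter of closest point on right ray
    p_l = o_l + t * d_l
    p_r = o_r + s * d_r
    return (p_l + p_r) / 2         # midpoint of the connecting segment
```

The distance between p_l and p_r also gives a crude confidence measure: widely separated closest points suggest a noisy or miscalibrated gaze estimate.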
Foveal rendering is an example of a use for the results of a gaze tracking process in order to improve the efficiency of a content generation process. Foveal rendering exploits the fact that human vision is only able to identify high detail in a narrow region (the fovea), with the ability to discern detail tailing off sharply outside of this region. In such methods, a portion of the display is identified as being an area of focus in accordance with the user's gaze direction. This portion of the display is supplied with high-quality image content, while the remaining areas of the display are provided with lower-quality (and therefore less resource-intensive to generate) image content. This can lead to a more efficient use of available processing resources without a noticeable degradation of image quality for the user.

It is therefore considered advantageous to be able to improve gaze tracking methods, and/or apply the results of such methods in an improved manner. It is in the context of such advantages that the present disclosure arises. Other previously proposed arrangements are disclosed in US 2015/316982 A1 and M. Vinnikov et al: "Gaze-Contingent Auditory Displays for Improved Spatial Attention in Virtual Reality", ACM Transactions on Computer-Human Interaction, vol. 24, no. 3, pages 1-38. Various aspects and features of the present invention are defined in the appended claims and within the text of the accompanying description.
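The foveal-rendering scheme just described can be sketched as a simple mapping from a screen tile's angular distance to the gaze point onto a resolution scale. The band boundaries and scale factors below are illustrative assumptions, not values from the patent.

```python
import math

def render_scale(tile_center_deg, gaze_deg, foveal_deg=5.0, mid_deg=15.0):
    """Map a tile's angular eccentricity from the gaze point to a
    resolution scale (1.0 = full quality). Thresholds are illustrative.
    """
    ecc = math.hypot(tile_center_deg[0] - gaze_deg[0],
                     tile_center_deg[1] - gaze_deg[1])
    if ecc <= foveal_deg:
        return 1.0    # foveal region: render at full resolution
    if ecc <= mid_deg:
        return 0.5    # parafoveal ring: half resolution
    return 0.25       # periphery: quarter resolution, cheapest to generate
```

A renderer would typically evaluate this per tile each frame, so the high-quality region follows the tracked gaze point as it moves.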
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 schematically illustrates an HMD worn by a user;
Figure 2 is a schematic plan view of an HMD;
Figure 3 schematically illustrates the formation of a virtual image by an HMD;
Figure 4 schematically illustrates another type of display for use in an HMD;
Figure 5 schematically illustrates a pair of stereoscopic images;
Figure 6a schematically illustrates a plan view of an HMD;
Figure 6b schematically illustrates a near-eye tracking arrangement;
Figure 7 schematically illustrates a remote tracking arrangement;
Figure 8 schematically illustrates a gaze tracking environment;
Figure 9 schematically illustrates a gaze tracking system;
Figure 10 schematically illustrates a human eye;
Figure 11 schematically illustrates a graph of human visual acuity;
Figure 12a schematically illustrates a data processing apparatus;
Figure 12b schematically illustrates a system;
Figure 13 schematically illustrates an example of a virtual reality environment comprising a plurality of objects; and
Figure 14 is a schematic flowchart illustrating a method for adapting at least one of video content and audio content for a virtual reality environment.

Referring now to Figure 1, a user 10 is wearing an HMD 20 (as an example of a generic head-mountable apparatus - other examples including audio headphones or a head-mountable light source) on the user's head 30. The HMD comprises a frame 40, in this example formed of a rear strap and a top strap, and a display portion 50. As noted above, many gaze tracking arrangements may be consi