EP-4289488-B1 - HYBRID PIXEL DYNAMIC VISION SENSOR TRACKING USING IR AND AMBIENT LIGHT (OR DEPTH SENSOR)
Inventors
- YE, XIAOYONG
- NAKAMURA, YUICHIRO
Dates
- Publication Date: 2026-05-13
- Application Date: 2023-05-12
Claims (13)
- A tracking system, comprising a processor (110, 210, 310); a dynamic vision sensor, DVS, (101, 201, 202, 301, 401, 801, 901) operably coupled to the processor, the DVS having an array of light-sensitive elements in a known configuration, wherein the DVS is configured to output signals corresponding to two or more events at two or more corresponding light-sensitive elements in the array in response to changes in light output from two or more light sources (104, 105, 106, 107, 204, 205, 206, 304, 305, 306, 404, 405, 406, 407, 413, 414, 415, 416) in a known configuration with respect to each other and with respect to a controller body, wherein the output signals include information corresponding to times of the two or more events and locations of the two or more corresponding light-sensitive elements in the array; and one or more filters configured to selectively transmit light from the two or more light sources to one or more of the light-sensitive elements in the array and to selectively block other light from reaching the one or more of the light-sensitive elements in the array, wherein the one or more filters are further configured to selectively block the light from the two or more light sources from reaching a different one or more of the light-sensitive elements in the array and to selectively transmit the other light to the different one or more of the light-sensitive elements in the array; wherein the processor is configured to determine a position and orientation of the controller from the times of the two or more events, the locations of the two or more corresponding light-sensitive elements in the array, and the known configuration of the two or more light sources with respect to each other and with respect to the controller body; and wherein the processor is further configured to determine a position and orientation of one or more objects from signals generated by two or more light-sensitive elements resulting from the other light reaching the two or more light-sensitive elements, wherein the other light is ambient light from an environment and the signals generated by two or more light-sensitive elements are generated by two or more light-sensitive elements in the array. (An illustrative pose computation appears as the first sketch following the claims.)
- The tracking system of claim 1, wherein the one or more filters include an infra-red, IR, pass filter on a first set of one or more IR light sensitive elements of the light sensitive elements in the array and an infra-red cut filter on a second set of one or more of the light sensitive elements in the array.
- The tracking system of claim 1, wherein the one or more filters are configured to selectively transmit the light from the two or more light sources to the one or more of the light-sensitive elements in the array and to selectively block the other light from reaching the one or more of the light-sensitive elements in the array at certain times, and wherein the one or more filters are further configured to selectively block the light from the two or more light sources from reaching the different one or more of the light-sensitive elements in the array and to selectively transmit the other light to the different one or more of the light-sensitive elements in the array at other times.
- The tracking system of claim 1, wherein the one or more objects include a controller (103, 207, 307), one or more extremities of a user, or a head-mounted display (102, 203, 303).
- The tracking system of claim 1 further comprising a depth sensor having an array including two or more depth sensor elements coupled to the processor, wherein each of the two or more depth sensor elements is configured to generate a signal in response to the other light impinging on it.
- The tracking system of claim 1, wherein the array of light-sensitive elements includes one or more depth sensor elements configured to detect a time of flight of light from at least one of the two or more light sources, and wherein the one or more depth sensor elements include one or more depth sensor elements arranged in a checkerboard pattern with one or more other light-sensitive elements in the array, or wherein the one or more depth sensor elements are located on a first portion of the array of light-sensitive elements and one or more other light-sensitive elements are located on a second portion of the array of light-sensitive elements. (The pixel layouts of claims 2 and 6, together with the time-division filtering of claim 3, are illustrated in the second sketch following the claims.)
- The tracking system of claim 1, wherein the DVS further comprises an array of depth-sensing light-sensitive elements, wherein the array of depth-sensing elements is configured to detect a time of flight of light from at least one light source of the two or more light sources.
- The tracking system of claim 1, wherein the one or more light sources includes an amplitude modulated light source, at least one vertical cavity surface emitting laser, or at least one edge-emitting laser. (The third sketch following the claims works through the standard amplitude-modulated time-of-flight depth relation relevant to claims 6 to 8.)
- The tracking system of claim 1, further comprising a beam splitter optically coupled between the one or more light sources and the array of light-sensitive elements.
- The tracking system of claim 1, further comprising a microelectromechanical system, MEMS, mirror (1304) optically coupled between the one or more light sources and the array of light-sensitive elements.
- The tracking system of claim 1, wherein the processor is configured to use a machine learning algorithm to determine the position and orientation of the one or more objects. (A toy sketch of one such approach appears as the fourth code block following the claims.)
- A method for tracking comprising: selectively transmitting light from two or more light sources (104, 105, 106, 107) to one or more light-sensitive elements in an array of a dynamic vision sensor, DVS, (101, 201, 202, 301, 401, 507, 606, 607, 707, 708, 801, 901) with one or more filters, the DVS having an array of light-sensitive elements in a known configuration, and selectively blocking other light from reaching the one or more of the light-sensitive elements in the array with the one or more filters, wherein the one or more filters are further configured to selectively block the light from the two or more light sources from reaching a different one or more of the light-sensitive elements in the array and to selectively transmit the other light to the different one or more of the light-sensitive elements in the array; outputting signals corresponding to two or more events at two or more corresponding light-sensitive elements in the array in response to changes in light output from the two or more light sources, which are in a known configuration with respect to each other and with respect to a controller body, wherein the output signals include information corresponding to times of the two or more events and locations of the two or more corresponding light-sensitive elements in the array; determining a position and orientation of a controller from the times of the two or more events, the locations of the two or more corresponding light-sensitive elements in the array, and the known configuration of the two or more light sources with respect to each other and with respect to the controller body; and determining a position and orientation of one or more objects from signals generated by two or more light-sensitive elements resulting from the other light reaching the two or more light-sensitive elements, wherein the other light is ambient light from an environment and the signals generated by two or more light-sensitive elements are generated by two or more light-sensitive elements in the array.
- A non-transitory machine-readable storage medium which stores computer software which, when executed by a computer, causes the computer to carry out the method of claim 12.
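As a concrete illustration of claims 1 and 12, the first sketch below recovers a controller pose from DVS events produced by light sources in a known configuration on the controller body. It is a minimal sketch under stated assumptions, not the patented implementation: the event record format, the square LED layout in LED_MODEL_POINTS, the intrinsics K, the event-to-LED labeling step, and the use of OpenCV's solvePnP are all illustrative choices introduced here.

```python
# Minimal sketch (assumed details, not the patented implementation): recover
# controller pose from DVS events caused by LEDs in a known rigid layout.
import numpy as np
import cv2  # OpenCV, used here only for its PnP solver

# Assumed DVS event record: pixel location, timestamp, polarity.
EVENT_DTYPE = np.dtype([("x", np.uint16), ("y", np.uint16),
                        ("t", np.uint64), ("p", np.int8)])

# Known configuration of the light sources on the controller body (meters);
# a placeholder planar square layout.
LED_MODEL_POINTS = np.array([[-0.03, -0.03, 0.0],
                             [ 0.03, -0.03, 0.0],
                             [ 0.03,  0.03, 0.0],
                             [-0.03,  0.03, 0.0]], dtype=np.float32)

# Assumed pinhole intrinsics of the DVS optics.
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

def led_centroids(events, labels):
    """Average event locations per LED. `labels` assigns each event to an
    LED index, e.g. by matching event times to each LED's blink phase."""
    pts = []
    for led in range(len(LED_MODEL_POINTS)):
        sel = events[labels == led]
        pts.append((sel["x"].mean(), sel["y"].mean()))
    return np.array(pts, dtype=np.float32)

def controller_pose(events, labels):
    """Return (rvec, tvec): orientation (Rodrigues vector) and position."""
    image_points = led_centroids(events, labels)
    ok, rvec, tvec = cv2.solvePnP(LED_MODEL_POINTS, image_points, K, None)
    return (rvec, tvec) if ok else None
```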
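Claims 2 and 6 separate LED light from ambient light spatially (IR-pass versus IR-cut pixels, interleaved in a checkerboard or split across two portions of the array), while claim 3 separates them in time. The second sketch shows both ideas in toy form; the resolution, modulation period, and window length are assumed values, and the boolean masks and phase test stand in for the physical filters.

```python
# Toy illustration of claims 2, 3 and 6 (all parameters assumed): route each
# event either to the LED-tracking path or to the ambient-light path.
import numpy as np

H, W = 480, 640            # assumed sensor resolution
PERIOD_US = 1000           # assumed LED modulation period (claim 3)
IR_WINDOW_US = 400         # assumed window in which LED light is passed

def checkerboard_ir_mask(h=H, w=W):
    """Claim 6, first option: IR-pass pixels interleaved with ambient pixels."""
    yy, xx = np.mgrid[0:h, 0:w]
    return (yy + xx) % 2 == 0

def split_array_ir_mask(h=H, w=W):
    """Claim 6, second option: IR pixels on one portion of the array."""
    mask = np.zeros((h, w), dtype=bool)
    mask[:, : w // 2] = True
    return mask

def route_event(x, y, t_us, ir_mask):
    """Spatial routing (claims 2/6) refined by timing (claim 3)."""
    in_ir_window = (t_us % PERIOD_US) < IR_WINDOW_US
    if ir_mask[y, x] and in_ir_window:
        return "led_tracking"       # feed the controller-pose path
    return "ambient"                # feed the environment/object path
```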
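Claims 6 to 8 invoke time-of-flight depth sensing with an amplitude-modulated source. The third sketch is the standard continuous-wave ToF relation, not anything specific to this patent; the 4-bucket sampling scheme and the 20 MHz modulation frequency are assumptions.

```python
# Standard continuous-wave time-of-flight depth (illustrative, with assumed
# 4-bucket sampling): depth follows from the phase shift of the returned
# amplitude-modulated light.
import numpy as np

C = 299_792_458.0   # speed of light, m/s
F_MOD = 20e6        # assumed modulation frequency, Hz

def depth_from_buckets(a0, a1, a2, a3):
    """a0..a3: per-pixel correlation samples at 0/90/180/270 deg phase."""
    phase = np.arctan2(a3 - a1, a0 - a2) % (2.0 * np.pi)
    return C * phase / (4.0 * np.pi * F_MOD)

# Depth is unambiguous up to C / (2 * F_MOD), about 7.5 m at 20 MHz.
```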
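Claim 11 leaves the machine learning algorithm unspecified. Purely as a toy sketch, the fourth block rasterizes ambient-light events (using the event record format assumed in the first sketch) into a normalized histogram and pushes it through a small two-layer network; the architecture and the random, untrained weights W1 and W2 are placeholders, not the patent's method.

```python
# Toy sketch for claim 11 (architecture and weights are placeholders): regress
# a 6-DoF pose from a histogram of ambient-light DVS events.
import numpy as np

H_BINS, W_BINS = 60, 80
rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.01, (256, H_BINS * W_BINS))  # stand-in trained weights
W2 = rng.normal(0.0, 0.01, (6, 256))

def event_histogram(events, sensor_hw=(480, 640)):
    """Count events per spatial bin and normalize to [0, 1]."""
    img, _, _ = np.histogram2d(events["y"], events["x"],
                               bins=(H_BINS, W_BINS),
                               range=[(0, sensor_hw[0]), (0, sensor_hw[1])])
    return img.ravel() / max(img.max(), 1.0)

def predict_pose(events):
    """Return (tx, ty, tz, rx, ry, rz) from a ReLU MLP forward pass."""
    hidden = np.maximum(W1 @ event_histogram(events), 0.0)
    return W2 @ hidden
```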
Description
FIELD OF THE INVENTION

Aspects of the present disclosure relate to game controller tracking; specifically, aspects of the present disclosure relate to game controller tracking using a dynamic vision sensor.

BACKGROUND OF THE INVENTION

Previously proposed arrangements are disclosed in WO 2021/021839 A1 and EP 3 944 059 A1.

Modern Virtual Reality (VR) and Augmented Reality (AR) implementations rely on accurate and fast motion tracking for user interaction with the device. AR and VR often rely on information relating to the location and orientation of a controller relative to other objects. Many VR and AR implementations rely on a combination of inertial measurements, taken by accelerometers or gyroscopes within a controller, and visual detection of the controller by an external camera to determine the location and orientation of the controller.

Some of the earliest implementations use infrared lights detected by an infrared camera, with a defined detection radius, on a game controller pointed at a screen. The camera takes images at a moderately fast rate of 200 frames per second, and the locations of the infrared lights are determined. The distance between the infrared lights is predetermined, and from the relative locations of the infrared lights in the camera image a position of the controller relative to the screen can be calculated (see the worked example at the end of this section). Accelerometers are sometimes also used to provide information on relative three-dimensional changes in position or orientation of the controller.

These prior implementations rely on a fixed position of a screen and a controller that is pointed towards the screen. In modern VR and AR implementations the screens may be placed close to a user's face in a head-mounted display that moves with the user. Thus, having an absolute light position (also referred to as a lighthouse) becomes undesirable because the user must set up independent lighthouse points, which require extra set-up time and limit the extent of the user's movement. Additionally, even the moderately fast frame rate of the infrared camera at 200 frames per second was not fast enough to provide smooth feedback for motion. Furthermore, this simplistic set-up does not lend itself to more modern inside-out detection methods such as room mapping and hand detection.

More recent implementations use a camera and accelerometer in conjunction with machine learning algorithms trained to detect hands, controllers, and/or other body parts. For smooth motion detection, a high-frame-rate camera must be used to generate image frames for body part and controller detection. This generates a large amount of data that must be processed quickly for a smooth update rate, so expensive hardware must be used to process the frame data. Additionally, much of the data in each frame is discarded as unnecessary because it is not related to motion tracking.

It is within this context that aspects of the present disclosure arise.
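Worked example of the geometry the background alludes to, with illustrative numbers that are not from the patent: with two IR lights a known baseline apart, the pinhole model gives the camera-to-lights distance from their apparent separation in the image.

```python
# Illustrative pinhole-model calculation (assumed numbers): distance to a
# pair of IR lights from their apparent separation in the camera image.
FOCAL_PX = 1300.0    # assumed focal length in pixels
BASELINE_M = 0.20    # assumed spacing between the two IR lights

def distance_to_lights(separation_px):
    return FOCAL_PX * BASELINE_M / separation_px

print(distance_to_lights(100.0))  # -> 2.6 m when lights appear 100 px apart
```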
BRIEF DESCRIPTION OF THE DRAWINGS

The teachings of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:

FIG. 1 is a diagram depicting an implementation of game controller tracking using a DVS with a single sensor array according to an aspect of the present disclosure.
FIG. 2 is a diagram depicting an implementation of game controller tracking using a DVS with a dual sensor array according to an aspect of the present disclosure.
FIG. 3 is a diagram depicting an implementation of game controller tracking using a combination DVS with a single sensor array and a camera according to an aspect of the present disclosure.
FIG. 4 is a diagram showing DVS tracking movement of a game controller having two or more light sources according to an aspect of the present disclosure.
FIG. 5 is a diagram depicting an implementation of head tracking or other device tracking using a controller having a DVS with a single sensor array according to an aspect of the present disclosure.
FIG. 6 is a diagram depicting an implementation of head tracking or other device tracking using a game controller having a DVS with dual sensor arrays according to an aspect of the present disclosure.
FIG. 7 is a diagram showing an implementation of head tracking or other device tracking using a controller having a combination DVS with a single sensor array and camera according to an aspect of the present disclosure.
FIG. 8 is a flow diagram depicting a method for motion tracking with a DVS using one or more light sources and a light source configuration fitting model according to an aspect of the present disclosure.
FIG. 9 is a flow diagram showing a method for motion tracking with a DVS using time-stamped light source position information according to an aspect of the present disclosure.
FIG. 10A is a diagram depicting the basic form of an RNN having a layer of nodes, each of which is characterized by an activation function, one input weight, a recurrent hidden