US-20260127711-A1 - MULTI-CAMERA PROCESSING IN EXTENDED REALITY (XR) SYSTEMS


Abstract

Systems and techniques are described for image processing. For example, a computing device can receive a first frame of a first view of a scene and a second frame of a second view of the scene. The computing device can determine a first portion of the second frame that corresponds to a portion of the first frame. The computing device can process the first frame and output the processed first frame. The computing device can process a second portion of the second frame that is different from the first portion of the second frame. The computing device can output a composite frame based on the processed second portion of the second frame and the portion of the first frame.
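
As a concrete illustration, the selective pipeline described in the abstract could be sketched as follows in Python. The helper callables isp_process and warp_to_view, the mask-based API, and the left/right naming are hypothetical assumptions for illustration; the publication does not prescribe any particular implementation.

```python
import numpy as np

def process_stereo_pair(left_frame, right_frame, overlap_mask,
                        isp_process, warp_to_view):
    """Selective dual-camera processing, per the abstract: fully process
    the first (left) frame, process only the non-overlapping portion of
    the second (right) frame, and composite the result.

    overlap_mask: HxW boolean array, True where the right frame overlaps
    the already-processed left frame (hypothetical input).
    isp_process / warp_to_view: assumed helpers; isp_process is assumed
    to return a full-size frame with only the masked region processed.
    """
    # Process the first frame in full and output it directly.
    processed_left = isp_process(left_frame)

    # Process only the "second portion" of the second frame, i.e. the
    # pixels with no counterpart in the first frame.
    processed_right_unique = isp_process(right_frame, mask=~overlap_mask)

    # Reuse the overlapping portion of the processed first frame by
    # warping it into the second view, then composite it with the
    # freshly processed unique region. Frames are assumed HxWx3, so the
    # HxW mask is broadcast across the channel axis.
    warped_overlap = warp_to_view(processed_left)
    composite = np.where(overlap_mask[..., None],
                         warped_overlap, processed_right_unique)
    return processed_left, composite
```

The apparent benefit of this arrangement is that the image signal processor touches each overlapping pixel only once, reducing processing work in the region shared by the two eye cameras.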

Inventors

  • Saurabh Ramesh Gangurde
  • Prasant Shekhar Singh
  • Amrit Anand Amresh
  • Shrey Shailesh Gadiya
  • Saurabh Aggarwal
  • Abhijeet Dey
  • Varun Bansal

Assignees

  • QUALCOMM INCORPORATED

Dates

Publication Date
2026-05-07
Application Date
2024-11-05

Claims (20)

  1. An apparatus for image processing, the apparatus comprising: at least one memory; and at least one processor coupled to the at least one memory and configured to: receive a first frame of a first view of a scene and a second frame of a second view of the scene; determine a first portion of the second frame that corresponds to a portion of the first frame; process the first frame; output the processed first frame; process a second portion of the second frame that is different from the first portion of the second frame; and output a composite frame based on the processed second portion of the second frame and the portion of the first frame.
  2. The apparatus of claim 1, wherein the at least one processor is configured to: transform the portion of the first frame to the second view corresponding to the second frame to generate a transformed portion of the first frame; and generate the composite frame based on the processed second portion of the second frame and the transformed portion of the first frame.
  3. The apparatus of claim 1, wherein the at least one processor is configured to: determine a depth of a portion of the scene based on the first frame and the second frame; determine the portion of the scene is unoccluded in the first frame based on the first frame, the second frame, and the depth of the portion of the scene; based on a determination that the portion of the scene is unoccluded in the first frame, transform the portion of the scene from the first frame to the second view corresponding to the second frame to generate a transformed portion of the first frame; and generate the composite frame based on the processed second portion of the second frame and the transformed portion of the first frame.
  4. The apparatus of claim 3, wherein the at least one processor is configured to determine the depth of the portion of the scene further based on at least one of time of flight data or depth sensor data.
  5. The apparatus of claim 3, wherein the at least one processor is configured to transform the portion of the scene from the first frame to the second view corresponding to the second frame based on at least one of the depth of the portion of the scene or using a machine learning system.
  6. The apparatus of claim 3, wherein the portion of the scene comprises one or more objects within the scene.
  7. The apparatus of claim 1, wherein the first portion of the second frame overlaps with the portion of the first frame.
  8. The apparatus of claim 1, wherein, to determine the first portion of the second frame that corresponds to the portion of the first frame, the at least one processor is configured to determine that the first portion of the second frame overlaps with the portion of the first frame.
  9. The apparatus of claim 8, wherein the at least one processor is configured to determine that the first portion of the second frame overlaps with the portion of the first frame based on a depth of the scene.
  10. The apparatus of claim 1, wherein the at least one processor is configured to: obtain, by a first image sensor with the first view of the scene, the first frame of the scene; and obtain, by a second image sensor with the second view of the scene, the second frame of the scene.
  11. The apparatus of claim 10, wherein the first image sensor is a left eye image sensor of an extended reality (XR) headset, and the second image sensor is a right eye image sensor of the XR headset.
  12. The apparatus of claim 1, wherein the at least one processor includes an image signal processor configured to process the first frame and to process the second portion of the second frame and a graphics processing unit (GPU) configured to process the processed first frame and the composite frame.
  13. A method for image processing, the method comprising: receiving a first frame of a first view of a scene and a second frame of a second view of the scene; determining a first portion of the second frame that corresponds to a portion of the first frame; processing the first frame; outputting the processed first frame; processing a second portion of the second frame that is different from the first portion of the second frame; and outputting a composite frame based on the processed second portion of the second frame and the portion of the first frame.
  14. The method of claim 13, further comprising: transforming the portion of the first frame to the second view corresponding to the second frame to generate a transformed portion of the first frame; and generating the composite frame based on the processed second portion of the second frame and the transformed portion of the first frame.
  15. The method of claim 13, further comprising: determining a depth of a portion of the scene based on the first frame and the second frame; determining the portion of the scene is unoccluded in the first frame based on the first frame, the second frame, and the depth of the portion of the scene; based on determining the portion of the scene is unoccluded in the first frame, transforming the portion of the scene from the first frame to the second view corresponding to the second frame to generate a transformed portion of the first frame; and generating the composite frame based on the processed second portion of the second frame and the transformed portion of the first frame.
  16. The method of claim 15, wherein determining the depth of the portion of the scene is further based on at least one of time of flight data or depth sensor data.
  17. The method of claim 15, wherein transforming the portion of the scene from the first frame to the second view corresponding to the second frame is based on at least one of the depth of the portion of the scene or using a machine learning system.
  18. The method of claim 13, wherein the first portion of the second frame overlaps with the portion of the first frame.
  19. The method of claim 13, wherein determining the first portion of the second frame that corresponds to the portion of the first frame comprises determining that the first portion of the second frame overlaps with the portion of the first frame.
  20. The method of claim 19, wherein determining that the first portion of the second frame overlaps with the portion of the first frame is based on a depth of the scene.
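
One conventional way to realize the depth-based view transform of claims 2 through 5 is per-pixel pinhole reprojection through a depth map; the claims do not mandate a specific method (claim 5 also allows a machine learning system). In the sketch below, the intrinsic matrices K_left and K_right and the left-to-right rigid transform (R, t) are assumed calibration inputs not specified in the publication.

```python
import numpy as np

def reproject_left_to_right(depth_left, K_left, K_right, R, t):
    """Map each left-view pixel to its right-view location using the
    left depth map and a left-to-right rigid transform (R, t).
    Returns the (u, v) coordinates in the right image for every left
    pixel, as a 2 x H x W array."""
    h, w = depth_left.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x N

    # Back-project left pixels to 3D points in the left camera frame,
    # move them into the right camera frame, and project again.
    pts_left = np.linalg.inv(K_left) @ pix * depth_left.reshape(1, -1)
    pts_right = R @ pts_left + t.reshape(3, 1)
    proj = K_right @ pts_right
    uv_right = proj[:2] / proj[2:3]
    return uv_right.reshape(2, h, w)
```

On one plausible reading, a z-buffer style check could then implement the occlusion determination of claims 3 and 9: a portion of the scene is transferred from the first frame only when it is visible (unoccluded) there, for example when its reprojected depth agrees within a tolerance with the depth observed at the target pixel.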

Description

FIELD

The present disclosure generally relates to image processing. For example, aspects of the present disclosure relate to a visual see-through (VST) solution in extended reality (XR) devices.

BACKGROUND

Extended reality (XR) technologies can be used to present virtual content to users, and/or can combine real environments from the physical world and virtual environments to provide users with XR experiences. The term XR can encompass virtual reality (VR), augmented reality (AR), mixed reality (MR), and the like. XR systems can allow users to experience XR environments by overlaying virtual content onto images of a real-world environment, which can be viewed by a user through an XR device (e.g., a head-mounted display (HMD), extended reality glasses, or other device). For example, an XR device can display an environment to a user. In some cases, the displayed environment may be at least partially different from the real-world environment in which the user is located. In some cases, such as in certain visual see-through (VST) systems or modes, the displayed environment may be the same as the real-world environment in which the user is located. The user can generally change their view of the environment interactively, for example by tilting or moving the XR device (e.g., the HMD or other device).

An XR system can include a "see-through" display that allows the user to see their real-world environment based on light from the real-world environment passing through the display. In some cases, an XR system can include a "pass-through" display that allows the user to see their real-world environment, or a virtual environment based on their real-world environment, based on a view of the environment being captured by one or more cameras and displayed on the display. "See-through" or "pass-through" XR systems can be worn by users while the users are engaged in activities in their real-world environment.

In some cases, XR systems may be used to enhance experiences such as telepresence, gaming, the metaverse, etc. Such technologies may allow a person to perform actions and/or have experiences, such as a collaborative and/or interactive experience with other persons, at remote and/or virtual locations. In some cases, users may be represented in a virtual space as animated avatars that may mimic the movements and/or expressions of the users they represent. A particular user may view the remote/virtual location from the perspective of their avatar, for example via an XR display device such as a head-mounted display (HMD) or mobile device. A precise reconstruction of a user's face for the avatar may allow for a more seamless, high-quality experience. In some cases, techniques for mesh estimation using HMD images may be useful.

SUMMARY

The following presents a simplified summary relating to one or more aspects disclosed herein. Thus, the following summary should not be considered an extensive overview relating to all contemplated aspects, nor should the following summary be considered to identify key or critical elements relating to all contemplated aspects or to delineate the scope associated with any particular aspect. Accordingly, the following summary has the sole purpose of presenting certain concepts relating to one or more aspects of the mechanisms disclosed herein in a simplified form, to precede the detailed description presented below.

Systems and techniques are described herein for image processing. In some aspects, an apparatus for image processing is provided.
The apparatus includes at least one memory and at least one processor coupled to the at least one memory and configured to: receive a first frame of a first view of a scene and a second frame of a second view of the scene; determine a first portion of the second frame that corresponds to a portion of the first frame; process the first frame; output the processed first frame; process a second portion of the second frame that is different from the first portion of the second frame; and output a composite frame based on the processed second portion of the second frame and the portion of the first frame.

In some aspects, a method for image processing is provided. The method includes: receiving a first frame of a first view of a scene and a second frame of a second view of the scene; determining a first portion of the second frame that corresponds to a portion of the first frame; processing the first frame; outputting the processed first frame; processing a second portion of the second frame that is different from the first portion of the second frame; and outputting a composite frame based on the processed second portion of the second frame and the portion of the first frame.

In some aspects, a non-transitory computer-readable medium is provided having stored thereon instructions that, when executed by at least one processor, cause the at least one processor to: receive a first frame of a first view of a scene and a second frame of a second view of the scene; determine a first portion of the second frame that corresponds to a portion of the first frame; process the first frame; output the processed first frame; process a second portion of the second frame that is different from the first portion of the second frame; and output a composite frame based on the processed second portion of the second frame and the portion of the first frame.
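
Claims 3 and 15 derive depth from the two frames themselves, with claims 4 and 16 optionally folding in time-of-flight or depth-sensor data. A common baseline technique for the two-frame case is stereo block matching followed by the pinhole conversion depth = focal_length × baseline / disparity. The sketch below uses OpenCV's StereoBM as one possible matcher; the parameter values, the assumed focal length and baseline inputs, and the variable names are illustrative assumptions, not taken from the publication.

```python
import cv2
import numpy as np

def stereo_depth(left_gray, right_gray, focal_px, baseline_m):
    """Estimate a depth map from two grayscale views, in the spirit of
    claims 3 and 15. Block matching is one conventional choice; the
    claims do not mandate a specific algorithm."""
    matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    # StereoBM returns fixed-point disparities scaled by 16.
    disp = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0

    # Convert valid disparities to metric depth via the pinhole stereo
    # model: depth = focal_length * baseline / disparity.
    depth = np.full_like(disp, np.inf)
    valid = disp > 0
    depth[valid] = focal_px * baseline_m / disp[valid]
    return depth
```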