EP-4742654-A1 - ADAPTIVE FOVEATED IMAGE SENSORS FOR NEAR-EYE DEVICES
Abstract
Implementations for adaptive foveated image sensors are provided. One implementation includes an image sensor system for adaptive foveated processing, the image sensor system comprising: an image sensor (102) for generating (902) a set of signals by imaging an environment; and processing circuitry (104) configured to: receive (904) foveated region of interest (ROI) information and ambient light information; determine (906) a processing mode based on the ambient light information; compress (908) the set of signals from the image sensor based on the determined processing mode and the foveated ROI information to generate a compressed set of pixels; and output (912) the compressed set of pixels.
Inventors
- LIU, SHENG
- MENG, Xiaozhou
- LI, YONGJUN
Assignees
- Lemon Inc.
Dates
- Publication Date
- 20260513
- Application Date
- 20250930
Claims (15)
- An image sensor system for adaptive foveated processing, the image sensor system comprising: an image sensor (102) for generating (902) a set of signals by imaging an environment; and processing circuitry (104) configured to: receive (904) foveated region of interest (ROI) information (202) and ambient light information (204); determine (906) a processing mode based on the ambient light information; compress (908) the set of signals from the image sensor based on the determined processing mode and the foveated ROI information to generate a compressed set of pixels; and output (912) the compressed set of pixels.
- The image sensor system of claim 1, wherein the compressed set of pixels is output using a mobile industry processor interface (MIPI).
- The image sensor system of claim 1, wherein the processing mode is determined further based on one or more of a predefined luminance threshold or a predefined spatial frequency threshold.
- The image sensor system of claim 3, wherein the processing mode is determined further based on the predefined spatial frequency threshold for a foveated region of interest determined based on the foveated ROI information.
- The image sensor system of claim 1, wherein the set of signals comprises signals for a plurality of frames, wherein the processing mode is determined on a per-frame basis, and wherein compressing the set of signals is performed on a per-frame basis.
- The image sensor system of claim 1, wherein, upon determining the processing mode to be a low-resolution mode, compressing the set of signals comprises: performing one or more of an analog binning, a digital binning, an analog subsampling, or a digital subsampling to generate the compressed set of pixels.
- The image sensor system of claim 1, wherein, upon determining the processing mode to be a foveated ROI mode, compressing the set of signals comprises: applying a foveation map to the set of signals to generate the compressed set of pixels, wherein the foveation map comprises: a full-resolution region determined using the foveated ROI information; and a compressed region different from the full-resolution region.
- The image sensor system of claim 7, wherein applying the foveation map comprises performing analog compression and performing digital compression after performance of the analog compression, wherein the analog compression comprises analog binning or analog subsampling, and wherein the digital compression comprises digital binning or digital subsampling.
- The image sensor system of claim 1, wherein the foveated ROI information comprises coordinates describing one or more ROIs.
- The image sensor system of claim 1, wherein the image sensor is implemented in a head-mounted display device.
- A method (900) for adaptive foveated processing enacted on an image sensor system, the method comprising: generating (902) a set of signals by imaging an environment; receiving (904) foveated region of interest (ROI) information and ambient light information; determining (906) a processing mode based on the ambient light information; compressing (908) the set of signals based on the determined processing mode and the foveated ROI information to generate a compressed set of pixels; and outputting (912) the compressed set of pixels.
- The method of claim 11, wherein the compressed set of pixels is output using a mobile industry processor interface (MIPI), preferably, wherein dummy data is added to the compressed set of pixels before output using the MIPI.
- The method of claim 11, wherein the processing mode is determined further based on one or more of a predefined luminance threshold or a predefined spatial frequency threshold.
- The method of claim 11, wherein the set of signals comprises signals for a plurality of frames, wherein the processing mode is determined on a per-frame basis, and wherein compressing the set of signals is performed on a per-frame basis.
- The method of claim 11, wherein: upon determining the processing mode to be a low-resolution mode, compressing the set of signals comprises: performing one or more of an analog binning, a digital binning, an analog subsampling, or a digital subsampling to generate the compressed set of pixels; and upon determining the processing mode to be a foveated ROI mode, compressing the set of signals comprises: applying a foveation map to the set of signals to generate the compressed set of pixels.
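The claims above describe the processing pipeline only functionally. The Python sketch below illustrates one way the per-frame mode decision (claims 3 and 5) and the two compression paths (claims 6 and 7) could fit together. It is illustrative only: the names (LUMINANCE_THRESHOLD, bin2x2, the returned dictionary layout) are hypothetical, and a real sensor would perform part of the compression in the analog domain during readout rather than on a full digital frame in software.

```python
# Minimal illustrative sketch of the adaptive foveated processing in claims 1 and 3-7.
# All concrete values and names here are assumptions, not details taken from the claims.
import numpy as np

LUMINANCE_THRESHOLD = 10.0  # hypothetical predefined luminance threshold (claim 3), e.g. in lux


def bin2x2(frame):
    """Digital binning: average non-overlapping 2x2 pixel blocks (claim 6)."""
    h, w = frame.shape
    cropped = frame[: h - h % 2, : w - w % 2]
    return cropped.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def subsample2x(frame):
    """Digital subsampling: keep every other row and column (claim 6 alternative)."""
    return frame[::2, ::2]


def determine_processing_mode(ambient_luminance):
    """Choose a per-frame processing mode from ambient light information (claims 3, 5)."""
    if ambient_luminance < LUMINANCE_THRESHOLD:
        return "low_resolution"  # low light: favor SNR over resolution
    return "foveated_roi"        # bright scene: keep the foveated ROI at full resolution


def compress_frame(frame, roi, ambient_luminance):
    """Compress one frame based on the determined mode and the foveated ROI.

    roi = (top, left, height, width) in full-resolution pixel coordinates (claim 9).
    """
    mode = determine_processing_mode(ambient_luminance)
    if mode == "low_resolution":
        # Low-resolution mode: bin (or subsample) the entire frame (claim 6).
        return {"mode": mode, "periphery": bin2x2(frame), "foveal": None}
    # Foveated ROI mode: foveation map with a full-resolution region inside the ROI
    # and a compressed region everywhere else (claim 7).
    top, left, height, width = roi
    foveal = frame[top : top + height, left : left + width].copy()
    return {"mode": mode, "periphery": bin2x2(frame), "foveal": foveal}
```

Note that claim 8 contemplates analog compression (analog binning or subsampling) followed by digital compression; the sketch above shows only the digital side of such a mixed analog-digital pipeline.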
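Claim 12 and FIGS. 7 and 8A-8C refer to splitting a foveation map across MIPI virtual channels and adding dummy data before output. The sketch below, reusing the dictionary layout from the previous sketch, pads each region to a uniform line length and assigns it to a virtual channel. The padding value, the fixed line length, and the two-channel split (FIG. 7 uses three virtual channels) are assumptions for illustration, not details specified by the claims.

```python
# Illustrative sketch of dummy-data padding for MIPI virtual channels (claims 2 and 12).
import numpy as np

DUMMY_VALUE = 0  # hypothetical fill value for dummy pixels


def pad_lines(rows, line_length):
    """Pad each pixel row with dummy data so every line has the same length,
    since a MIPI transmitter typically expects a constant line length per channel.
    Assumes each row is no longer than line_length."""
    padded = np.full((len(rows), line_length), DUMMY_VALUE, dtype=rows[0].dtype)
    for i, row in enumerate(rows):
        padded[i, : row.size] = row
    return padded


def pack_virtual_channels(compressed, line_length):
    """Assign the differently compressed regions of a foveation map to separate
    virtual channels (cf. FIG. 7) and pad each to a uniform line length."""
    channels = {}
    if compressed.get("foveal") is not None:
        channels[0] = pad_lines(compressed["foveal"], line_length)   # full-resolution ROI
    channels[1] = pad_lines(compressed["periphery"], line_length)    # compressed region
    return channels


# Hypothetical usage with a 480x640 frame and a 128x128 foveal ROI:
frame = np.random.rand(480, 640).astype(np.float32)
compressed = compress_frame(frame, roi=(100, 200, 128, 128), ambient_luminance=250.0)
channels = pack_virtual_channels(compressed, line_length=640)
```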
Description
BACKGROUND
Near-eye devices are display devices, such as head-mounted displays, used in various augmented reality/mixed reality/virtual reality (AR/MR/VR) applications. These wearable devices utilize image generators and imaging optics to provide image content to the user's eyes. Different configurations of sensors and instruments enable various functionalities. For AR/MR applications, computer-generated content can be superimposed on the user's real-world view through a transparent display. In VR applications, the device immerses the user in a virtual environment by projecting image content across the user's entire field-of-view, or a significant portion thereof. In some applications, the VR image content is computer generated and virtual. In video pass-through applications, a video feed of the user's surroundings is captured by mounted cameras and displayed to the user. Computer-generated content, either interactive or non-interactive, can also be overlaid on the video feed.
SUMMARY
Implementations for adaptive foveated image sensors are provided. One implementation includes an image sensor system for adaptive foveated processing, the image sensor system comprising: an image sensor for generating a set of signals by imaging an environment; and processing circuitry configured to: receive foveated region of interest (ROI) information and ambient light information; determine a processing mode based on the ambient light information; compress the set of signals from the image sensor based on the determined processing mode and the foveated ROI information to generate a compressed set of pixels; and output the compressed set of pixels.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows a schematic view of an example near-eye device implementing an example adaptive foveated image sensor.
FIG. 2 shows an example flow diagram for implementing different processing modes, which can be performed using the example near-eye device of FIG. 1.
FIGS. 3A and 3B show example subsampling processes, which can be performed using the example near-eye device of FIG. 1.
FIG. 4 shows an example binning process, which can be performed using the example near-eye device of FIG. 1.
FIG. 5 shows an example mixed analog-digital sampling process for generating a foveation map, which can be performed using the example near-eye device of FIG. 1.
FIG. 6 shows an example image with a foveation map applied, which can be generated using the example near-eye device of FIG. 1.
FIG. 7 shows how an example foveation map is divided into three different virtual channels, which can be performed using the example near-eye device of FIG. 1.
FIGS. 8A-8C show example processes for adding dummy data for different virtual channels, which can be performed using the example near-eye device of FIG. 1.
FIG. 9 shows a flow diagram for an example method of adaptive foveated processing, which can be performed using the example near-eye device of FIG. 1.
FIG. 10 shows a schematic view of an example computing system including the near-eye device of FIG. 1.
FIG. 11 shows an example form factor of the near-eye device of FIG. 1, in the form of AR/VR glasses.
DETAILED DESCRIPTION
Near-eye wearable devices implementing AR/MR/VR applications can provide image content to a user through different approaches for various applications, including consumer and industrial applications. With more advanced designs and added functionalities, the development of near-eye devices involves significant challenges in providing adequate computational power and power economy within limited form factors. For example, it can be desirable for near-eye devices to provide high-resolution image content, such as high-resolution computer-generated content, a high-resolution video feed of the user's environment, etc. In addition to processing and rendering high-resolution content, other functionalities such as eye-tracking technology can also place demands on computational power. These needs often result in increased requirements for battery capacity, computational power, and thermal management, which can lead to bulkier designs with larger, heavier batteries. Current near-eye wearable devices therefore face tradeoffs among high image resolution, broad fields-of-view (FoV), low power consumption, and adequate computational processing power. One contemplated solution involves the intricacies of human perception. A typical human FoV is around 180 degrees. However, only a small portion of a person's vision is focused on at a