EP-4738823-A1 - UPDATES OF BACKGROUND LAYERS DURING ENCODING
Abstract
There are provided techniques for updating background layers during encoding of a scene, performed by an image processing device. The scene is encoded based on classifying objects depicted in the scene as either foreground or background. The background is divided into ordered background layers where each background layer is associated with a respective depth model. The method comprises detecting (S102) a change in an image portion of one background layer. The method comprises calculating (S104) a difference between the image portion and a corresponding image portion of a background layer ordered behind said one background layer. The method comprises selecting (S106a), when the difference is smaller than a threshold, said background layer ordered behind said one background layer to represent the image portion.
Inventors
- STENING, JOHAN
- YUAN, Song
Assignees
- Axis AB
Dates
- Publication Date
- 2026-05-06
- Application Date
- 2024-11-04
Claims (13)
- A method for updating background layers (410a:410c, 510a:510c) during encoding of a scene (120), performed by an image processing device (110, 600), wherein the scene (120) is encoded based on classifying objects (420, 520) depicted in the scene (120) as either foreground (210) or background (220), wherein the background (220) is divided into ordered background layers (410a:410c, 510a:510c) where each background layer (410a:410c, 510a:510c) is associated with a respective depth model (dm), and wherein the method comprises: detecting (S102) a change in an image portion (225) of one background layer; calculating (S104) a difference between the image portion (225) and a corresponding image portion (225) of a background layer ordered behind said one background layer; and selecting (S106a), when the difference is smaller than a threshold, said background layer ordered behind said one background layer to represent the image portion (225).
- The method according to claim 1, wherein the method further comprises: creating (S106b), when the difference is not smaller than the threshold, a new background layer to represent the image portion (225).
- The method according to claim 2, wherein the method further comprises: merging (S108) the new background layer with the background layer ordered behind the new background layer upon expiration of a merge time (mt) associated with the background layer ordered behind the new background layer.
- The method according to any preceding claim, wherein each background layer is associated with a respective merge time (mt).
- The method according to claim 4, wherein representation of an object (420, 520) in the scene (120) is merged into a given background layer when the object (420, 520) remains stationary in the scene (120) longer than the merge time (mt) of said given background layer.
- The method according to a combination of claims 3, 4, or 5, wherein a given background layer is merged into the background layer ordered behind it after having existed longer than the merge time (mt) of said background layer ordered behind it.
- The method according to any preceding claim, wherein the background layers (410a:410c, 510a:510c) are ordered by merge time (mt), with the background layer with shortest merge time (mt) closest to the foreground (210).
- The method according to any preceding claim, wherein the image portion (225) has depth values as given by the depth model (dm) of the background layer (410a:410c, 510a:510c) that represents the image portion (225).
- The method according to any preceding claim, wherein the method further comprises: encoding (S110) the foreground (210) and the background layers (410a:410c, 510a:510c) into an encoded video stream of the scene (120).
- An image processing device (110, 600) for updating background layers (410a:410c, 510a:510c) during encoding of a scene (120), wherein the scene (120) is encoded based on classifying objects (420, 520) depicted in the scene (120) as either foreground (210) or background (220), wherein the background (220) is divided into ordered background layers (410a:410c, 510a:510c) where each background layer (410a:410c, 510a:510c) is associated with a respective depth model (dm), the image processing device (110, 600) comprising processing circuitry (610), the processing circuitry being configured to cause the image processing device (110, 600) to: detect a change in an image portion (225) of one background layer; calculate a difference between the image portion (225) and a corresponding image portion (225) of a background layer ordered behind said one background layer; and select, when the difference is smaller than a threshold, said background layer ordered behind said one background layer to represent the image portion (225).
- The image processing device (110, 600) according to claim 10, further being configured to perform the method according to any of claims 2 to 9.
- A computer program (720) for updating background layers (410a:410c, 510a:510c) during encoding of a scene (120), wherein the scene (120) is encoded based on classifying objects (420, 520) depicted in the scene (120) as either foreground (210) or background (220), wherein the background (220) is divided into ordered background layers (410a:410c, 510a:510c) where each background layer (410a:410c, 510a:510c) is associated with a respective depth model (dm), the computer program comprising computer code which, when run on processing circuitry (610) of an image processing device (110, 600), causes the image processing device (110, 600) to: detect (S102) a change in an image portion (225) of one background layer; calculate (S104) a difference between the image portion (225) and a corresponding image portion (225) of a background layer ordered behind said one background layer; and select (S106a), when the difference is smaller than a threshold, said background layer ordered behind said one background layer to represent the image portion (225).
- A computer program product (710) comprising a computer program (720) according to claim 12, and a computer readable storage medium (730) on which the computer program is stored.
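The layer-selection logic of claims 1 to 3 can be illustrated with a short sketch. The claims do not specify a difference metric, data structures, or how a new layer's merge time is chosen; the mean-absolute-difference metric, the `BackgroundLayer` class, the halved merge time for a newly created layer, and all names below are illustrative assumptions, not part of the claimed method.

```python
import numpy as np

DIFF_THRESHOLD = 10.0  # hypothetical mean-absolute-difference threshold


class BackgroundLayer:
    """One ordered background layer; merge_time is the time after which
    content sinks into the layer ordered behind it (cf. claims 3 to 7)."""

    def __init__(self, image, merge_time):
        self.image = image.astype(np.float64)  # layer pixel content
        self.merge_time = merge_time           # seconds before merging back
        self.age = 0.0                         # time since creation


def resolve_change(layers, i, region, new_patch):
    """Decide which layer represents a changed image portion of layers[i].

    Layers are ordered front (index 0) to back, so the layer ordered
    behind layers[i] is layers[i + 1]. Returns the index of the layer
    chosen to represent the portion, creating a new layer if needed."""
    y0, y1, x0, x1 = region
    behind = layers[i + 1]
    diff = float(np.mean(np.abs(new_patch - behind.image[y0:y1, x0:x1])))
    if diff < DIFF_THRESHOLD:
        return i + 1  # S106a: reuse the layer ordered behind
    # S106b: spawn a new layer holding the changed portion, placed in
    # front with a shorter merge time than the layer it was split from
    new_layer = BackgroundLayer(np.zeros_like(layers[i].image),
                                merge_time=layers[i].merge_time / 2)
    new_layer.image[y0:y1, x0:x1] = new_patch
    layers.insert(i, new_layer)
    return i
```

As a usage example, a changed patch that closely matches the layer behind selects that layer, while a dissimilar patch spawns a new front layer, consistent with the ordering of claim 7 (shortest merge time closest to the foreground):

```python
layers = [BackgroundLayer(np.full((4, 4), 100.0), merge_time=10.0),
          BackgroundLayer(np.full((4, 4), 100.0), merge_time=60.0)]
resolve_change(layers, 0, (0, 2, 0, 2), np.full((2, 2), 102.0))  # selects index 1
resolve_change(layers, 0, (0, 2, 0, 2), np.full((2, 2), 200.0))  # creates a new layer
```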
Description
TECHNICAL FIELD

Embodiments presented herein relate to a method, an image processing device, a computer program, and a computer program product for updating background layers during encoding of a scene.

BACKGROUND

Depth perception is an essential aspect of understanding and interpreting the surrounding environment in various fields, particularly in applications where three-dimensional (3D) spatial information is required. The ability to accurately determine the distance, shape, and size of objects within a scene enables more precise analytics, improved object detection, and enhanced decision-making capabilities. Depth information is particularly useful in environments where distinguishing between objects based on their distance or size is critical, such as security systems, robotics, autonomous vehicles, and other systems that rely on visual data.

Traditionally, depth perception has been facilitated by 3D cameras or other specialized sensors, which provide a detailed understanding of the environment. The ability to perceive depth offers several benefits, including more accurate object detection and the ability to reduce false alarms by filtering out objects that may appear larger or closer than they actually are. For example, depth perception can help in situations where objects might be misidentified based on two-dimensional (2D) images, as depth information provides a more comprehensive view of the actual spatial relationships within a scene.

There are several methods by which depth information can be extracted. In some cases, a monocular camera system may utilize advanced computational models to estimate the depth from a single viewpoint. In other instances, depth information is derived from disparity measurements obtained from overlapping images captured by multiple sensors, such as those used in multi-sensor panoramic systems. These systems calculate the difference in the position of objects between images, enabling the determination of their relative distance.
Another method involves sampling data from laser points, such as those used in Pan-Tilt-Zoom (PTZ) cameras equipped with lasers, to measure the distance of objects. Additionally, self-learning techniques based on object tracking can provide depth information by analyzing how objects move and change position over time.

While these approaches can be effective in relatively static environments, they face challenges when applied to more dynamic scenes. Each method typically requires a certain amount of time to process the data and compute accurate depth information. For example, monocular models often involve intensive computational processing, while systems relying on PTZ cameras may require time for the camera to physically sweep or pan across the scene to gather sufficient data. This delay can hinder the ability to provide real-time or near real-time depth updates, especially in scenarios where large objects are moving quickly, causing rapid changes in their depth.

In dynamic environments, where objects may move unpredictably or at varying speeds, keeping the depth perception system updated in real-time therefore becomes increasingly difficult. This challenge is particularly pronounced when large objects shift dramatically in depth, as the system may not be able to adjust quickly enough to provide accurate and up-to-date information. Consequently, there is a need for more efficient methods of maintaining accurate depth perception, particularly in situations where both static and dynamic elements are present within a scene.

SUMMARY

An object of embodiments herein is to address the above issues. A particular object is to provide computationally efficient techniques for maintaining accurate depth perception in scenes with both static and dynamic elements.

According to a first aspect there is presented a method for updating background layers during encoding of a scene, performed by an image processing device.
The scene is encoded based on classifying objects depicted in the scene as either foreground or background. The background is divided into ordered background layers where each background layer is associated with a respective depth model. The method comprises detecting a change in an image portion of one background layer. The method comprises calculating a difference between the image portion and a corresponding image portion of a background layer ordered behind said one background layer. The method comprises selecting, when the difference is smaller than a threshold, said background layer ordered behind said one background layer to represent the image portion.

According to a second aspect there is presented an image processing device for updating background layers during encoding of a scene. The scene is encoded based on classifying objects depicted in the scene as either foreground or background. The background is divided into ordered background layers where each background layer is associated with a respective depth model. The image processing device