
EP-4154511-B1 - MAINTAINING FIXED SIZES FOR TARGET OBJECTS IN FRAMES

EP 4154511 B1

Inventors

  • MAO, Songan
  • HUH, Youngmin
  • SHAHRIAN VARNOUSFADERANI, Ehsan
  • CHOURASIA, Ajit
  • GOSNELL, Donald
  • LI, Muhua
  • MAMEDOV, Denis

Dates

Publication Date
2026-05-06
Application Date
2021-04-26

Claims (14)

  1. A method of processing one or more frames (302A), the method comprising: determining a region of interest (1004) in a first frame (302A) of a sequence of frames (302A, 302N), the region of interest (1004) in the first frame (302A) including an object having a size in the first frame (302A); determining a point of a first object region generated for the object in the first frame (302A); determining a point of a second object region generated for the object in a second frame (1032), the second frame (1032) occurring after the first frame (302A) in the sequence of frames (302A, 302N); determining a movement factor for the object based on a smoothing function (1400) using the point of the first object region and the point of the second object region, wherein the smoothing function (1400) controls a change in position of the object in a plurality of frames of the sequence of frames (302A, 302N) so that the change does not exceed a threshold position change in the plurality of frames of the sequence of frames; cropping a portion (1022) of the second frame (1032) of the sequence of frames (302A, 302N) based on the movement factor; and scaling the portion (1022) of the second frame (1032) based on the size of the object in the first frame (302A).
  2. The method of claim 1, further comprising: receiving user input corresponding to a selection of the object in the first frame; and determining the region of interest in the first frame based on the received user input.
  3. The method of claim 2, wherein the user input includes a touch input provided using a touch interface of a device.
  4. The method of claim 1, further comprising: determining a point of the second object region determined for the object in the second frame; and cropping and scaling the portion of the second frame with the point of the second object region in a center of the cropped and scaled portion.
  5. The method of claim 4, wherein the point of the second object region is a center point of the second object region.
  6. The method of claim 1, wherein scaling the portion of the second frame based on the size of the object in the first frame causes the object in the second frame to have a same size as the object in the first frame.
  7. The method of claim 1, further comprising: determining a first length associated with the object in the first frame; determining a second length associated with the object in the second frame; determining a scaling factor based on a comparison between the first length and the second length; and scaling the portion of the second frame based on the scaling factor.
  8. The method of claim 7, wherein the first length is a length of the first object region determined for the object in the first frame, and wherein the second length is the length of a second object region determined for the object in the second frame.
  9. The method of claim 8, wherein the first object region is a first bounding box and the first length is a diagonal length of the first bounding box, and wherein the second object region is a second bounding box and the second length is a diagonal length of the second bounding box.
  10. The method of claim 8, wherein scaling the portion of the second frame based on the scaling factor causes the second object region in the cropped and scaled portion to have a same size as the first object region in the first frame.
  11. The method of claim 1, wherein the smoothing function includes a moving function, the moving function being used to determine a location of the point of the respective object region in each of the plurality of frames of the sequence of frames based on a statistical measure of object movement.
  12. The method of claim 1, further comprising: determining a first length associated with the object in the first frame; determining a second length associated with the object in the second frame; determining a scaling factor for the object based on a comparison between the first length and the second length and based on a smoothing function using the first length and the second length, wherein the smoothing function controls a change in size of the object in a plurality of frames of the sequence of frames; and scaling the portion of the second frame based on the scaling factor, and optionally wherein: the smoothing function includes a moving function, the moving function being used to determine a length associated with the object in each of the plurality of frames of the sequence of frames based on a statistical measure of object size; or the first length is a length of a first bounding box generated for the object in the first frame, and wherein the second length is a length of a second bounding box generated for the object in the second frame.
  13. An apparatus (100) for processing one or more frames (302A), comprising: a memory (145) configured to store at least one frame (302A); and a processor (150) implemented in circuitry and configured to: determine a region of interest (1004) in a first frame (302A) of a sequence of frames (302A, 302N), the region of interest (1004) in the first frame (302A) including an object having a size in the first frame (302A); determine a point of a first object region generated for the object in the first frame (302A); determine a point of a second object region generated for the object in a second frame (1032), the second frame (1032) occurring after the first frame (302A) in the sequence of frames (302A, 302N); determine a movement factor for the object based on a smoothing function (1400) using the point of the first object region and the point of the second object region, wherein the smoothing function (1400) controls a change in position of the object in a plurality of frames of the sequence of frames (302A, 302N) so that the change does not exceed a threshold position change in the plurality of frames of the sequence of frames; crop a portion (1022) of the second frame (1032) of the sequence of frames (302A, 302N) based on the movement factor; and scale the portion (1022) of the second frame (1032) based on the size of the object in the first frame (302A).
  14. A computer program comprising instructions which, when the program is executed by a computing device, cause the computing device to carry out the method of any one of claims 1 to 12.
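The pipeline of claims 1 and 7-10 can be illustrated with a short sketch: an exponential moving average stands in for the claimed smoothing function (1400), a bounding-box diagonal ratio gives the scaling factor (claim 9), and nearest-neighbor resampling stands in for the scaling step. This is a minimal illustration under those assumptions, not the patented implementation; the function names, the smoothing weight `alpha`, and the fixed crop window are hypothetical choices for the example only.

```python
import numpy as np

def smooth_point(prev_pt, new_pt, alpha=0.3):
    # Exponential moving average: the tracked point moves only a
    # fraction alpha toward the newly detected point each frame,
    # which caps the per-frame position change (the movement-factor
    # idea of claim 1). alpha is an assumed tuning parameter.
    return tuple(alpha * n + (1.0 - alpha) * p for p, n in zip(prev_pt, new_pt))

def diagonal(box):
    # box = (x, y, w, h); diagonal length of the bounding box (claim 9).
    _, _, w, h = box
    return float(np.hypot(w, h))

def crop_and_scale(frame, center, crop_size, scale):
    # Crop a window centered on the smoothed point, then rescale by
    # nearest-neighbor sampling so the object keeps roughly the size
    # it had in the first frame (claims 1 and 7-10).
    h, w = frame.shape[:2]
    ch, cw = crop_size
    y0 = int(np.clip(center[1] - ch // 2, 0, h - ch))
    x0 = int(np.clip(center[0] - cw // 2, 0, w - cw))
    crop = frame[y0:y0 + ch, x0:x0 + cw]
    out_h, out_w = int(round(ch * scale)), int(round(cw * scale))
    ys = (np.arange(out_h) / scale).astype(int).clip(0, ch - 1)
    xs = (np.arange(out_w) / scale).astype(int).clip(0, cw - 1)
    return crop[np.ix_(ys, xs)]

# Example: the object's diagonal halved between frames, so the
# scaling factor is 2 and the cropped window is upscaled 2x.
frame = np.arange(100 * 100).reshape(100, 100)
first_box = (10, 10, 30, 40)    # (x, y, w, h) in the first frame
second_box = (50, 60, 15, 20)   # same object, smaller and moved
scale = diagonal(first_box) / diagonal(second_box)
center = smooth_point((25, 30), (57, 70))  # previous vs. new box centers
out = crop_and_scale(frame, center, (40, 40), scale)
```

Because the smoothed center lags the detected center, the crop window drifts toward the object gradually rather than jumping, which is how the smoothing function keeps the position change under a threshold across the sequence.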

Description

FIELD

The present disclosure generally relates to video analytics, and more specifically to techniques and systems for maintaining a consistent (e.g., fixed or nearly fixed) size for a target object in one or more frames (e.g., in video analytics, for recorded video, among other uses).

BACKGROUND

Many devices and systems allow a scene to be captured by generating images (or frames) and/or video data (including multiple frames) of the scene. For example, a camera or a computing device including a camera (e.g., a mobile device such as a mobile telephone or smartphone including one or more cameras) can capture a sequence of frames of a scene. In another example, an Internet protocol camera (IP camera) is a type of digital video camera that can be employed for surveillance or other applications. Unlike analog closed circuit television (CCTV) cameras, an IP camera can send and receive data via a computer network and the Internet. The image and/or video data can be captured and processed by such devices and systems (e.g., mobile devices, IP cameras, etc.) and can be output for consumption (e.g., displayed on the device and/or another device). In some cases, the image and/or video data can be captured by such devices and systems and output for processing and/or consumption by other devices.

US 2014/0240553 discloses various methods for automatically adjusting a zoom feature in accordance with a camera movement to perform a dolly zoom effect.
US 2021/0051273 discloses a photographing control method applied to a photographing apparatus, including: acquiring an image output by a photographing device of the photographing apparatus; detecting a photographing mode of the photographing apparatus; in response to the photographing mode being a self-portrait mode, detecting at least one target object in the image and determining position information of the at least one target object in the image; and, according to the position information of the at least one target object in the image, controlling a photographing parameter of the photographing device to shoot the at least one target object in a preset setting.

US 2017/0272661 discloses a zooming control apparatus comprising an object detection unit configured to detect an object from an image; a first acquisition unit configured to acquire information regarding a distance to the object; and a zooming control unit configured to perform zooming control for automatically changing a zoom magnification.

US 2009/251594 A1 discusses a desired cropping window that may be determined using a coarse-to-fine searching strategy. Video frames may be cropped with a window that matches an aspect ratio of the target display, and resized isotropically to match a size of the target display.

SUMMARY

The invention is defined in the appended independent claims, to which attention is directed. Optional features are set out in the dependent claims.

BRIEF DESCRIPTION OF THE DRAWINGS

Illustrative embodiments of the present application are described in detail below with reference to the following figures:

FIG. 1 is a block diagram illustrating an example architecture of an image capture and processing system, in accordance with some examples;
FIG. 2 is a block diagram illustrating an example of a system including a video source and a video analytics system, in accordance with some examples;
FIG. 3 is an example of a video analytics system processing video frames, in accordance with some examples;
FIG. 4 is a block diagram illustrating an example of a blob detection system, in accordance with some examples;
FIG. 5 is a block diagram illustrating an example of an object tracking system, in accordance with some examples;
FIG. 6A is a diagram illustrating an example of a machine learning based object detection and tracking system, in accordance with some examples;
FIG. 6B is a diagram illustrating an example of an upsample component of a machine learning based object detection and tracking system, in accordance with some examples;
FIG. 6C is a diagram illustrating an example of a backbone architecture for a machine learning based tracking system, in accordance with some examples;
FIG. 7 is a diagram illustrating an example of a machine learning based object classification system, in accordance with some examples;
FIG. 8A is a diagram illustrating an example of a system including a frame cropping and scaling system, in accordance with some examples;
FIG. 8B is a diagram illustrating an example of the frame cropping and scaling system, in accordance with some examples;
FIG. 8C is a diagram illustrating an example of a frame cropping and scaling process, in accordance with some examples;
FIG. 9 is a flow diagram illustrating another example of a frame cropping and scaling process, in accordance with some examples;
FIG. 10A is a diagram illustrating an example of an initial frame of a video, in accordance with some examples;
FIG. 10B is a diagram illustrating an example of a subsequent frame of