EP-4742198-A2 - OBJECT ANALYSIS

Abstract

A method comprising performing object detection within a set of representations of a hierarchically-structured signal, the set of representations comprising at least a first representation of the signal at a first level of quality and a second representation of the signal at a second, higher level of quality.

Inventors

MEARDI, Guido
COBIANCHI, Guendalina
KESZTHELYI, Balázs
MAKEEV, Ivan
FERRARA, Simone
POULARAKIS, Stergios

Assignees

V-Nova International Limited

Dates

Publication Date: 2026-05-13
Application Date: 2020-02-11

Claims (15)

A method of performing object detection within a set of representations of a hierarchically-structured signal, wherein the hierarchically-structured signal is structured in accordance with a tiered hierarchy of representations of an original signal and wherein each of the representations, and each tier of the hierarchy, is associated with a respective level of quality, wherein the level of quality of each representation is associated with one or more of: a resolution of the representation, a number of pixels within the representation, a bit depth of the representation, and whether the representation is progressive or interlaced, the set of representations comprising at least a first representation of the original signal at a first level of quality and a second representation of the original signal at a second, higher, level of quality, the method comprising: performing a first object detection on the first representation to find one or more instances of one or more objects of one or more particular classes; in response to determining that a first level of confidence associated with the first object recognition performed within the first representation does not meet a first object-recognition threshold level of confidence; performing a second object detection on the second representation to find the one or more instances of the one or more objects of the one or more particular classes and localizing the one or more objects within the one or more representations; determining whether a second level of confidence associated with the second object recognition performed within the second representation meets a second object-recognition threshold level of confidence; based on whether the second level of confidence meets the second object-recognition threshold level of confidence, outputting that the one or more objects have been detected and localized; wherein the first threshold level of confidence is associated with the first level of quality and the second threshold level of confidence is associated with the second level of quality; and wherein the second threshold level of confidence is associated with the first threshold level of confidence.
A method according to claim 1, wherein said object detection is performed using at least one convolutional neural network, CNN, wherein said object detection is performed using a first CNN associated with the first level of quality and a second CNN associated with the second level of quality.
A method according to claim 2, wherein data output by the first CNN is provided to the second CNN.
A method according to any of claims 1 to 3, comprising performing object recognition within one or more of the set of representations.
A method according to any of claims 1 to 4, comprising obtaining the first representation by decoding a tier of the hierarchically-structured signal, wherein the hierarchically-structured signal is encoded in a hierarchically-encoded signal received from an encoder.
A method according to any of claims 1 to 5, comprising obtaining only part of the second representation using only part of the first representation.
A method according to claim 6, wherein object detection and/or object recognition is performed within the part of the second representation.
A method according to claim 6 or 7, wherein the part of the first representation corresponds to a region of interest within the first representation.
A method according to any of claims 1 to 8, wherein the signal comprises a video signal.
A method according to any of claims 1 to 9, wherein the first and second representations are each of the same time sample of the video signal.
A method according to any of claims 1 to 10, wherein the level of quality corresponds to an image resolution.
A method according to any of claims 1 to 11, wherein: the set of representations comprises a third representation of the original signal at a third level of quality higher than the level of quality of the second representation; and the method comprises: based on the second level of confidence meeting the second object-recognition threshold level of confidence, performing a third object detection on the third representation to find the one or more instances of the one or more objects of the one or more particular classes and localizing the one or more objects within the one or more representations; determining whether a third level of confidence associated with the third object recognition performed within the third representation meets a third object-recognition threshold level of confidence; and based on the third level of confidence meeting the third object-recognition threshold level of confidence, outputting that the one or more objects have been detected and localized; wherein the third object-recognition threshold level of confidence is a function of the first object-recognition threshold level of confidence and the second object-recognition threshold level of confidence.
The method of claim 12, comprising, based on the third level of confidence not meeting the third object-recognition threshold level of confidence, and based on the number of representations within the hierarchy below the third representation being larger than a threshold number of representations, abandoning an object detection process and outputting that the one or more objects have not been detected and localized.
Apparatus configured to perform a method according to any of claims 1 to 13.
A computer program arranged, when executed, to perform a method according to any of claims 1 to 13.
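By way of illustration only, the escalation between tiers described in claims 1, 12 and 13, and the region-of-interest idea of claims 6 to 8, can be sketched in Python as follows. The sketch is a simplified reading and not the claimed method: it tries each tier from the lowest level of quality upwards, stops at the first tier whose detection confidence meets that tier's threshold, and abandons the search once a caller-supplied number of tiers has been tried. The names Detection, Detector, cascade_detect and scale_box, the (x, y, width, height) box layout, the thresholds passed as a plain list, and the factor of two between tiers are all assumptions made for the example and do not appear in the claims.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (x, y, width, height)


@dataclass(frozen=True)
class Detection:
    label: str         # object class, for example "car"
    box: Box           # location within the representation that was searched
    confidence: float  # detector confidence in the range 0 to 1


# Hypothetical detector interface: one representation in, best detection (or None) out.
Detector = Callable[[Any], Optional[Detection]]


def cascade_detect(
    representations: Sequence[Any],
    detectors: Sequence[Detector],
    thresholds: Sequence[float],
    max_levels: Optional[int] = None,
) -> Optional[Tuple[int, Detection]]:
    """Search tiers from lowest to highest quality.

    Returns (tier index, detection) for the first tier whose confidence meets
    that tier's threshold, or None if no tier within the limit succeeds.
    """
    if not (len(representations) == len(detectors) == len(thresholds)):
        raise ValueError("need one detector and one threshold per representation")
    limit = len(representations)
    if max_levels is not None:
        limit = min(limit, max_levels)
    for level in range(limit):
        detection = detectors[level](representations[level])
        if detection is not None and detection.confidence >= thresholds[level]:
            return level, detection
    return None  # search abandoned: nothing detected with enough confidence


def scale_box(box: Box, factor: int = 2) -> Box:
    """Map a box into the next higher tier, assuming each tier scales both
    dimensions by `factor` (an assumption of this sketch)."""
    x, y, w, h = box
    return (x * factor, y * factor, w * factor, h * factor)


# Stub detectors standing in for real models (for example one CNN per tier).
def low_quality_detector(representation: Any) -> Optional[Detection]:
    return Detection("car", (1, 1, 2, 2), 0.40)


def high_quality_detector(representation: Any) -> Optional[Detection]:
    return Detection("car", (2, 2, 4, 4), 0.90)


result = cascade_detect(
    ["low-res frame", "high-res frame"],
    [low_quality_detector, high_quality_detector],
    [0.6, 0.8],
)
```

With the stub detectors above, the lower tier reports a confidence of 0.40, which is below its threshold of 0.6, so the sketch moves on to the higher tier, where 0.90 meets the threshold of 0.8. A helper such as scale_box could be used to restrict the higher-tier search to the region found at the lower tier.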

Description

Technical Field

The present disclosure relates to object analysis.

Background

Object analysis may be performed within an image and/or a video. Examples of types of object analysis include, but are not limited to, object detection and object recognition.

Object analysis may be performed within images and video of varied resolutions and compression levels, for example images in an uncompressed file format or in a compressed file format. Examples of uncompressed file formats are the BMP and TGA file formats. An example of a compressed image file format is JPEG. An example of a compressed video file format is H.264/MPEG-4.

Summary

According to first embodiments, there is provided a method comprising performing object detection within a set of representations of a hierarchically-structured signal, the set of representations comprising at least a first representation of the signal at a first level of quality and a second representation of the signal at a second, higher level of quality.

According to second embodiments, there is provided a method comprising performing object analysis using at least part of a representation of a signal at a first level of quality, the representation of the signal at the first level of quality having been generated using a representation of the signal at a second, higher level of quality, wherein performing object analysis comprises performing object detection and/or object recognition.

According to third embodiments, there is provided a method comprising performing object analysis within an image in a multi-resolution image format in which multiple versions of an image are available at different respective image resolutions.
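As a purely illustrative sketch of how a representation at a lower level of quality might be generated from one at a higher level of quality, the Python below builds a small tiered hierarchy by repeatedly averaging 2x2 blocks of pixels. The averaging filter, the factor of two per tier, the greyscale list-of-rows image layout and the names downsample_2x and build_hierarchy are assumptions made for the example; this passage does not specify them.

```python
from typing import List

Image = List[List[float]]  # assumed layout: rows of greyscale pixel values


def downsample_2x(image: Image) -> Image:
    """Halve width and height by averaging each 2x2 block of pixels.

    A trailing odd row or column is dropped (a simplification of this sketch).
    """
    if len(image) < 2 or len(image[0]) < 2:
        raise ValueError("image must be at least 2x2 to downsample")
    height, width = len(image) // 2, len(image[0]) // 2
    return [
        [
            (
                image[2 * y][2 * x]
                + image[2 * y][2 * x + 1]
                + image[2 * y + 1][2 * x]
                + image[2 * y + 1][2 * x + 1]
            )
            / 4.0
            for x in range(width)
        ]
        for y in range(height)
    ]


def build_hierarchy(original: Image, levels: int) -> List[Image]:
    """Return `levels` representations, ordered from lowest quality to highest.

    The last entry is the original image; each earlier entry is generated
    from the representation one tier above it.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    tiers = [original]
    for _ in range(levels - 1):
        tiers.append(downsample_2x(tiers[-1]))
    tiers.reverse()
    return tiers


# A 4x4 test image with pixel values 0.0 to 15.0 in row-major order.
original = [[float(4 * row + col) for col in range(4)] for row in range(4)]
tiers = build_hierarchy(original, 3)
```

Here tiers holds a 1x1, a 2x2 and the original 4x4 representation, so an analysis step can start at the cheapest tier and move to a higher resolution only where needed.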
According to fourth embodiments, there is provided a method comprising partially decoding a representation of a signal in response to object analysis performed within the representation detecting an object in a region of interest within the representation of the signal, wherein the partial decoding is performed in relation to the region of interest.

According to fifth embodiments, there is provided apparatus configured to perform a method according to any of the first through fourth embodiments.

According to sixth embodiments, there is provided a computer program arranged, when executed, to perform a method according to any of the first through fourth embodiments.

Further features and advantages will become apparent from the following description, given by way of example only, which is made with reference to the accompanying drawings.

Brief Description of the Drawings

Figure 1 shows a block diagram of an example of a hierarchical system in accordance with embodiments;
Figure 2 shows a block diagram of another example of a hierarchical system in accordance with embodiments;
Figure 3 shows a block diagram of another example of a hierarchical system in accordance with embodiments;
Figure 4 shows a block diagram of an example of a part of another example of a hierarchical system in accordance with embodiments; and
Figure 5 shows a block diagram of an example of an apparatus in accordance with embodiments.

Detailed Description

Referring to Figure 1, there is shown an example of a system 100. The system 100 may comprise a distributed system. The system 100 may be in a self-driving vehicle (also referred to as an "autonomous vehicle"). The system 100 may be used to provide computer vision functionality in relation to the self-driving vehicle.

In this example, the system 100 comprises a first device 110. In this example, the first device 110 comprises an encoder 110. In this example, the encoder 110 generates encoded data.
In this example, the encoder 110 receives data and encodes the received data to generate the encoded data based on the received data.

In this example, the system 100 comprises a second device 120. In this example, the second device 120 comprises a decoder 120. In this example, the decoder 120 generates decoded data. In this example, the decoder 120 receives the encoded data from the encoder 110 and decodes the encoded data to generate the decoded data.

In this example, the second device 120 obtains data by receiving the data from the first device 110. In some examples, the second device 120 obtains data in another manner. For example, the second device 120 may retrieve data from memory, as will be described in more detail below with reference to Figure 2.

The first device 110 and the second device 120 may be embodied in hardware and/or software. The first device 110 and the second device 120 may have a client-server relationship. For example, the first device 110 may have a server role and the second device 120 may have a client role.

In this example, the first device 110 is communicatively coupled to the second device 120. In this example, the first device 110 is directly communicatively coupled to the second device 120. A communication protocol may be defined for communications between the first device 110 and the second device 120. In some examples, t