US-20260127746-A1 - IMAGE MOTION ESTIMATION METHOD AND RELATED APPARATUS

Abstract

An image motion estimation method includes obtaining motion vectors of an object in any two neighboring image frames in a video, obtaining a plurality of encoding blocks corresponding to the motion vectors of the object in the two neighboring image frames, dividing the encoding blocks to obtain a first prediction unit, performing motion search on pixels of the first prediction unit to obtain a first difference parameter set corresponding to the motion vectors of the first prediction unit, dividing the encoding blocks to obtain at least two second prediction units, performing motion search on pixels in the second prediction units for the encoding block based on the first difference parameter set to obtain a second difference parameter set of motion vectors of the second prediction units, and determining a target prediction unit based on the first difference parameter set and the second difference parameter set of each encoding block.

Inventors

Wei Cao

Assignees

SMARTER SILICON (SHANGHAI) TECHNOLOGIES CO., LTD.

Dates

Publication Date: 2026-05-07
Application Date: 2025-10-27
Priority Date: 2024-11-05

Claims (20)

1. An image motion estimation method comprising: obtaining motion vectors of an object in two neighboring image frames in a video; obtaining a plurality of encoding blocks corresponding to the motion vectors of the object in the two neighboring image frames; dividing the encoding blocks to obtain a first prediction unit, and performing motion search on pixels of the first prediction unit for the encoding blocks based on the motion vectors of the object in the two neighboring image frames to obtain a first difference parameter set corresponding to the motion vectors of the first prediction unit, each first difference parameter corresponding to a motion vector difference of the two neighboring image frames in a pixel of the first prediction unit, and the first prediction unit being obtained by dividing the encoding blocks in a first method; dividing the encoding blocks to obtain at least two second prediction units, and performing motion search on pixels in the second prediction units for the encoding block based on the first difference parameter set to obtain a second difference parameter set of motion vectors of the second prediction units, each second motion vector corresponding to a motion vector difference of the two neighboring image frames in a pixel of the second prediction units, the second prediction units being obtained by dividing the encoding blocks in a second method, and the second method being different from the first method; storing a first difference parameter set and a second difference parameter set of each encoding block; and determining a target prediction unit based on the first difference parameter set and the second difference parameter set of each encoding block, the target prediction unit being a prediction unit in a first image matching motion of a second image, the first image being neighboring the second image and being generated after the second image, and a difference parameter of the target prediction unit being smaller than a difference parameter of a non-target prediction unit.
2. The image motion estimation method according to claim 1, wherein performing the motion search on the pixels of the first prediction unit for the encoding blocks based on the motion vectors of the object in the two neighboring image frames to obtain the first difference parameter set corresponding to the motion vectors of the first prediction unit includes: based on the motion vectors of the object in the two neighboring image frames, performing the search in the first prediction unit with at least two search precisions to obtain difference parameters of the pixels of the first prediction unit under the corresponding at least two search precisions; based on the difference parameters of the pixels of the first prediction unit under the corresponding at least two search precisions, obtaining a target search precision pixel of a corresponding precision, a difference parameter of the target search precision pixel in the first prediction unit being smaller than a difference parameter of a non-target search precision pixel; and forming the first difference parameter set using the difference parameters of the pixels of the first prediction unit under the at least two search precisions; wherein: a first target search precision pixel is determined based on a first search precision; a second target search precision pixel is determined based on a second search precision; and the second search precision is neighboring to and greater than the first search precision.
3. The image motion estimation method according to claim 1, wherein dividing the encoding blocks to obtain the at least two second prediction units, and performing the motion search on pixels in the second prediction units for the encoding blocks based on the first difference parameter set to obtain the second difference parameter set of the motion vectors of the second prediction units includes: based on positions of the pixels in the second prediction unit, determining pixels at corresponding positions in the first prediction unit; and accumulating target difference parameters of the pixels at the corresponding positions in the first prediction unit to obtain the second difference parameter set of the motion vectors in the second prediction unit, the target difference parameters being difference parameters obtained by performing search under a search precision corresponding to the second method.
4. The image motion estimation method according to claim 1, wherein a number of pixels contained in the first prediction unit is greater than a number of pixels contained in any one second prediction unit of the at least two second prediction units.
5. The image motion estimation method according to claim 1, further comprising, before storing the first difference parameter set and the second difference parameter set of each coding block: based on a second difference parameter set of each encoding block, determining a target second prediction unit among the second prediction units of the coding blocks, a difference parameter of a motion vector in the target second prediction unit being smaller than a difference parameter of a motion vector in a non-target second prediction unit of the encoding blocks; and using the motion vector of the target second prediction unit as the motion vector of the object in the two neighboring frames of images.
6. The image motion estimation method according to claim 1, wherein storing the first difference parameter set and the second difference parameter set of each encoding block includes: determining encoding block division methods corresponding to the first difference parameter set and the second difference parameter set; and based on the corresponding encoding block division methods, storing the first difference parameter set and the second difference parameter set into storage spaces corresponding to the encoding block division methods, respectively.
7. The image motion estimation method according to claim 6, wherein based on the first difference parameter set and the second difference parameter set of each encoding block, determining the target prediction unit includes: for the first prediction unit and the second prediction unit of each encoding block, obtaining a difference parameter set for each prediction unit of the first prediction unit and the second prediction unit, respectively; according to the difference parameter set of each type of the first prediction unit and the second prediction unit, determining a third prediction unit in a corresponding encoding block, a difference parameter of the third prediction unit being smaller than a difference parameter of a non-third prediction unit in the corresponding encoding block; and according to third prediction units in the encoding blocks, determining the target prediction unit, the difference parameter of the target prediction unit being smaller than the difference parameter of the non-target prediction unit.
8. An electronic device comprising: one or more processors; and one or more memories storing a computer program that, when executed by the one or more processors, causes the one or more processors to: obtain motion vectors of an object in any two neighboring image frames in a video; obtain a plurality of encoding blocks corresponding to the motion vectors of the object in the two neighboring image frames; divide the encoding blocks to obtain a first prediction unit, and perform motion search on pixels of the first prediction unit for the encoding blocks based on the motion vectors of the object in the two neighboring image frames to obtain a first difference parameter set corresponding to the motion vectors of the first prediction unit, each first difference parameter corresponding to a motion vector difference of the two neighboring image frames in a pixel of the first prediction unit, and the first prediction unit being obtained by dividing the encoding blocks in a first method; divide the encoding blocks to obtain at least two second prediction units, and perform motion search on pixels in the second prediction units for the encoding block based on the first difference parameter set to obtain a second difference parameter set of motion vectors of the second prediction units, each second motion vector corresponding to a motion vector difference of the two neighboring image frames in a pixel of the second prediction units, the second prediction units being obtained by dividing the encoding blocks in a second method, and the second method being different from the first method; store a first difference parameter set and a second difference parameter set of each encoding block; and determine a target prediction unit based on the first difference parameter set and the second difference parameter set of each encoding block, the target prediction unit being a prediction unit in a first image matching motion of a second image, the first image being neighboring the second image and being generated after the second image, and a difference parameter of the target prediction unit being smaller than a difference parameter of a non-target prediction unit.
9. The device according to claim 8, wherein the one or more processors are further configured to: based on the motion vectors of the object in the two neighboring image frames, perform the search in the first prediction unit with at least two search precisions to obtain difference parameters of the pixels of the first prediction unit under the corresponding at least two search precisions; based on the difference parameters of the pixels of the first prediction unit under the corresponding at least two search precisions, obtain a target search precision pixel of a corresponding precision, a difference parameter of the target search precision pixel in the first prediction unit being smaller than a difference parameter of a non-target search precision pixel; and form the first difference parameter set using the difference parameters of the pixels of the first prediction unit under the at least two search precisions; wherein: a first target search precision pixel is determined based on a first search precision; a second target search precision pixel is determined based on a second search precision; and the second search precision is neighboring to and greater than the first search precision.
10. The device according to claim 8, wherein the one or more processors are further configured to: based on positions of the pixels in the second prediction unit, determine pixels at corresponding positions in the first prediction unit; and accumulate target difference parameters of the pixels at the corresponding positions in the first prediction unit to obtain the second difference parameter set of the motion vectors in the second prediction unit, the target difference parameters being difference parameters obtained by performing search under a search precision corresponding to the second method.
11. The device according to claim 8, wherein a number of pixels contained in the first prediction unit is greater than a number of pixels contained in any one second prediction unit of the at least two second prediction units.
12. The device according to claim 8, wherein the one or more processors are further configured to: based on a second difference parameter set of each encoding block, determine a target second prediction unit among the second prediction units of the coding blocks, a difference parameter of a motion vector in the target second prediction unit being smaller than a difference parameter of a motion vector in a non-target second prediction unit of the encoding blocks; and use the motion vector of the target second prediction unit as the motion vector of the object in the two neighboring frames of images.
13. The device according to claim 8, wherein the one or more processors are further configured to: determine encoding block division methods corresponding to the first difference parameter set and the second difference parameter set; and based on the corresponding encoding block division methods, store the first difference parameter set and the second difference parameter set into storage spaces corresponding to the encoding block division methods, respectively.
14. The device according to claim 13, wherein the one or more processors are further configured to: for the first prediction unit and the second prediction unit of each encoding block, obtain a difference parameter set for each prediction unit of the first prediction unit and the second prediction unit, respectively; according to the difference parameter set of each type of the first prediction unit and the second prediction unit, determine a third prediction unit in a corresponding encoding block, a difference parameter of the third prediction unit being smaller than a difference parameter of a non-third prediction unit in the corresponding encoding block; and according to third prediction units in the encoding blocks, determine the target prediction unit, the difference parameter of the target prediction unit being smaller than the difference parameter of the non-target prediction unit.
15. A computer-readable storage medium storing a computer program that, when executed by one or more processors, causes the one or more processors to: obtain motion vectors of an object in any two neighboring image frames in a video; obtain a plurality of encoding blocks corresponding to the motion vectors of the object in the two neighboring image frames; divide the encoding blocks to obtain a first prediction unit, and perform motion search on pixels of the first prediction unit for the encoding blocks based on the motion vectors of the object in the two neighboring image frames to obtain a first difference parameter set corresponding to the motion vectors of the first prediction unit, each first difference parameter corresponding to a motion vector difference of the two neighboring image frames in a pixel of the first prediction unit, and the first prediction unit being obtained by dividing the encoding blocks in a first method; divide the encoding blocks to obtain at least two second prediction units, and perform motion search on pixels in the second prediction units for the encoding block based on the first difference parameter set to obtain a second difference parameter set of motion vectors of the second prediction units, each second motion vector corresponding to a motion vector difference of the two neighboring image frames in a pixel of the second prediction units, the second prediction units being obtained by dividing the encoding blocks in a second method, and the second method being different from the first method; store a first difference parameter set and a second difference parameter set of each encoding block; and determine a target prediction unit based on the first difference parameter set and the second difference parameter set of each encoding block, the target prediction unit being a prediction unit in a first image matching motion of a second image, the first image being neighboring the second image and being generated after the second image, and a difference parameter of the target prediction unit being smaller than a difference parameter of a non-target prediction unit.
16. The computer-readable storage medium according to claim 15, wherein the one or more processors are further configured to: based on the motion vectors of the object in the two neighboring image frames, perform the search in the first prediction unit with at least two search precisions to obtain difference parameters of the pixels of the first prediction unit under the corresponding at least two search precisions; based on the difference parameters of the pixels of the first prediction unit under the corresponding at least two search precisions, obtain a target search precision pixel of a corresponding precision, a difference parameter of the target search precision pixel in the first prediction unit being smaller than a difference parameter of a non-target search precision pixel; and form the first difference parameter set using the difference parameters of the pixels of the first prediction unit under the at least two search precisions; wherein: a first target search precision pixel is determined based on a first search precision; a second target search precision pixel is determined based on a second search precision; and the second search precision is neighboring to and greater than the first search precision.
17. The computer-readable storage medium according to claim 15, wherein the one or more processors are further configured to: based on positions of the pixels in the second prediction unit, determine pixels at corresponding positions in the first prediction unit; and accumulate target difference parameters of the pixels at the corresponding positions in the first prediction unit to obtain the second difference parameter set of the motion vectors in the second prediction unit, the target difference parameters being difference parameters obtained by performing search under a search precision corresponding to the second method.
18. The computer-readable storage medium according to claim 15, wherein a number of pixels contained in the first prediction unit is greater than a number of pixels contained in any one second prediction unit of the at least two second prediction units.
19. The computer-readable storage medium according to claim 15, wherein the one or more processors are further configured to: based on a second difference parameter set of each encoding block, determine a target second prediction unit among the second prediction units of the coding blocks, a difference parameter of a motion vector in the target second prediction unit being better than or smaller than a difference parameter of a motion vector in a non-target second prediction unit of the encoding blocks; and use the motion vector of the target second prediction unit as the motion vector of the object in the two neighboring frames of images.
20. The computer-readable storage medium according to claim 15, wherein the one or more processors are further configured to: determine encoding block division methods corresponding to the first difference parameter set and the second difference parameter set; and based on the corresponding encoding block division methods, store the first difference parameter set and the second difference parameter set into storage spaces corresponding to the encoding block division methods, respectively.
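
The accumulation described in claims 3, 10, and 17 can be pictured with a short sketch. It assumes that per-position difference parameters are additive and have already been computed once over the first (2N×2N) prediction unit for each candidate motion vector. The function names, the `(top, left, height, width)` region layout, and the mapping of the "second method" onto the 2N×N, N×2N, and N×N splits named in the Background are illustrative assumptions, not the claimed implementation.

```python
def region_sum(cost_map, top, left, height, width):
    """Accumulate stored difference parameters over one rectangular region."""
    return sum(
        cost_map[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    )


def partitions(size):
    """Regions (top, left, height, width) for each division of a size x size block."""
    half = size // 2
    return {
        "2Nx2N": [(0, 0, size, size)],
        # 2NxN: width 2N, height N -> top and bottom halves
        "2NxN": [(0, 0, half, size), (half, 0, half, size)],
        # Nx2N: width N, height 2N -> left and right halves
        "Nx2N": [(0, 0, size, half), (0, half, size, half)],
        "NxN": [
            (0, 0, half, half),
            (0, half, half, half),
            (half, 0, half, half),
            (half, half, half, half),
        ],
    }


def accumulate_second_units(cost_maps):
    """For every division and every unit in it, pick the candidate MV whose
    accumulated difference parameter is smallest.

    cost_maps: dict mapping a candidate MV (hypothetical tuple) to a square
    2D list of per-position difference parameters for the 2Nx2N unit.
    Returns: dict mapping division name -> list of (best_mv, cost) per unit.
    """
    size = len(next(iter(cost_maps.values())))
    results = {}
    for name, regions in partitions(size).items():
        units = []
        for region in regions:
            best = min(
                ((mv, region_sum(cm, *region)) for mv, cm in cost_maps.items()),
                key=lambda pair: pair[1],
            )
            units.append(best)
        results[name] = units
    return results
```

Because every sub-unit cost in this sketch is a sum over values that are already stored, the smaller units need no additional per-pixel search.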

Description

CROSS-REFERENCES TO RELATED APPLICATION

This application claims priority to Chinese Patent Application No. 202411571133.4, filed on November 5, 2024, the entire content of which is incorporated herein by reference.

FIELD OF TECHNOLOGY

The present disclosure relates to the information processing field and, more particularly, to an image motion estimation method and a related apparatus.

BACKGROUND

Video compression can eliminate temporally redundant video information through inter prediction to improve compression efficiency. A current image macro block can be predicted by searching for a motion vector (MV) of an object in an image. The most suitable prediction unit (PU) can be found by traversing and partitioning the image macro block. The optimal block can be matched through a precise subpixel search, which reduces the residuals and thus the bit rate to be transmitted.

In some solutions, the image macro block is traversed and partitioned into four types of prediction units. FIG. 1 is a schematic diagram of the prediction units obtained by dividing the image macro block. The prediction units can include 2N×2N, 2N×N, N×2N, and N×N, where N is an integer number of pixels. For each motion vector in each prediction unit, a search for the optimal subpixel MV (refined MV) is performed based on a minimum SATD (Sum of Absolute Transformed Difference) cost (loss) principle. Then, for all motion vectors of each prediction unit, the top N subpixel MVs are selected based on the minimal SATD cost principle. From all prediction units, the top M prediction units are selected based on the minimal SATD cost principle. Finally, the top M optimal prediction units and the corresponding subpixel MVs are used as an output of the motion estimation.

However, to meet the encoding performance requirements, the processing time for each macro block (16×16) must be less than 200 cycles. The searches for 2N×2N, 2N×N, and N×2N require processing the SATD calculation of nine points.
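
As a point of reference, the SATD cost named above can be sketched as follows. This is a minimal illustration, not the disclosed hardware: it assumes a 4×4 block, an unnormalized Hadamard transform, and hypothetical names (`satd4x4`, `best_candidate`). The candidate blocks stand in for the reference blocks sampled at each search point.

```python
# 4x4 Hadamard matrix with +1/-1 entries; it is symmetric, so H equals its transpose.
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def satd4x4(cur, ref):
    """Sum of absolute values of the Hadamard-transformed difference block."""
    diff = [[cur[i][j] - ref[i][j] for j in range(4)] for i in range(4)]
    transformed = matmul(matmul(H4, diff), H4)
    return sum(abs(v) for row in transformed for v in row)


def best_candidate(cur, candidates):
    """Return the candidate MV with the minimum SATD cost.

    candidates: dict mapping a hypothetical MV tuple to the 4x4 reference
    block found at that search point.
    """
    return min(candidates, key=lambda mv: satd4x4(cur, candidates[mv]))
```

Encoders often apply a scaling factor to the transformed sum; it is omitted here because a positive constant factor does not change which candidate has the minimum cost.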
The hardware resources are limited and become a performance bottleneck. Therefore, each prediction unit is usually restricted to subpixel searches for at most two MVs. However, any of the subpixel points of the current (PU, MV) pair may be an optimal MV. Because of the limit on the number of MVs, the optimal MV may be missed, which reduces search precision and, in turn, compression efficiency.

SUMMARY

One aspect of this disclosure provides an image motion estimation method. The method includes obtaining motion vectors of an object in any two neighboring image frames in a video, obtaining a plurality of encoding blocks corresponding to the motion vectors of the object in the two neighboring image frames, dividing the encoding blocks to obtain a first prediction unit, performing motion search on pixels of the first prediction unit for the encoding blocks based on the motion vectors of the object in the two neighboring image frames to obtain a first difference parameter set corresponding to the motion vectors of the first prediction unit, dividing the encoding blocks to obtain at least two second prediction units, performing motion search on pixels in the second prediction units for the encoding block based on the first difference parameter set to obtain a second difference parameter set of motion vectors of the second prediction units, storing a first difference parameter set and a second difference parameter set of each encoding block, and determining a target prediction unit based on the first difference parameter set and the second difference parameter set of each encoding block. Each first difference parameter corresponds to a motion vector difference of the two neighboring image frames in a pixel of the first prediction unit. The first prediction unit is obtained by dividing the encoding blocks in a first method.
Each second motion vector corresponds to a motion vector difference of the two neighboring image frames in a pixel of the second prediction units. The second prediction units are obtained by dividing the encoding blocks in a second method. The second method is different from the first method. The target prediction unit is a prediction unit in a first image that matches motion of a second image. The first image is neighboring the second image and is generated later than the second image. A difference parameter of the target prediction unit is better than or smaller than a difference parameter of a non-target prediction unit.

Another aspect of this disclosure provides an electronic device, including one or more processors and one or more memories. The one or more memories store a computer program that, when executed by the one or more processors, causes the one or more processors to obtain motion vectors of an object in any two neighboring image frames in a video, obtain a plurality of encoding blocks corresponding to the motion vectors of the object in the two neighboring image frames, divide the encoding blocks to obtain a first pred