CN-122002023-A - Video encoding method, apparatus, device and storage medium
Abstract
The disclosure provides a video encoding method, apparatus, device, and storage medium, relating to the field of computer technology, and in particular to artificial intelligence, computer vision, and video coding. The method comprises: reading a current frame image and a reference frame image to be encoded; extracting feature points and descriptors from the luminance components of the current frame image and the reference frame image, respectively, to obtain first-class feature points and descriptors corresponding to the current frame image and second-class feature points and descriptors corresponding to the reference frame image; performing feature point matching according to the first-class feature points and descriptors and the second-class feature points and descriptors to obtain coordinate differences of a plurality of feature matching points between the current frame image and the reference frame image, and using the coordinate differences as motion vectors; and performing video encoding according to an encoding mode determined by the directional consistency of the motion vectors.
Inventors
- LIU FEIYANG
Assignees
- Beijing Baidu Netcom Science Technology Co., Ltd. (北京百度网讯科技有限公司)
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-01-29
Claims (12)
- 1. A video encoding method, comprising: reading a current frame image and a reference frame image to be encoded; extracting feature points and descriptors from the luminance components of the current frame image and the reference frame image, respectively, to obtain first-class feature points and descriptors corresponding to the current frame image and second-class feature points and descriptors corresponding to the reference frame image; performing feature point matching according to the first-class feature points and descriptors and the second-class feature points and descriptors to obtain coordinate differences of a plurality of feature matching points between the current frame image and the reference frame image, and using the coordinate differences as motion vectors; and performing video encoding according to an encoding mode determined by the directional consistency of the motion vectors.
- 2. The method of claim 1, wherein performing feature point matching according to the first-class feature points and descriptors and the second-class feature points and descriptors to obtain coordinate differences of a plurality of feature matching points between the current frame image and the reference frame image and using the coordinate differences as motion vectors comprises: performing feature point matching according to the first-class feature points and descriptors and the second-class feature points and descriptors to obtain current-frame feature points in the current frame image and reference-frame feature points in the reference frame image, wherein current-frame feature points and reference-frame feature points in correspondence with each other form the feature matching points; and obtaining the motion vectors according to the coordinate differences between the current-frame feature points and the reference-frame feature points in correspondence with each other.
- 3. The method according to claim 1 or 2, wherein performing video encoding according to the encoding mode determined by the directional consistency of the motion vectors comprises: dividing a coding unit into a plurality of coding blocks, the coding unit being the unit in which video encoding is performed on the current frame image and the reference frame image; counting, for each of the plurality of coding blocks, the motion vectors obtained from the plurality of feature matching points; determining a candidate motion model used by each coding block according to the directional consistency of the motion vectors obtained by that coding block from the plurality of feature matching points; pre-judging a comprehensive motion estimation for the plurality of coding blocks according to the candidate motion model used by each coding block, to obtain an encoding pre-decision result; and performing video encoding according to the encoding mode given by the encoding pre-decision result.
- 4. The method according to claim 3, wherein determining the candidate motion model used by each coding block according to the directional consistency of the motion vectors obtained by each coding block from the plurality of feature matching points comprises: matching the candidate motion model according to the directional consistency of the motion vectors obtained by each coding block from the plurality of feature matching points; in the case that the motion vectors point in the same direction, the candidate motion model is a translational motion model; and in the case that the motion vectors point in different directions, the candidate motion model is an affine motion model, wherein the affine motion model comprises at least one of a 4-parameter affine motion model and a 6-parameter affine motion model.
- 5. The method of claim 4, wherein matching the candidate motion model according to the directional consistency of the motion vectors obtained by each coding block from the plurality of feature matching points comprises: calculating the cosine similarity between predicted motion vectors, obtained by matching the candidate motion model to the plurality of feature matching points, and actual motion vectors obtained from the plurality of feature matching points, so as to measure the directional consistency of the motion vectors and obtain a directional consistency measurement result for matching the candidate motion model.
- 6. The method of claim 4, wherein matching the candidate motion model according to the directional consistency of the motion vectors obtained by each coding block from the plurality of feature matching points comprises: calculating cosine similarities between predicted motion vectors, obtained by matching the candidate motion model to the plurality of feature matching points, and actual motion vectors obtained from the plurality of feature matching points, to obtain similarity calculation results; and averaging the similarity calculation results to measure the directional consistency of the motion vectors, so as to obtain a directional consistency measurement result for matching the candidate motion model.
- 7. The method according to any one of claims 3-6, wherein pre-judging the comprehensive motion estimation for the plurality of coding blocks according to the candidate motion model used by each coding block, to obtain the encoding pre-decision result, comprises: pre-judging, according to the candidate motion model used by each coding block, a candidate motion model of the plurality of coding blocks that is preferred under the comprehensive motion trend; taking the preferred candidate motion model as a target motion model used by the coding unit, wherein the target motion model is a global model of the coding unit; and taking, as the encoding pre-decision result, an encoding mode in which video encoding is performed directly with the global model of the coding unit.
- 8. The method according to any one of claims 4-6, wherein performing video encoding according to the encoding mode given by the encoding pre-decision result comprises: in the case that the encoding pre-decision result is that the encoding mode directly adopts the global model of the coding unit for video encoding, when the global model of the coding unit is a translational motion model, directly adopting the translational motion model and skipping the affine motion models among the candidate motion models; and when the global model of the coding unit is an affine motion model, directly adopting the affine motion model and skipping the translational motion model among the candidate motion models.
- 9. A video encoding apparatus, comprising: a reading module configured to read a current frame image and a reference frame image to be encoded; a feature extraction module configured to extract feature points and descriptors from the luminance components of the current frame image and the reference frame image, respectively, to obtain first-class feature points and descriptors corresponding to the current frame image and second-class feature points and descriptors corresponding to the reference frame image; a feature matching module configured to perform feature point matching according to the first-class feature points and descriptors and the second-class feature points and descriptors to obtain coordinate differences of a plurality of feature matching points between the current frame image and the reference frame image, and to use the coordinate differences as motion vectors; and a video encoding module configured to perform video encoding according to an encoding mode determined by the directional consistency of the motion vectors.
- 10. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
- 11. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-8.
- 12. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-8.
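Claims 4-6 select a candidate motion model from the directional consistency of a coding block's motion vectors, measured by cosine similarity and averaging. The following minimal Python sketch illustrates one possible reading: it averages pairwise cosine similarities among a block's motion vectors and chooses the translational model when they all point the same way. The function names and the 0.95 threshold are illustrative assumptions; claims 5 and 6 compute the similarity between model-predicted and actual motion vectors rather than pairwise among the vectors themselves.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two 2-D motion vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0

def direction_consistency(vectors):
    """Average pairwise cosine similarity of the motion vectors in a block."""
    pairs = [(u, v) for i, u in enumerate(vectors) for v in vectors[i + 1:]]
    if not pairs:
        return 1.0
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)

def choose_candidate_model(vectors, threshold=0.95):
    """Translational model if the vectors point the same way, else affine
    (4- or 6-parameter); the threshold is an assumed tuning parameter."""
    return "translational" if direction_consistency(vectors) >= threshold else "affine"
```

For example, three identical vectors yield a consistency of 1.0 and the translational model, while vectors pointing in different directions (as under rotation or scaling) fall below the threshold and select the affine model.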
Description
Video encoding method, apparatus, device and storage medium

Technical Field

The present disclosure relates to the field of computer technology, and in particular to the fields of artificial intelligence, computer vision, and video coding.

Background

Traditional video coding standards exploit the temporal redundancy of video using translational motion models, but these models have limited expressive power. With the popularity of high-resolution video, the deficiencies of translational motion models for complex motions (e.g., rotation, scaling) have become apparent, and affine motion models have been introduced into encoding modes to handle such motions. Determining the strategy by which an encoding mode is selected for video encoding is the technical problem to be solved.

Disclosure of Invention

The present disclosure provides a video encoding method, apparatus, device, and storage medium.

According to an aspect of the present disclosure, there is provided a video encoding method, including: reading a current frame image and a reference frame image to be encoded; extracting feature points and descriptors from the luminance components of the current frame image and the reference frame image, respectively, to obtain first-class feature points and descriptors corresponding to the current frame image and second-class feature points and descriptors corresponding to the reference frame image; performing feature point matching according to the first-class feature points and descriptors and the second-class feature points and descriptors to obtain coordinate differences of a plurality of feature matching points between the current frame image and the reference frame image, and using the coordinate differences as motion vectors; and performing video encoding according to an encoding mode determined by the directional consistency of the motion vectors.
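The motion-vector step above reduces to coordinate differences over the feature matching points. A minimal sketch under assumed data layouts (the match list and point format are hypothetical; an actual encoder would obtain the matches from a feature detector and descriptor matcher run on the luminance planes):

```python
def motion_vectors(matches):
    """Motion vectors as coordinate differences of feature matching points.

    Each match pairs a current-frame feature point (cx, cy) with its
    corresponding reference-frame feature point (rx, ry)."""
    return [(cx - rx, cy - ry) for (cx, cy), (rx, ry) in matches]

# Hypothetical matches produced by descriptor matching on the luminance planes:
matches = [((12, 8), (10, 8)), ((40, 21), (38, 21)), ((7, 33), (5, 33))]
vectors = motion_vectors(matches)  # every vector is (2, 0): same direction
```

Since all vectors here point in the same direction, a coding block containing these matches would, under the disclosed scheme, favor the translational motion model.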
According to another aspect of the present disclosure, there is provided a video encoding apparatus, including: a reading module configured to read a current frame image and a reference frame image to be encoded; a feature extraction module configured to extract feature points and descriptors from the luminance components of the current frame image and the reference frame image, respectively, to obtain first-class feature points and descriptors corresponding to the current frame image and second-class feature points and descriptors corresponding to the reference frame image; a feature matching module configured to perform feature point matching according to the first-class feature points and descriptors and the second-class feature points and descriptors to obtain coordinate differences of a plurality of feature matching points between the current frame image and the reference frame image, and to use the coordinate differences as motion vectors; and a video encoding module configured to perform video encoding according to an encoding mode determined by the directional consistency of the motion vectors.

According to another aspect of the present disclosure, there is provided an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods provided by any one of the embodiments of the present disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a method provided according to any one of the embodiments of the present disclosure.
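The four modules of the apparatus form a simple pipeline. In this hypothetical sketch each module is injected as a callable, since the disclosure does not fix their implementations:

```python
class VideoEncodingApparatus:
    """Wires together the reading, feature extraction, feature matching and
    video encoding modules; each is an injected callable (illustrative only)."""

    def __init__(self, reader, extractor, matcher, encoder):
        self.reader = reader        # reading module
        self.extractor = extractor  # feature extraction module (luminance only)
        self.matcher = matcher      # feature matching module -> motion vectors
        self.encoder = encoder      # video encoding module

    def encode(self, source):
        current, reference = self.reader(source)
        feats_current = self.extractor(current)      # first-class points + descriptors
        feats_reference = self.extractor(reference)  # second-class points + descriptors
        vectors = self.matcher(feats_current, feats_reference)
        return self.encoder(vectors)
```

The dependency-injection style is a design choice of this sketch, not of the patent; it keeps the data flow of the four claimed modules visible without committing to any detector, matcher, or codec.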
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method provided according to any one of the embodiments of the present disclosure.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is a schematic diagram of a distributed cluster processing scenario according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a video encoding method according to an embodiment of the present disclosure;
FIG. 3 is a flow chart of another video encoding method according to an embodiment of the present disclosure;
FIG. 4 is a flow chart of a video encoding method to which an embodiment of the present disclosure is applied;
FIG. 5 is a flow chart of another video encoding method to which embodiments of the present disclosure are applied;
FIG. 6 is a schematic diagram of a co