CN-122002049-A - Complexity-aware end-to-end coding and decoding complexity control method and system
Abstract
The invention provides a complexity-aware end-to-end encoding and decoding complexity control method and system. The method improves the flexibility of model complexity adjustment by defining, for each module, a complexity candidate set that represents the selectable channel-number configurations. A multi-stage network-model training scheme is then adopted: in the design of the loss function, the difference between the target complexity and the current complexity is incorporated as a training constraint, enabling effective training of the network. The invention thereby achieves complexity control for end-to-end deep video coding and realizes adaptive encoding and decoding for application requirements of differing complexity.
Inventors
- Lin Jielian
- Wei Xiaojie
- He Nian
- Xu Yiwen
- Zhao Tiesong
Assignees
- Putian University (莆田学院)
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-02-11
Claims (10)
- 1. A complexity-aware end-to-end encoding and decoding complexity control method, characterized by comprising the following steps: Step S1, designing a complexity candidate set for each deep coding module in an end-to-end codec model; Step S2, designing a complexity-aware module that, under complexity constraints and in combination with input features, establishes a dynamic configuration mapping for each module; and Step S3, on the basis of steps S1 and S2, designing a multi-stage model training scheme, and using the trained model together with the complexity constraints to realize complexity-aware adaptive end-to-end video coding.
- 2. The complexity-aware end-to-end codec complexity control method according to claim 1, wherein step S1 is implemented as follows: Step S11, selecting the RDVC end-to-end codec model as the basic framework and replacing the convolutional networks at the encoding end with scalable convolutions, so that each deep coding module has the basic conditions for complexity selection; Step S12, predefining a complexity candidate list for each deep coding module, namely the candidate set of scalable-convolution channel numbers.
- 3. The complexity-aware end-to-end codec complexity control method according to claim 2, wherein the predefined complexity candidate lists of the deep coding modules are as follows: in motion compression, the candidate list of the main codec module is set to [32, 40, 48, 56, 64], that of the hyperprior codec module to [32, 40, 48, 56, 64], that of the entropy-model parameter generation module to [96, 120, 144, 168, 192], and that of the correction-value generation module to [32, 40, 48, 56, 64]; in residual compression, the candidate list of the main codec module is [48, 56, 64, 72, 80, 88, 96], that of the hyperprior codec module is [48, 56, 64, 72, 80, 88, 96], that of the entropy-model parameter generation module is [144, 168, 192, 216, 240, 264, 288], and that of the correction-value generation module is [48, 56, 64, 72, 80, 88, 96]; in motion compensation, the candidate lists of the feature refinement module and the self-attention module are both [32, 40, 48, 56, 64]; the candidate list of the feature extraction module is set to [32, 40, 48, 56, 64]; the candidate list of the residual-block fusion unit in the feature fusion module is set to [32, 40, 48, 56, 64]; and the candidate lists of the feature aggregation unit and the weighted-prediction summation unit in the frame reconstruction module are set to [32, 40, 48, 56, 64].
- 4. The end-to-end codec complexity control method according to claim 1, wherein step S2 specifically comprises constructing a complexity-aware module CAConv, which replaces the first scalable convolution component of every deep coding module and adds a complexity-selection function, the complexity-aware module performing the following operations: Step S21, average-pooling the feature to be processed to generate a mapped feature; Step S22, feeding this mapped feature into a scalable convolution layer controlled by the global complexity level, generating a further mapped feature; Step S23, feeding the result into another scalable convolution layer whose fixed number of output channels equals the number of candidate complexities; Step S24, applying Gumbel-Softmax to the output to obtain the final selection vector used to dynamically control the complexity of the target convolution; and Step S25, setting the complexity of the target convolution layer according to the selection vector, and feeding the feature to be processed into the target convolution layer to complete inference.
- 5. The complexity-aware end-to-end codec complexity control method according to claim 1, wherein the multi-stage model training scheme is as follows: in the first stage, a progressive joint optimization strategy is adopted to pre-train the basic coding and decoding capability of the model; in the second stage, starting from the pre-trained model, a random complexity allocation strategy is adopted, the channel-selection network of the complexity-aware module CAConv is frozen, and only the managed scalable-convolution part is trained, so that the complexity allocation process of the subsequent stage is not affected; in the third stage, an additional complexity constraint is introduced to train the channel-selection part of CAConv, module complexity being defined as the TFLOPs of module inference, and for each training step the global complexity level CL controls the complexity of each module. The loss function optimized in this stage combines a rate-distortion term with a complexity term, wherein: the rate term sums, over the P frames in a GOP and weighted per frame, the bits consumed for encoding the motion information and the residual information of each P frame; the distortion between the original frame and the reconstructed frame is computed using MSE and MS-SSIM; one trade-off factor balances code rate against distortion; a further trade-off factor balances the rate-distortion loss against the complexity loss; and the complexity loss depends on the current codec complexity and the target codec complexity.
- 6. The complexity-aware end-to-end codec complexity control method according to claim 5, wherein the trade-off factor between the rate-distortion loss and the complexity loss is computed from the upper and lower limits of the global complexity level.
- 7. The complexity-aware end-to-end codec complexity control method according to claim 5, wherein the weight of each P frame is given by a weight distribution function that assigns more weight to frames with larger time steps, so as to increase the loss share of subsequent frames.
- 8. The complexity-aware end-to-end codec complexity control method according to claim 5, wherein the target codec complexity is expressed in terms of the minimum complexity and the maximum complexity of the current codec.
- 9. The end-to-end codec complexity control method according to claim 5, wherein complexity allocation is performed only for the first frame of each GOP, and subsequent frames directly reuse the allocation result, thereby significantly reducing the additional computational overhead.
- 10. A complexity-aware end-to-end codec complexity control system, comprising a processor, a memory, and a computer program stored on the memory, wherein the processor, when executing the computer program, performs the steps of the complexity-aware end-to-end codec complexity control method according to any one of claims 1 to 9.
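The channel-selection procedure of claim 4 (steps S21 to S25) can be illustrated with a minimal, stdlib-only Python sketch. This is not the patented implementation: the patent derives the logits from two scalable convolution layers applied to the pooled input feature, whereas here the logits are placeholder values, and the function name `gumbel_softmax_select` is hypothetical. Only the final step, drawing a one-hot selection vector over the candidate channel counts via Gumbel-Softmax, is shown.

```python
import math
import random

def gumbel_softmax_select(logits, tau=1.0, rng=random.Random(0)):
    """Draw a hard (one-hot) selection vector from logits via
    Gumbel-Softmax, as in step S24 of claim 4."""
    # Perturb each logit with Gumbel(0, 1) noise: g = -log(-log(U)).
    noisy = [l - math.log(-math.log(rng.uniform(1e-9, 1.0))) for l in logits]
    # Temperature-scaled softmax over the noisy logits.
    m = max(n / tau for n in noisy)
    exps = [math.exp(n / tau - m) for n in noisy]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Hard selection: one-hot at the argmax of the soft sample.
    k = probs.index(max(probs))
    return [1.0 if i == k else 0.0 for i in range(len(logits))]

# Candidate channel counts of the motion-compression main codec (claim 3).
candidates = [32, 40, 48, 56, 64]
# Placeholder logits standing in for the output of steps S22-S23.
logits = [0.2, 0.1, 0.5, 0.3, 0.4]
sel = gumbel_softmax_select(logits)
# Step S25: the selected entry fixes the target convolution's width.
chosen_channels = candidates[sel.index(1.0)]
```

During training, the straight-through variant would back-propagate through the soft probabilities while forwarding the hard one-hot vector; at inference time only the selected channel count is used.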
Description
Complexity-aware end-to-end coding and decoding complexity control method and system

Technical Field

The invention belongs to the field of video coding, and particularly relates to a complexity-aware end-to-end encoding and decoding complexity control method and system.

Background

With the vigorous development of ultra-high-definition video and real-time communication, video encoding and decoding can effectively relieve the bandwidth demands of video data. In recent years, starting from the DVC coding framework, research in the field has further improved the compression efficiency of deep video coding beyond that of the traditional hybrid coding framework, but encoding and decoding complexity remains a problem that must be solved before deep video codecs can be pushed into application. The advent of scalable convolutional networks provides an opportunity for complexity optimization of the end-to-end video coding framework. Existing research addresses decoder-side complexity control through a preset complexity value; however, relying on a single complexity parameter for regulation has limitations. In this context, perceptually allocating the complexity of each codec module by combining the input features with the set complexity is a more practical, effective, and accurate solution. In view of this, the present invention proposes a complexity-aware end-to-end codec complexity control method and system, so as to achieve adaptive codec operation under application requirements of differing complexity.

Disclosure of Invention

The invention aims to provide a complexity-aware end-to-end encoding and decoding complexity control method and system, which can effectively control codec complexity and improve the adaptive decoding capability of an end-to-end video codec.
In order to achieve the above purpose, the technical scheme of the invention is as follows. A complexity-aware end-to-end encoding and decoding complexity control method comprises the following steps: Step S1, designing a complexity candidate set for each deep coding module in an end-to-end codec model; Step S2, designing a complexity-aware module that, under complexity constraints and in combination with input features, establishes a dynamic configuration mapping for each module; and Step S3, on the basis of steps S1 and S2, designing a multi-stage model training scheme, and using the trained model together with the complexity constraints to realize complexity-aware adaptive end-to-end video coding. Preferably, step S1 is implemented as follows: Step S11, selecting the RDVC end-to-end codec model as the basic framework and replacing the convolutional networks at the encoding end with scalable convolutions, so that each deep coding module has the basic conditions for complexity selection; Step S12, predefining a complexity candidate list for each deep coding module, namely the candidate set of scalable-convolution channel numbers.
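The candidate sets of step S12 and the complexity-constrained training objective of the third stage can be sketched as follows, assuming: the module names in the `CANDIDATES` dictionary are shorthand labels (not from the patent), and the exact penalty form in the loss is not given in this text, so an absolute difference between current and target complexity (in TFLOPs) is assumed, consistent with the abstract's statement that the gap between target and current complexity enters the loss.

```python
# Predefined channel-number candidate lists (step S12 / claim 3).
# Keys are shorthand module labels introduced here for illustration.
CANDIDATES = {
    "motion.main_codec":         [32, 40, 48, 56, 64],
    "motion.hyperprior_codec":   [32, 40, 48, 56, 64],
    "motion.entropy_params":     [96, 120, 144, 168, 192],
    "motion.correction":         [32, 40, 48, 56, 64],
    "residual.main_codec":       [48, 56, 64, 72, 80, 88, 96],
    "residual.hyperprior_codec": [48, 56, 64, 72, 80, 88, 96],
    "residual.entropy_params":   [144, 168, 192, 216, 240, 264, 288],
    "residual.correction":       [48, 56, 64, 72, 80, 88, 96],
}

def total_loss(rd_loss, current_tflops, target_tflops, lambda_c):
    """Third-stage objective (claim 5): rate-distortion loss plus a
    penalty on the gap between current and target complexity.
    The absolute-difference form of the penalty is an assumption."""
    return rd_loss + lambda_c * abs(current_tflops - target_tflops)
```

When the current complexity matches the target, the penalty vanishes and only the rate-distortion term drives training; the trade-off factor `lambda_c` (claim 6) controls how strongly deviations from the target complexity are punished.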
Preferably, the complexity candidate lists of the deep coding modules are predefined as follows: in motion compression, the candidate list of the main codec module is set to [32, 40, 48, 56, 64], that of the hyperprior codec module to [32, 40, 48, 56, 64], that of the entropy-model parameter generation module to [96, 120, 144, 168, 192], and that of the correction-value generation module to [32, 40, 48, 56, 64]; in residual compression, the candidate list of the main codec module is [48, 56, 64, 72, 80, 88, 96], that of the hyperprior codec module is [48, 56, 64, 72, 80, 88, 96], that of the entropy-model parameter generation module is [144, 168, 192, 216, 240, 264, 288], and that of the correction-value generation module is [48, 56, 64, 72, 80, 88, 96]; in motion compensation, the candidate lists of the feature refinement module and the self-attention module are both [32, 40, 48, 56, 64]; the candidate list of the feature extraction module is set to [32, 40, 48, 56, 64]; the candidate list of the residual-block fusion unit in the feature fusion module is set to [32, 40, 48, 56, 64]; and the candidate lists of the feature aggregation unit and the weighted-prediction summation unit in the frame reconstruction module are set to [32, 40, 48, 56, 64]. Preferably, step S2 specifically comprises constructing a complexity-aware module CAConv, which replaces the first scalable convolution component of all the deep coding modules and adds the complexity-selection function, where the complexity-aware module specifically performs the followi