CN-121998821-A - Television picture definition self-adaption method based on deep learning
Abstract
The invention relates to the field of deep learning and discloses a deep-learning-based adaptive method for television picture definition. The method comprises: extracting a motion vector from a video decoding data stream and calculating its modulus (magnitude); determining a motion suppression coefficient that is negatively correlated with the modulus; generating a texture energy index from a gradient statistic of the pixels; performing negative-feedback drift correction on a preset threshold interval based on the time difference between the actual inference time and a target frame interval; comparing the texture energy index with the corrected threshold interval to generate a routing label; distributing each macroblock to one of several heterogeneous convolution branches for processing; and performing weighted-average fusion at the splicing boundaries.
Inventors
- AN XUANLIANG
Assignees
- 广东山木电子技术有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20251204
Claims (10)
- 1. A deep-learning-based adaptive method for television picture definition, characterized by comprising the following steps: acquiring a video frame to be processed and dividing it into a plurality of non-overlapping macroblocks; extracting, from the video decoding data stream, the motion vector corresponding to each macroblock and calculating the modulus (magnitude) of the motion vector; calculating a pixel gradient statistic for each macroblock; determining a motion suppression coefficient for each macroblock from the modulus, and multiplying the pixel gradient statistic of the macroblock by the motion suppression coefficient to generate a texture energy index, wherein the motion suppression coefficient is negatively correlated with the modulus; measuring the actual inference time of the previous video frame in the convolutional neural network, and calculating the time difference between the actual inference time and a preset target frame interval; performing negative-feedback drift correction on a preset threshold interval using the time difference to obtain a corrected threshold interval, wherein, when the time difference is greater than zero, the lower limit of the preset threshold interval is adjusted upward; comparing the texture energy index with the corrected threshold interval to generate a routing label, and distributing the macroblock to a heterogeneous convolution branch for processing according to the routing label, wherein the routing label indicates that the macroblock is assigned to the deep convolution branch when the texture energy index is higher than the lower limit of the corrected threshold interval; and, at the splicing boundary between adjacent macroblocks, identifying the type of heterogeneous convolution branch to which each adjacent macroblock belongs, performing weighted-average fusion within a boundary overlap domain of preset pixel width when the adjacent macroblocks belong to different types of heterogeneous convolution branches, and outputting the enhanced video frame.
- 2. The deep-learning-based adaptive method for television picture definition according to claim 1, wherein determining the motion suppression coefficient for the macroblock from the modulus comprises: setting a rest threshold and a cut-off threshold; setting the motion suppression coefficient to a reference value when the modulus is smaller than the rest threshold; setting the motion suppression coefficient to a minimum cut-off value when the modulus is larger than the cut-off threshold; and, when the modulus lies between the rest threshold and the cut-off threshold, calculating the motion suppression coefficient k according to a nonlinear attenuation function satisfying k = 1 / (1 + λ(|v| - v_th)^β), where |v| is the modulus, v_th is the rest threshold, λ is an adjustment factor, and β is an attenuation exponent.
- 3. The deep-learning-based adaptive method for television picture definition according to claim 1, wherein performing negative-feedback drift correction on the preset threshold interval using the time difference comprises: establishing controller logic comprising a proportional element and an integral element; taking the time difference as the input error signal of the controller logic; calculating a weighted sum of the proportional-term output and the integral-term output of the input error signal to obtain a threshold correction amount; and adding the threshold correction amount to the original lower limit of the preset threshold interval to generate the corrected threshold interval; wherein the controller logic includes a dead-zone limiting rule under which calculation of the threshold correction amount is triggered when the absolute value of the time difference exceeds a preset jitter tolerance.
- 4. The deep-learning-based adaptive method for television picture definition according to claim 1, wherein, before the pixel gradient statistic of each macroblock is multiplied by the motion suppression coefficient to generate the texture energy index, the method further comprises: calculating the gradient components of the macroblock in the horizontal and vertical directions with the Sobel operator; calculating the average of the sum of the absolute values of the gradient components; and taking that average as the pixel gradient statistic.
- 5. The deep-learning-based adaptive method for television picture definition according to claim 1, wherein distributing the macroblock to a heterogeneous convolution branch for processing according to the routing label specifically comprises: constructing a static computation graph of the convolutional neural network, the static computation graph comprising a straight-through branch, a shallow residual branch, and a deep dense-connection branch arranged in parallel; reading the routing label with a logic control unit; when the routing label indicates the bypass branch or the shallow convolution branch, activating the straight-through branch or the shallow residual branch and masking the computation of the deep dense-connection branch; and, when the routing label indicates the deep convolution branch, activating the deep dense-connection branch.
- 6. The deep-learning-based adaptive method for television picture definition according to claim 1, wherein comparing the texture energy index with the corrected threshold interval to generate the routing label comprises executing temporal hysteresis decision logic: calculating the difference between the texture energy index of the current macroblock and the texture energy index of the macroblock at the corresponding position in the previous video frame; and, if the absolute value of the difference lies within a preset hysteresis interval, forcing the routing label to remain the same as the label at the corresponding position in the previous video frame and ignoring the instantaneous result of comparing the texture energy index with the corrected threshold interval.
- 7. The deep-learning-based adaptive method for television picture definition according to claim 1, wherein performing weighted-average fusion within the boundary overlap domain of preset pixel width specifically comprises: defining a fusion weight matrix for the boundary overlap domain, the element values of which vary linearly with pixel distance from the splicing boundary; and computing a weighted sum of the pixel values of the adjacent macroblocks within the boundary overlap domain using the fusion weight matrix to generate smoothed boundary pixel values.
- 8. The deep-learning-based adaptive method for television picture definition according to claim 1, further comprising: downsampling the video frame to be processed to generate a global thumbnail; calculating the average texture intensity of the global thumbnail; and shifting the reference value of the preset threshold interval as a whole according to the average texture intensity, wherein the preset threshold interval is shifted upward as a whole when the average texture intensity indicates a high-noise scene.
- 9. The deep-learning-based adaptive method for television picture definition according to claim 5, wherein the deep dense-connection branch includes a plurality of cascaded residual dense blocks, each residual dense block containing a feature-reuse path directly connected to all subsequent layers, and the shallow residual branch includes a single convolutional layer or fewer than three cascaded residual blocks.
- 10. The deep-learning-based adaptive method for television picture definition according to claim 1, wherein measuring the actual inference time of the previous video frame in the convolutional neural network specifically comprises: recording a first hardware timestamp at the input node of the convolutional neural network; recording a second hardware timestamp at the moment data write-back completes at the output node of the convolutional neural network; and taking the difference between the second hardware timestamp and the first hardware timestamp as the actual inference time.
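The texture energy index of claims 1, 2 and 4 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, all numeric defaults (thresholds, λ, β, reference and minimum values), and the choice to skip border pixels when applying the Sobel operator are assumptions made only for illustration.

```python
import math

def motion_suppression(mv_len, v_rest=1.0, v_cut=16.0, lam=0.5, beta=1.5,
                       k_ref=1.0, k_min=0.02):
    """Piecewise coefficient in the style of claim 2 (all defaults are assumed)."""
    if mv_len < v_rest:
        return k_ref                      # near-static block: no suppression
    if mv_len > v_cut:
        return k_min                      # very fast motion: minimum cut-off value
    # k = 1 / (1 + lambda * (|v| - v_th)^beta) between the two thresholds
    return 1.0 / (1.0 + lam * (mv_len - v_rest) ** beta)

def sobel_gradient_stat(block):
    """Mean of |Gx| + |Gy| over interior pixels of a 2-D luma block (>= 3x3).

    Skipping the one-pixel border is an assumption; the claims do not say
    how block edges are handled.
    """
    h, w = len(block), len(block[0])
    total = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (block[y - 1][x + 1] + 2 * block[y][x + 1] + block[y + 1][x + 1]
                  - block[y - 1][x - 1] - 2 * block[y][x - 1] - block[y + 1][x - 1])
            gy = (block[y + 1][x - 1] + 2 * block[y + 1][x] + block[y + 1][x + 1]
                  - block[y - 1][x - 1] - 2 * block[y - 1][x] - block[y - 1][x + 1])
            total += abs(gx) + abs(gy)
    return total / ((h - 2) * (w - 2))

def texture_energy(block, mv):
    """Texture energy index: gradient statistic scaled by the motion coefficient."""
    mv_len = math.hypot(mv[0], mv[1])     # modulus of the motion vector
    return motion_suppression(mv_len) * sobel_gradient_stat(block)
```

With these assumed defaults, a block containing a sharp vertical edge keeps its full gradient statistic when its motion vector is zero, while the same block moving quickly is scaled down to the minimum cut-off value and is therefore less likely to be routed to the deep branch.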
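The threshold correction of claim 3 and the hysteresis rule of claim 6 can be sketched in the same hedged way. The class and function names, the gains, the millisecond units, and the decision to hold the last correction while the error stays inside the dead zone are assumptions. The label "light" is a hypothetical stand-in for the bypass and shallow branches, since the claims in this section only state when the deep branch is selected.

```python
class ThresholdController:
    """PI controller with a dead zone, loosely following claim 3."""

    def __init__(self, base_lower, kp=0.5, ki=0.1, dead_zone_ms=0.5):
        self.base_lower = base_lower      # original lower limit of the interval
        self.kp = kp                      # proportional weight (assumed value)
        self.ki = ki                      # integral weight (assumed value)
        self.dead_zone_ms = dead_zone_ms  # jitter tolerance (assumed value)
        self.integral = 0.0
        self.correction = 0.0

    def update(self, inference_ms, target_ms):
        """Return the corrected lower limit for the next frame."""
        error = inference_ms - target_ms  # > 0 means inference was too slow
        if abs(error) > self.dead_zone_ms:
            self.integral += error
            self.correction = self.kp * error + self.ki * self.integral
        # Inside the dead zone the previous correction is held (assumption).
        return self.base_lower + self.correction


def route_label(energy, corrected_lower, prev_energy=None, prev_label=None,
                hysteresis=2.0):
    """Pick a branch label, keeping the previous label for small changes."""
    if prev_label is not None and abs(energy - prev_energy) <= hysteresis:
        return prev_label                 # temporal hysteresis, as in claim 6
    return "deep" if energy > corrected_lower else "light"
```

A slow frame raises the lower limit, so fewer macroblocks exceed it and reach the deep branch on the next frame, which is the negative-feedback direction described in claim 1.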
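The linear fusion weights of claim 7 can likewise be sketched for a vertical splicing boundary. This is an illustration under stated assumptions: both branches are assumed to have produced pixel values for the same overlap strip, the weights are sampled at pixel centers, and the function names are hypothetical.

```python
def linear_weights(width):
    """Weights for the left block, falling linearly across the overlap strip."""
    return [1.0 - (i + 0.5) / width for i in range(width)]


def blend_overlap(left_rows, right_rows):
    """Blend two equally sized overlap strips produced by different branches.

    Each strip is a list of rows; column 0 lies deepest inside the left
    block and the last column lies deepest inside the right block.
    """
    width = len(left_rows[0])
    w = linear_weights(width)
    return [
        [w[i] * l_row[i] + (1.0 - w[i]) * r_row[i] for i in range(width)]
        for l_row, r_row in zip(left_rows, right_rows)
    ]
```

Because the two weights at each pixel sum to one, regions where both branches agree are left unchanged, and only the mismatch between branches is smoothed across the boundary.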
Description
Television picture definition self-adaption method based on deep learning
Technical Field
The invention relates to a television picture definition self-adaption method based on deep learning, and belongs to the technical field of deep learning.
Background
In current ultra-high-definition video display technology, video super-resolution reconstruction models based on deep convolutional neural networks are a core means of improving image quality. Such a model usually adopts an end-to-end nonlinear mapping architecture: a large number of convolution layers are stacked to fit the inverse mapping of the image degradation process and thereby restore the high-frequency detail of the image. In the inference stage, existing mainstream super-resolution networks use a fixed static computation graph. Regardless of whether a local region of the input video frame is a flat background with low information entropy or a densely textured high-frequency edge, the model executes an identical sequence of convolution operations. This spatially uniform computation mechanism ignores the pronounced non-stationarity of video signals. The many low-frequency regions of common television pictures need only shallow linear interpolation to be restored, so full-depth inference leads to a serious mismatch in the spatial distribution of computing resources.
Because the computing resources and thermal design power of the system-on-chip in a television receiving terminal are strictly limited, this mismatch between computing power and information distribution creates an engineering bottleneck. When a high-frame-rate or high-resolution video stream is processed, sustained high-load inference momentarily overdraws the system's computing power, causing frame-rate jitter or thermal throttling. To address this problem, the industry has tried to introduce evaluation mechanisms to assist processing, but their effect is limited to a static perspective. For example, the Chinese patent with grant publication No. CN113362304B, "Definition prediction model training method and definition grade determining method", uses a twin (Siamese) network to obtain a predicted image definition and determine a definition grade. That scheme essentially focuses on evaluating the content of still images or key frames. Its unidirectional, content-based prediction logic does not take into account the reduced sensitivity of the human visual system in high-speed motion scenes, nor does it perceive the instantaneous hardware throughput of the decoding terminal in real time. The technique only solves the problem of judging how sharp a picture is. It does not establish a closed-loop control relationship between definition evaluation and the underlying allocation of computing power, so it is difficult for it to achieve a dynamic balance between image quality and fluency by discarding high-frequency computation when computing power fluctuates.
Therefore, how to construct an adaptive computing mechanism that dynamically reconfigures the inference path according to the local texture characteristics of the input video signal and the motion information of the decoding domain, while perceiving the real-time load state of the hardware, is the technical problem to be solved by the invention.
Disclosure of Invention
To solve the problems described in the background, the technical solution of the invention is a deep-learning-based adaptive method for television picture definition, comprising the following steps: acquiring a video frame to be processed and dividing it into a plurality of non-overlapping macroblocks; extracting, from the video decoding data stream, the motion vector corresponding to each macroblock and calculating the modulus (magnitude) of the motion vector; calculating a pixel gradient statistic for each macroblock; determining a motion suppression coefficient for each macroblock from the modulus, and multiplying the pixel gradient statistic of the macroblock by the motion suppression coefficient to generate a texture energy index, wherein the motion suppression coefficient is negatively correlated with the modulus; measuring the actual inference time of the previous video frame in the convolutional neural network, and calculating the time difference between the actual inference time and a preset target frame interval; performing negative-feedback drift correction on a preset threshold interval using the time difference to obtain a corrected threshold interval, wherein, when the time difference is greater than zero, the lower limit of the preset threshold interval is adjusted upward; comparing the texture energy index with the corrected threshold interval to generate a routing label, and distributing the macroblock to the he