CN-121985145-A - Image decoding method, device, equipment and storage medium
Abstract
The disclosure provides an image decoding method, an image decoding apparatus, an image decoding device and a storage medium, and relates to the field of computer technology. In the method, a CPU parses image data to be decoded, which was obtained by wavelet transform coding, to obtain a plurality of code blocks and transmits the code blocks to a GPU. The GPU performs entropy decoding on the code blocks in parallel through a plurality of thread blocks to obtain sub-band coefficients, performs an inverse wavelet transform on the sub-band coefficients to obtain wavelet recovery data, and performs pixel recovery processing on the wavelet recovery data to obtain decoded data. By completing code block parsing on the CPU and performing entropy decoding and the inverse wavelet transform in parallel on the GPU, the scheme can improve the decoding efficiency and resource utilization of image decoding.
Inventors
- Request for anonymity
Assignees
- 摩尔线程智能科技(北京)股份有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20251223
Claims (16)
- 1. An image decoding method, applied to an image decoding system including a central processing unit (CPU) and a graphics processing unit (GPU), comprising: parsing, by the CPU, image data to be decoded to obtain a plurality of code blocks, and transmitting the plurality of code blocks to the GPU, wherein the image data to be decoded is obtained by wavelet transform coding; performing, by the GPU, entropy decoding processing on the plurality of code blocks in parallel through a plurality of thread blocks to obtain sub-band coefficients; performing, by the GPU, an inverse wavelet transform on the sub-band coefficients to obtain wavelet recovery data; and performing, by the GPU, pixel recovery processing on the wavelet recovery data to obtain decoded data.
- 2. The image decoding method according to claim 1, wherein performing the entropy decoding processing on the plurality of code blocks in parallel through the plurality of thread blocks to obtain the sub-band coefficients comprises: assigning the plurality of code blocks to the plurality of thread blocks; dividing bit plane data in each code block among a plurality of threads in the corresponding thread block; and loading decoding state data for context modeling into a shared memory of the thread block, and performing, by the plurality of threads in parallel, entropy decoding processing on the bit plane data based on the decoding state data to obtain the sub-band coefficients.
- 3. The image decoding method according to claim 2, wherein loading the decoding state data for context modeling into the shared memory of the thread block comprises: reading significance state data, sign state data and amplitude refinement state data from the code block, and loading the significance state data, the sign state data and the amplitude refinement state data into the shared memory of the thread block corresponding to the code block, so that the thread block can perform context modeling during the entropy decoding processing; wherein the significance state data is flag information indicating whether each coefficient in the code block has become significant, the sign state data is flag information indicating whether the sign of each coefficient in the code block is positive or negative, and the amplitude refinement state data is flag information indicating the bit resolution status of each coefficient amplitude in the code block during refinement.
- 4. The image decoding method according to claim 2, wherein performing, by the plurality of threads in parallel, the entropy decoding processing on the bit plane data based on the decoding state data to obtain the sub-band coefficients comprises: reading, by the plurality of threads respectively, the decoding state data from the shared memory; performing, by each thread, context modeling based on the decoding state data, and performing bit value analysis on each bit position of the bit plane data in parallel in an entropy decoding channel according to the result of the context modeling; and generating the sub-band coefficients based on the result of the bit value analysis.
- 5. The image decoding method according to claim 1, wherein transmitting the plurality of code blocks to the GPU comprises: allocating corresponding video memory in the GPU according to the sizes of the code blocks; and transmitting the plurality of code blocks to the video memory in an asynchronous data transmission mode.
- 6. The image decoding method according to claim 5, wherein transmitting the plurality of code blocks to the video memory in the asynchronous data transmission mode comprises: writing the plurality of code blocks into page-locked (pinned) memory to obtain pinned code block data; and transmitting the pinned code block data to the video memory in the asynchronous data transmission mode, and storing the pinned code block data in the video memory in a block storage mode.
- 7. The image decoding method according to claim 1, wherein performing the inverse wavelet transform on the sub-band coefficients comprises: in the case where the wavelet transform coding is lossy coding, performing inverse quantization processing on the sub-band coefficients based on quantization step sizes corresponding to the sub-band coefficients; loading the inverse-quantized sub-band coefficients from the video memory to the shared memory; and performing, by different thread blocks, the inverse wavelet transform on the inverse-quantized sub-band coefficients in the shared memory in the row direction and in the column direction, respectively.
- 8. The image decoding method according to claim 7, wherein loading the inverse-quantized sub-band coefficients from the video memory to the shared memory comprises: dividing the inverse-quantized sub-band coefficients into row direction coefficients and column direction coefficients; and loading the row direction coefficients and the column direction coefficients from the video memory to the shared memory, respectively.
- 9. The image decoding method according to claim 8, wherein the sub-band coefficients corresponding to one code block are processed by two thread blocks, the two thread blocks including a first thread block and a second thread block; and wherein performing, by different thread blocks, the inverse wavelet transform on the inverse-quantized sub-band coefficients in the shared memory in the row direction and in the column direction comprises: performing, by the first thread block, the inverse wavelet transform in the row direction on the row direction coefficients in the shared memory to obtain a row direction transform result; and performing, by the second thread block, the inverse wavelet transform in the column direction on the row direction transform result and the column direction coefficients.
- 10. The image decoding method according to claim 1, wherein performing the pixel recovery processing on the wavelet recovery data to obtain the decoded data comprises: determining the number of thread blocks required by the pixel recovery processing based on the image width corresponding to the wavelet recovery data and the number of threads of a single thread block in the GPU in the row direction; determining pixel recovery thread blocks based on the number of thread blocks; and calling a pixel recovery kernel function required by the pixel recovery processing, and executing the pixel recovery kernel function in parallel on the threads in the pixel recovery thread blocks to perform the pixel recovery processing on the wavelet recovery data, wherein each thread processes a pixel column of the wavelet recovery data.
- 11. The image decoding method according to claim 10, wherein determining the number of thread blocks required by the pixel recovery processing based on the image width corresponding to the wavelet recovery data and the number of threads of a single thread block in the GPU in the row direction comprises: determining the total number of pixel columns to be subjected to the pixel recovery processing based on the image width corresponding to the wavelet recovery data; determining the number of columns covered by a single thread block based on the number of threads of the single thread block in the GPU in the row direction; and determining the number of thread blocks required by the pixel recovery processing by dividing the total number of pixel columns by the number of covered columns.
- 12. The image decoding method according to claim 10, wherein the pixel recovery processing comprises one or more of a color space transform, a direct current (DC) component shift transform, and a storage format transform, the method further comprising: if the pixel recovery processing includes at least one processing operation, performing the at least one processing operation by the pixel recovery kernel function during a single kernel execution.
- 13. The image decoding method according to claim 1, wherein the method further comprises: transmitting the decoded data from the GPU back to the CPU; and encapsulating, in the CPU, the decoded data based on a target output format.
- 14. An image decoding apparatus, comprising: a data parsing module configured to parse image data to be decoded to obtain a plurality of code blocks and to transmit the plurality of code blocks to an entropy decoding module, wherein the image data to be decoded is obtained by wavelet transform coding; the entropy decoding module, configured to perform entropy decoding processing on the plurality of code blocks in parallel through a plurality of thread blocks to obtain sub-band coefficients; a wavelet transform module configured to perform an inverse wavelet transform on the sub-band coefficients to obtain wavelet recovery data; and a pixel recovery module configured to perform pixel recovery processing on the wavelet recovery data to obtain decoded data.
- 15. An image decoding device, comprising: a processor including a CPU and a GPU; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the image decoding method of any of claims 1-13 by executing the executable instructions.
- 16. A computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the image decoding method of any of claims 1-13.
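The thread block count of claims 10 and 11 amounts to a division of the pixel column count by the columns one block covers. The following is a minimal Python sketch, assuming that each thread handles exactly one pixel column and that the quotient is rounded up so a partly filled final block still covers the trailing columns (the claims only say "divided by"). The function name `pixel_recovery_block_count` and its parameter names are illustrative and do not come from the disclosure.

```python
def pixel_recovery_block_count(image_width: int, threads_per_block_x: int) -> int:
    """Return how many thread blocks are needed to cover every pixel column.

    Assumptions (not stated in the claims): one thread per pixel column,
    and the division is rounded up (ceiling division).
    """
    if image_width <= 0 or threads_per_block_x <= 0:
        raise ValueError("image_width and threads_per_block_x must be positive")
    total_columns = image_width            # total pixel columns to recover
    covered_columns = threads_per_block_x  # columns covered by a single block
    # Integer ceiling division: add (divisor - 1) before floor-dividing.
    return (total_columns + covered_columns - 1) // covered_columns
```

Under these assumptions, a 1920-pixel-wide image with 256 threads per block in the row direction would need 8 thread blocks, the last of which is only half occupied.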
Description
Image decoding method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an image decoding method, apparatus, device, and storage medium.
Background
In wavelet-transform-based decoding, a plurality of data processing steps must be performed in sequence on the image to be decoded. Under existing computing architectures, some prior-art approaches complete every decoding step on the CPU alone. Decoding entirely on the CPU is simple to implement and highly general, but because the parallel capability of the CPU is limited, the decoding steps are usually executed serially and it is difficult to process multiple data fragments at the same time, which results in low overall decoding efficiency. In particular, when processing high-resolution images, the CPU is easily limited by its arithmetic throughput and memory bandwidth, so the decoding speed struggles to meet real-time processing requirements and decoding delay often increases noticeably.
Disclosure of Invention
An object of the embodiments of the present disclosure is to provide an image decoding method, an image decoding apparatus, an image decoding device, and a computer-readable storage medium that can improve the decoding efficiency and resource utilization of image decoding by completing code block parsing on the CPU and performing entropy decoding and the inverse wavelet transform in parallel on the GPU. Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by practice of the disclosure.
According to a first aspect of the embodiments of the present disclosure, an image decoding method is provided. The method is applied to an image decoding system that includes a central processing unit (CPU) and a graphics processing unit (GPU). In the method, the CPU parses image data to be decoded, which was obtained by wavelet transform coding, to obtain a plurality of code blocks and transmits the plurality of code blocks to the GPU; the GPU performs entropy decoding processing on the plurality of code blocks in parallel through a plurality of thread blocks to obtain sub-band coefficients; the GPU performs an inverse wavelet transform on the sub-band coefficients to obtain wavelet recovery data; and the GPU performs pixel recovery processing on the wavelet recovery data to obtain decoded data.
In some example embodiments of the present disclosure, based on the foregoing scheme, performing the entropy decoding processing on the plurality of code blocks in parallel through the plurality of thread blocks to obtain the sub-band coefficients includes: assigning the plurality of code blocks to the plurality of thread blocks; dividing the bit plane data in each code block among a plurality of threads in the corresponding thread block; loading decoding state data for context modeling into a shared memory of the thread block; and performing, by the plurality of threads in parallel, entropy decoding processing on the bit plane data based on the decoding state data to obtain the sub-band coefficients.
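The work assignment described above (code blocks to thread blocks, bit plane data to threads) can be illustrated on the host side. The following is a minimal Python sketch, assuming one thread block per code block and a strided (interleaved) split of coefficient positions across threads; the disclosure does not say how the bit plane data is partitioned, so the striding, the function name `assign_entropy_work` and all parameter names are hypothetical. The sketch shows only the partitioning, not the context modeling or the decoding itself.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]  # (row, column) of a coefficient inside a code block


def assign_entropy_work(
    num_code_blocks: int,
    block_width: int,
    block_height: int,
    threads_per_block: int,
) -> Dict[int, Dict[int, List[Position]]]:
    """Map each code block to a thread block and its positions to threads.

    Returns {thread_block_id: {thread_id: [positions handled by that thread]}}.
    Assumption: thread block i handles code block i, and thread t takes every
    `threads_per_block`-th position in row-major order, starting at t.
    """
    if min(num_code_blocks, block_width, block_height, threads_per_block) <= 0:
        raise ValueError("all arguments must be positive")
    # Row-major list of coefficient positions within one bit plane.
    positions = [(r, c) for r in range(block_height) for c in range(block_width)]
    plan: Dict[int, Dict[int, List[Position]]] = {}
    for block_id in range(num_code_blocks):
        plan[block_id] = {
            tid: positions[tid::threads_per_block]
            for tid in range(threads_per_block)
        }
    return plan
```

With two 4x4 code blocks and 8 threads per block, for instance, each thread receives two positions, and thread 0 of every block handles positions (0, 0) and (2, 0). A real decoder would also have to respect the dependencies that context modeling introduces between neighboring coefficients, which this sketch ignores.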
In some example embodiments of the present disclosure, based on the foregoing scheme, loading the decoding state data for context modeling into the shared memory of the thread block includes: reading significance state data, sign state data and amplitude refinement state data from the code block, and loading the significance state data, the sign state data and the amplitude refinement state data into the shared memory of the thread block corresponding to the code block, so that the thread block can perform context modeling during entropy decoding. The significance state data is flag information indicating whether each coefficient in the code block has become significant, the sign state data is flag information indicating whether the sign of each coefficient in the code block is positive or negative, and the amplitude refinement state data is flag information indicating the bit resolution status of each coefficient amplitude in the code block during refinement.
In some example embodiments of the present disclosure, based on the foregoing scheme, performing, by the plurality of threads in parallel, entropy decoding processing on the bit plane data based on the decoding state data to obtain the sub-band coefficients includes: reading, by the plurality of threads respectively, the decoding state data from the shared memory; performing, by each thread, context modeling based on the decoding state data, and performing bit value analysis on each bit position of the bit plane data in parallel in an entropy decoding channel according to the result of the context modeling, a