US-12627792-B2 - Screen content processing method and apparatus, and device

Abstract

A screen content processing method includes: dividing screen content into a plurality of areas; detecting pixel similarity between a first target area of a current frame of the screen content and a second target area of a previous frame of the screen content; using a pixel hash table of the second target area as a pixel hash table of the first target area if the similarity satisfies a first detection result; calculating pixel hash values of the first target area to establish a pixel hash table if the similarity satisfies a second detection result; traversing, according to a pixel hash table of an area to be coded in the first target area of the current frame, pixel hash tables of reference areas and performing intra block copy processing or hash motion estimation processing, to complete screen content processing for the first target area of the current frame.

Inventors

LINGYU LI
Yue Wang
Li Zhang

Assignees

BEIJING BYTEDANCE NETWORK TECHNOLOGY CO., LTD.

Dates

Publication Date: 20260512
Application Date: 20220126
Priority Date: 20210205

Claims (15)

1 . A screen content processing method, comprising: dividing screen content into a plurality of areas; detecting pixel similarity between a first target area of a current frame of the screen content and a second target area of a previous frame of the screen content, wherein the first target area and the second target area are areas corresponding to each other in the screen content; using a pixel hash table of the second target area of the previous frame as a pixel hash table of the first target area of the current frame, if the similarity satisfies a first detection result; calculating pixel hash values of the first target area of the current frame to establish the pixel hash table of the first target area of the current frame, if the similarity satisfies a second detection result; and traversing, according to a pixel hash table of an area to be coded in the first target area of the current frame, pixel hash tables of reference areas and performing intra block copy processing or hash motion estimation processing on the area to be coded, to complete screen content processing of the first target area of the current frame; wherein the traversing, according to the pixel hash table of the area to be coded in the first target area of the current frame, the pixel hash tables of the reference areas and performing the intra block copy processing or the hash motion estimation processing on the area to be coded comprises: traversing, according to the pixel hash table of the area to be coded in the first target area of the current frame, in an order from near to far from the area to be coded, the pixel hash tables of the reference areas to obtain a target reference block; and performing, according to the target reference block, the intra block copy processing or the hash motion estimation processing on the area to be coded in the first target area of the current frame.
2 . The method according to claim 1 , wherein the detecting the pixel similarity between the first target area of the current frame of the screen content and the second target area of the previous frame of the screen content comprises: obtaining pixel values of the first target area of the current frame and pixel values of the second target area of the previous frame; and calculating, according to a preset algorithm, similarity between pixels of the first target area of the current frame and pixels of the second target area of the previous frame, according to the pixel values of the first target area of the current frame and the pixel values of the second target area of the previous frame.
3 . The method according to claim 2 , wherein the preset algorithm comprises at least one of the following: sum of absolute difference, sum of absolute transformed difference, and sum of squared error.
4 . The method according to claim 2 , wherein after the traversing, according to the pixel hash table of the area to be coded in the first target area of the current frame, the pixel hash tables of the reference areas and performing the intra block copy processing or the hash motion estimation processing on the area to be coded, the method further comprises: judging whether or not the first target area is a last area of the plurality of areas of the divided screen content; if yes, completing processing on the current frame of the screen content; if not, continuing to perform the step of detecting pixel similarity between a next area of the first target area of the current frame of the screen content and a same area corresponding to the next area of the previous frame.
5 . The method according to claim 2 , wherein the calculating the pixel hash values of the first target area of the current frame to establish the pixel hash table of the first target area of the current frame comprises: calculating a hash value corresponding to each pixel position in the first target area of the current frame; and establishing the pixel hash table with the hash value of each pixel position in the first target area of the current frame as a value of “key” and each pixel position as a value of “value”.
6 . The method according to claim 2 , wherein the similarity satisfying the first detection result means that the similarity is greater than a defined similarity threshold; the similarity satisfying the second detection result means that the similarity is less than or equal to the defined similarity threshold.
7 . The method according to claim 1 , wherein after the traversing, according to the pixel hash table of the area to be coded in the first target area of the current frame, the pixel hash tables of the reference areas and performing the intra block copy processing or the hash motion estimation processing on the area to be coded, the method further comprises: judging whether or not the first target area is a last area of the plurality of areas of the divided screen content; if yes, completing processing on the current frame of the screen content; if not, continuing to perform the step of detecting pixel similarity between a next area of the first target area of the current frame of the screen content and a same area corresponding to the next area of the previous frame.
8 . The method according to claim 7 , wherein the calculating the pixel hash values of the first target area of the current frame to establish the pixel hash table of the first target area of the current frame comprises: calculating a hash value corresponding to each pixel position in the first target area of the current frame; and establishing the pixel hash table with the hash value of each pixel position in the first target area of the current frame as a value of “key” and each pixel position as a value of “value”.
9 . The method according to claim 7 , wherein the similarity satisfying the first detection result means that the similarity is greater than a defined similarity threshold; the similarity satisfying the second detection result means that the similarity is less than or equal to the defined similarity threshold.
10 . The method according to claim 1 , wherein the calculating the pixel hash values of the first target area of the current frame to establish the pixel hash table of the first target area of the current frame comprises: calculating a hash value corresponding to each pixel position in the first target area of the current frame; and establishing the pixel hash table with the hash value of each pixel position in the first target area of the current frame as a value of “key” and each pixel position as a value of “value”.
11 . The method according to claim 1 , wherein the similarity satisfying the first detection result means that the similarity is greater than a defined similarity threshold; the similarity satisfying the second detection result means that the similarity is less than or equal to the defined similarity threshold.
12 . The method according to claim 11 , further comprising: determining the similarity threshold according to a size of the first target area.
13 . The method according to claim 1 , wherein the dividing the screen content into the plurality of areas comprises: dividing the screen content into N×N rectangular areas, wherein N is a positive integer.
14 . An electronic device, comprising: a processor and a memory; wherein the memory stores a computer-executed instruction; and the processor executes the computer-executed instruction stored in the memory, to cause the processor to divide screen content into a plurality of areas; detect pixel similarity between a first target area of a current frame of the screen content and a second target area of a previous frame of the screen content, wherein the first target area and the second target area are areas corresponding to each other in the screen content; use a pixel hash table of the second target area of the previous frame as a pixel hash table of the first target area of the current frame, if the similarity satisfies a first detection result; calculate pixel hash values of the first target area of the current frame to establish the pixel hash table of the first target area of the current frame, if the similarity satisfies a second detection result; and traverse, according to a pixel hash table of an area to be coded in the first target area of the current frame, pixel hash tables of reference areas and perform intra block copy processing or hash motion estimation processing on the area to be coded, to complete screen content processing of the first target area of the current frame; wherein the processor is further caused to: traverse, according to the pixel hash table of the area to be coded in the first target area of the current frame, in an order from near to far from the area to be coded, the pixel hash tables of the reference areas to obtain a target reference block; and perform, according to the target reference block, the intra block copy processing or the hash motion estimation processing on the area to be coded in the first target area of the current frame.
15 . A non-transitory computer-readable storage medium, wherein a computer-executed instruction is stored in the computer-readable storage medium, and when the computer-executed instruction is executed by a processor, following steps are implemented: dividing screen content into a plurality of areas; detecting pixel similarity between a first target area of a current frame of the screen content and a second target area of a previous frame of the screen content, wherein the first target area and the second target area are areas corresponding to each other in the screen content; using a pixel hash table of the second target area of the previous frame as a pixel hash table of the first target area of the current frame, if the similarity satisfies a first detection result; calculating pixel hash values of the first target area of the current frame to establish the pixel hash table of the first target area of the current frame, if the similarity satisfies a second detection result; and traversing, according to a pixel hash table of an area to be coded in the first target area of the current frame, pixel hash tables of reference areas and performing intra block copy processing or hash motion estimation processing on the area to be coded, to complete screen content processing of the first target area of the current frame; wherein the traversing, according to the pixel hash table of the area to be coded in the first target area of the current frame, the pixel hash tables of the reference areas and performing the intra block copy processing or the hash motion estimation processing on the area to be coded comprises: traversing, according to the pixel hash table of the area to be coded in the first target area of the current frame, in an order from near to far from the area to be coded, the pixel hash tables of the reference areas to obtain a target reference block; and performing, according to the target reference block, the intra block copy processing or the hash motion estimation processing on the area to be coded in the first target area of the current frame.
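The near-to-far traversal recited in claims 1, 14 and 15 can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes each reference area is identified by a hypothetical grid cell `(col, row)`, that "near to far" is measured by Manhattan distance between cells, and that each area's pixel hash table is a plain dictionary from hash value to a list of pixel positions. The function name and the distance measure are illustrative choices that do not come from the claims.

```python
def traverse_near_to_far(block_hash, current_cell, tables_by_cell):
    """Search reference-area hash tables, nearest area first.

    tables_by_cell maps a grid cell (col, row) to that area's hash table,
    i.e. a dict {hash value: [(x, y), ...]}. Returns (cell, position) for
    the first hit, or None when no reference area contains the hash.
    """
    def distance(cell):
        # Assumption: Manhattan distance between grid cells.
        return abs(cell[0] - current_cell[0]) + abs(cell[1] - current_cell[1])

    # Sort by distance; ties are broken by cell coordinates for determinism.
    for cell in sorted(tables_by_cell, key=lambda c: (distance(c), c)):
        positions = tables_by_cell[cell].get(block_hash)
        if positions:
            return cell, positions[0]  # candidate target reference block
    return None


# Usage: hash 7 occurs in two areas; the one closer to cell (2, 1) wins.
tables = {
    (0, 0): {7: [(1, 1)]},
    (1, 0): {9: [(5, 0)]},
    (2, 0): {7: [(9, 1)]},
}
print(traverse_near_to_far(7, (2, 1), tables))  # ((2, 0), (9, 1))
```

Stopping at the first hit means the search usually ends early when a match exists in a nearby area, which is one plausible reading of why the claims order the traversal by distance.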

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a National Stage of International Application No. PCT/CN2022/074131, filed on Jan. 26, 2022, which claims priority to Chinese patent application No. 202110164130.9, entitled “SCREEN CONTENT PROCESSING METHOD AND APPARATUS, AND DEVICE” and filed with the China National Intellectual Property Administration on Feb. 5, 2021. Both of the above applications are incorporated herein by reference in their entireties.

TECHNICAL FIELD

Embodiments of the present disclosure relate to the field of communication and computer technologies, and in particular, to a screen content processing method and apparatus, and a device.

BACKGROUND

H.265 is also called high efficiency video coding (HEVC for short); both are hereinafter referred to as H.265. In the screen content coding (SCC for short) technology of H.265, two coding tools, intra block copy (IBC for short) and hash motion estimation (Hash ME for short), have been added in view of characteristics of screen content such as zero noise and the repetition of graphics and text. There are many identical text symbols in different frames of the screen content, and the screen content is noiseless, so hash values of identical text symbol content in different frames are exactly the same.
At present, in existing processes of coding screen content by IBC and Hash ME, a hash value must always be calculated for each pixel position in a frame of the screen content to form a hash table (a correspondence between pixel positions and hash values).
The principle of screen content coding by IBC is to divide a current frame of the screen content into multiple coding units, each of which includes multiple pixel blocks, find a best matching reference block according to a hash table of the current frame, and predict pixel values of a current block according to pixels of the reference block.
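The hash table and the IBC-style lookup described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the H.265 SCC algorithm: the frame is a list of rows of 8-bit sample values, CRC32 from the standard library stands in for whatever hash a real encoder uses, and the helper names (`block_bytes`, `build_hash_table`, `find_reference`) are hypothetical. The sketch also ignores the codec constraint that an IBC reference block must lie in an already reconstructed region.

```python
import zlib
from collections import defaultdict


def block_bytes(frame, x, y, size):
    """Serialize the size x size block whose top-left corner is (x, y)."""
    return bytes(frame[y + dy][x + dx] for dy in range(size) for dx in range(size))


def build_hash_table(frame, size):
    """Map each block hash to the list of pixel positions that produce it."""
    height, width = len(frame), len(frame[0])
    table = defaultdict(list)
    for y in range(height - size + 1):
        for x in range(width - size + 1):
            table[zlib.crc32(block_bytes(frame, x, y, size))].append((x, y))
    return table


def find_reference(frame, table, x, y, size):
    """Return another position whose block is pixel-identical, or None."""
    current = block_bytes(frame, x, y, size)
    for cx, cy in table.get(zlib.crc32(current), []):
        # Compare the pixels as well, so a hash collision cannot cause a false match.
        if (cx, cy) != (x, y) and block_bytes(frame, cx, cy, size) == current:
            return (cx, cy)
    return None


# Usage: the 2x2 pattern at (0, 0) repeats at (2, 0).
frame = [
    [1, 2, 1, 2],
    [3, 4, 3, 4],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
table = build_hash_table(frame, 2)
print(find_reference(frame, table, 2, 0, 2))  # (0, 0)
```

Because one hash is computed per pixel position, building the table for a full frame visits every position in the frame, which is the cost the passage above refers to.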
The principle of screen content coding by Hash ME is to divide a current frame of the screen content into multiple coding units, each of which includes multiple pixel blocks, find, from a hash table of a reference frame and according to a hash value of a current block, a best matching reference block, and predict pixel values of the current block according to pixels of the reference block.
However, the inventors find that the existing modes of screen content coding by IBC or Hash ME are all implemented by searching the pixels of an entire frame of the screen content based on hash values, so a hash value needs to be calculated for each pixel of each frame of the screen content. This causes a very large amount of calculation and thus reduces the processing efficiency of screen content coding.

SUMMARY

Embodiments of the present disclosure provide a screen content processing method and apparatus, and a device, which can reduce redundant hash calculation, increase the coding speed of screen content coding, and improve the processing efficiency of screen content coding.
In a first aspect, an embodiment of the present disclosure provides a screen content processing method, including:
dividing screen content into a plurality of areas;
detecting pixel similarity between a first target area of a current frame of the screen content and a second target area of a previous frame of the screen content, where the first target area and the second target area are areas corresponding to each other in the screen content;
using a pixel hash table of the second target area of the previous frame as a pixel hash table of the first target area of the current frame, if the similarity satisfies a first detection result;
calculating pixel hash values of the first target area of the current frame to establish the pixel hash table of the first target area of the current frame, if the similarity satisfies a second detection result; and
traversing, according to a pixel hash table of an area to be coded in the first target area of the current frame, pixel hash tables of reference areas and performing intra block copy processing or hash motion estimation processing on the area to be coded, to complete screen content processing of the first target area of the current frame.
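The reuse-or-recompute decision in the method of the first aspect can be sketched in Python as follows. This is a hedged illustration, not the disclosed implementation. It assumes 8-bit samples, an N×N grid of areas, and a similarity score derived from the sum of absolute differences (SAD) and normalized to the range 0 to 1 so that a larger value means more similar. The normalization, the `threshold` default, the `BLOCK` size, the CRC32 hash and all function names are assumptions made for the example.

```python
import zlib
from collections import defaultdict

BLOCK = 2  # hypothetical block size used for hashing


def split_areas(width, height, n):
    """Return (x0, y0, x1, y1) bounds for an n x n grid of rectangular areas."""
    xs = [width * i // n for i in range(n + 1)]
    ys = [height * j // n for j in range(n + 1)]
    return [(xs[i], ys[j], xs[i + 1], ys[j + 1]) for j in range(n) for i in range(n)]


def similarity(cur, prev, area):
    """SAD-based score in [0, 1]; 1.0 means the co-located areas are identical."""
    x0, y0, x1, y1 = area
    sad = sum(abs(cur[y][x] - prev[y][x]) for y in range(y0, y1) for x in range(x0, x1))
    return 1.0 - sad / (255 * (x1 - x0) * (y1 - y0))


def build_area_table(frame, area):
    """Hash every BLOCK x BLOCK block that lies fully inside the area."""
    x0, y0, x1, y1 = area
    table = defaultdict(list)
    for y in range(y0, y1 - BLOCK + 1):
        for x in range(x0, x1 - BLOCK + 1):
            data = bytes(frame[y + dy][x + dx] for dy in range(BLOCK) for dx in range(BLOCK))
            table[zlib.crc32(data)].append((x, y))
    return table


def update_tables(cur, prev, prev_tables, n, threshold=0.99):
    """Reuse the previous frame's table for similar areas, rebuild the rest."""
    height, width = len(cur), len(cur[0])
    tables, reused = [], 0
    for index, area in enumerate(split_areas(width, height, n)):
        if prev_tables is not None and similarity(cur, prev, area) > threshold:
            tables.append(prev_tables[index])  # first detection result: reuse
            reused += 1
        else:
            tables.append(build_area_table(cur, area))  # second result: recompute
    return tables, reused


# Usage: only the top-left area changes between the two frames.
prev = [[0] * 4 for _ in range(4)]
prev_tables, _ = update_tables(prev, prev, None, 2)
cur = [row[:] for row in prev]
cur[0][0] = 255
tables, reused = update_tables(cur, prev, prev_tables, 2)
print(reused)  # 3
```

In this sketch the hash work saved is proportional to the number of areas that pass the similarity test, since only the changed area has its hash values recalculated.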
In a second aspect, an embodiment of the present disclosure provides a screen content processing apparatus, including:
a dividing module, configured to divide screen content into a plurality of areas;
a judging module, configured to detect pixel similarity between a first target area of a current frame of the screen content and a second target area of a previous frame of the screen content, where the first target area and the second target area are areas corresponding to each other in the screen content;
a reusing module, configured to use a pixel hash table of the second target area of the previous frame as a pixel hash table of the first target area of the current frame, if the similarity satisfies a first detection result;
a calculating module, configured to calculate pixel hash values of the fir