CN-122023543-A - Method, device, equipment and medium for determining Chinese character color in image
Abstract
The invention provides a method, a device, equipment and a medium for determining the color of text in an image. The method comprises: converting an input image into a numpy array of RGB channels and obtaining the height H_img and the width W_img of the image; locating the text regions of the input image by parsing an OCR result containing text position information, wherein each text item in the OCR result corresponds to a list of quadrilateral vertex coordinates; determining an initial bounding box of each text region by extracting the minimum and maximum values of the vertex coordinates, and clamping the coordinates of the initial bounding box to the height H_img and the width W_img to obtain a target bounding box of each text region; for each target bounding box, determining the background color of the text region using a three-level estimation strategy; and determining the font color of the text based on the background color in combination with color difference and gradient features, thereby ensuring the accuracy of the determined font color.
Inventors
- LIU ZHIHAI
- YANG JINHE
- TONG ZHEN
Assignees
- 福建紫讯信息科技有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20251223
Claims (10)
- 1. A method for determining the color of a Chinese character in an image, characterized by comprising the following steps: Step 1, converting an input image into a numpy array of RGB channels, and acquiring the height H_img and the width W_img of the image; Step 2, locating the text regions of the input image by parsing an OCR result containing text position information, wherein each text item in the OCR result corresponds to a list of quadrilateral vertex coordinates, determining an initial bounding box of each text region by extracting the minimum and maximum values of the vertex coordinates, and clamping the coordinates of the initial bounding box to the height H_img and the width W_img to obtain a target bounding box of each text region; Step 3, for each target bounding box, determining the background color of the text region using a three-level estimation strategy; Step 4, determining the font color of the text based on the background color in combination with color difference and gradient features.
- 2. The method for determining the color of the text in the image according to claim 1, wherein step 2 specifically comprises: locating the text regions of the input image by parsing an OCR result containing text position information; determining an initial bounding box of each text region by extracting the minimum and maximum values of the vertex coordinates in the OCR result; and clamping the coordinates of the initial bounding box to the height H_img and the width W_img to obtain a target bounding box of each text region, wherein the target bounding box comprises four coordinate parameters minx, miny, maxx and maxy, and the target bounding box is determined by extracting all x values and y values from the quadrilateral vertex coordinates and calculating minx = max(0, int(min(xs))), miny = max(0, int(min(ys))), maxx = min(W_img, int(max(xs))) and maxy = min(H_img, int(max(ys))), where xs is the set of all x coordinates of the vertices and ys is the set of all y coordinates of the vertices.
- 3. The method for determining color of Chinese characters in an image according to claim 1, wherein said step 3 comprises the following steps: taking the target bounding box as the center, expanding outward by a first preset proportion to form an annular region, constructing a mask that masks out the original text region, and extracting the valid pixels in the annular region; if the number of valid pixels is not less than a preset threshold, taking the median of the valid pixels as the background color, wherein the first preset proportion is 15% and the preset threshold is 200 pixels; the annular region is constructed by calculating the expanded boundary ex1 = max(0, minx - pad_x), ey1 = max(0, miny - pad_y), ex2 = min(W_img, maxx + pad_x), ey2 = min(H_img, maxy + pad_y), where pad_x = max(3, int((maxx - minx) × 15%)) and pad_y = max(3, int((maxy - miny) × 15%)), and constructing, within the expanded boundary, a mask in which the positions corresponding to the original text region are masked out; if the valid pixels in the annular region are insufficient, extracting the pixels along the top, bottom, left and right edges of the text region corresponding to the target bounding box, counting the proportion of bright pixels among these border pixels, and, if the proportion is not more than a second preset proportion and the number of border pixels is sufficient, taking the median of the border pixels as the background color, wherein the bright pixels are pixels whose three RGB channel values are all greater than 240, the second preset proportion is 40%, and the number of border pixels is sufficient when it is not less than 50; if the first two levels of estimation fail, performing K-Means clustering on the pixels of the text region to obtain two clusters, and, based on the average gradient and the pixel count of each cluster, selecting the cluster with the smaller average gradient and the larger pixel count as the background cluster and taking the median of the pixels of the background cluster as the background color, wherein the K-Means parameters are n_clusters=2, random_state=42 and n_init=10, the average gradient is obtained by Sobel gradient calculation, and the Sobel gradient is obtained as np.sqrt(gx² + gy²) after the x-direction and y-direction gradients gx and gy are calculated with the 3 × 3 convolution kernels kx and ky, respectively.
- 4. The method for determining color of Chinese characters in an image according to claim 1, wherein said step 4 specifically comprises: shrinking the target bounding box inward by a third preset proportion to obtain a shrunk region, and using the original target bounding box if the shrunk region is too small, wherein the third preset proportion is 8% and the shrunk region is judged too small when its width or height is less than 5 pixels; converting the background color and the pixels of the shrunk region from the RGB space to the LAB color space, calculating the color difference ΔE between each pixel in the shrunk region and the background color, and also calculating the grayscale image and the Sobel gradient of the shrunk region, wherein the conversion from the RGB space to the LAB space first converts the RGB values into the linear sRGB space through the _sRGB_to_linear function, then into the XYZ space through the _rgb_to_xyz function, and finally into the LAB space through the _xyz_to_lab function, the _sRGB_to_linear function implementing the conversion as a piecewise function, the _rgb_to_xyz function using the standard sRGB D65 conversion matrix, and the _xyz_to_lab function using the parameters of the D65 standard illuminant; determining screening thresholds based on quantiles of ΔE and of the gradient, screening the candidate pixels that satisfy the threshold conditions, and, if the number of candidate pixels is insufficient, lowering the thresholds or selecting the Top-K pixels by a weighted score, wherein the screening thresholds are determined as dE_thr = max(10.0, scoreatpercentile(delta_e.flatten(), 70)) and g_thr = max(0.06, scoreatpercentile(grad.flatten(), 70)), the weighted score is score = delta_e + grad × 8.0, the value of K in Top-K is max(8, int(patch.size / 3 × 0.01)), and patch is the pixel array corresponding to the shrunk region; performing K-Means clustering on the candidate pixels, calculating a composite score for each cluster from its color difference, brightness difference, gradient and cluster size, selecting the cluster with the highest score as the font cluster, and taking the median of the pixels of the font cluster as the font color, wherein the number of clusters k is set to 3 when the number of candidate pixels is greater than 80 and to 2 otherwise, the composite score is calculated as score = m_dE × 1.8 + dL × 1.2 + m_grad × 2.5 - area × 3.0, m_dE is the average ΔE between the pixels in the cluster and the background color, dL is the brightness difference between the cluster center and the background color, m_grad is the average gradient of the pixels in the cluster, and area is the ratio of the number of pixels in the cluster to the total number of candidate pixels.
- 5. A device for determining the color of a Chinese character in an image, characterized by comprising: a format conversion module, configured to convert an input image into a numpy array of RGB channels and acquire the height H_img and the width W_img of the image; a bounding box extraction module, configured to locate the text regions of the input image by parsing an OCR result containing text position information, wherein each text item in the OCR result corresponds to a list of quadrilateral vertex coordinates, determine an initial bounding box of each text region by extracting the minimum and maximum values of the vertex coordinates, and clamp the coordinates of the initial bounding box to the height H_img and the width W_img to obtain a target bounding box of each text region; a background color estimation module, configured to determine, for each target bounding box, the background color of the text region using a three-level estimation strategy; and a font color calculation module, configured to determine the font color of the text based on the background color in combination with color difference and gradient features.
- 6. The device for determining the color of the text in the image according to claim 5, wherein the bounding box extraction module is specifically configured to: locate the text regions of the input image by parsing an OCR result containing text position information; determine an initial bounding box of each text region by extracting the minimum and maximum values of the vertex coordinates in the OCR result; and clamp the coordinates of the initial bounding box to the height H_img and the width W_img to obtain a target bounding box of each text region, wherein the target bounding box comprises four coordinate parameters minx, miny, maxx and maxy, and the target bounding box is determined by extracting all x values and y values from the quadrilateral vertex coordinates and calculating minx = max(0, int(min(xs))), miny = max(0, int(min(ys))), maxx = min(W_img, int(max(xs))) and maxy = min(H_img, int(max(ys))), where xs is the set of all x coordinates of the vertices and ys is the set of all y coordinates of the vertices.
- 7. The device according to claim 5, wherein the background color estimation module is specifically configured to: take the target bounding box as the center, expand outward by a first preset proportion to form an annular region, construct a mask that masks out the original text region, and extract the valid pixels in the annular region; if the number of valid pixels is not less than a preset threshold, take the median of the valid pixels as the background color, wherein the first preset proportion is 15% and the preset threshold is 200 pixels; the annular region is constructed by calculating the expanded boundary ex1 = max(0, minx - pad_x), ey1 = max(0, miny - pad_y), ex2 = min(W_img, maxx + pad_x), ey2 = min(H_img, maxy + pad_y), where pad_x = max(3, int((maxx - minx) × 15%)) and pad_y = max(3, int((maxy - miny) × 15%)), and constructing, within the expanded boundary, a mask in which the positions corresponding to the original text region are masked out; if the valid pixels in the annular region are insufficient, extract the pixels along the top, bottom, left and right edges of the text region corresponding to the target bounding box, count the proportion of bright pixels among these border pixels, and, if the proportion is not more than a second preset proportion and the number of border pixels is sufficient, take the median of the border pixels as the background color, wherein the bright pixels are pixels whose three RGB channel values are all greater than 240, the second preset proportion is 40%, and the number of border pixels is sufficient when it is not less than 50; if the first two levels of estimation fail, perform K-Means clustering on the pixels of the text region to obtain two clusters, and, based on the average gradient and the pixel count of each cluster, select the cluster with the smaller average gradient and the larger pixel count as the background cluster and take the median of the pixels of the background cluster as the background color, wherein the K-Means parameters are n_clusters=2, random_state=42 and n_init=10, the average gradient is obtained by Sobel gradient calculation, and the Sobel gradient is obtained as np.sqrt(gx² + gy²) after the x-direction and y-direction gradients gx and gy are calculated with the 3 × 3 convolution kernels kx and ky, respectively.
- 8. The device for determining color of Chinese character in image according to claim 5, wherein said font color calculation module is specifically configured to: shrink the target bounding box inward by a third preset proportion to obtain a shrunk region, and use the original target bounding box if the shrunk region is too small, wherein the third preset proportion is 8% and the shrunk region is judged too small when its width or height is less than 5 pixels; convert the background color and the pixels of the shrunk region from the RGB space to the LAB color space, calculate the color difference ΔE between each pixel in the shrunk region and the background color, and also calculate the grayscale image and the Sobel gradient of the shrunk region, wherein the conversion from the RGB space to the LAB space first converts the RGB values into the linear sRGB space through the _sRGB_to_linear function, then into the XYZ space through the _rgb_to_xyz function, and finally into the LAB space through the _xyz_to_lab function, the _sRGB_to_linear function implementing the conversion as a piecewise function, the _rgb_to_xyz function using the standard sRGB D65 conversion matrix, and the _xyz_to_lab function using the parameters of the D65 standard illuminant; determine screening thresholds based on quantiles of ΔE and of the gradient, screen the candidate pixels that satisfy the threshold conditions, and, if the number of candidate pixels is insufficient, lower the thresholds or select the Top-K pixels by a weighted score, wherein the screening thresholds are determined as dE_thr = max(10.0, scoreatpercentile(delta_e.flatten(), 70)) and g_thr = max(0.06, scoreatpercentile(grad.flatten(), 70)), the weighted score is score = delta_e + grad × 8.0, the value of K in Top-K is max(8, int(patch.size / 3 × 0.01)), and patch is the pixel array corresponding to the shrunk region; perform K-Means clustering on the candidate pixels, calculate a composite score for each cluster from its color difference, brightness difference, gradient and cluster size, select the cluster with the highest score as the font cluster, and take the median of the pixels of the font cluster as the font color, wherein the number of clusters k is set to 3 when the number of candidate pixels is greater than 80 and to 2 otherwise, the composite score is calculated as score = m_dE × 1.8 + dL × 1.2 + m_grad × 2.5 - area × 3.0, m_dE is the average ΔE between the pixels in the cluster and the background color, dL is the brightness difference between the cluster center and the background color, m_grad is the average gradient of the pixels in the cluster, and area is the ratio of the number of pixels in the cluster to the total number of candidate pixels.
- 9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 4 when executing the program.
- 10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 4.
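As an illustration only, the first-level background estimate of claim 3 and the RGB-to-LAB conversion of claim 4 can be sketched in Python roughly as follows. The function names ring_background, rgb_to_lab and delta_e are hypothetical and are not taken from the patent. The matrix and white point are the commonly published sRGB D65 values. The claims do not name a ΔE formula, so the plain Euclidean (CIE76) distance is assumed here.

```python
import numpy as np

# Commonly published sRGB (D65) linear-RGB -> XYZ matrix and D65 white point.
RGB_TO_XYZ = np.array([[0.4124564, 0.3575761, 0.1804375],
                       [0.2126729, 0.7151522, 0.0721750],
                       [0.0193339, 0.1191920, 0.9503041]])
D65_WHITE = np.array([0.95047, 1.0, 1.08883])


def ring_background(img, bbox, ratio=0.15, min_pixels=200):
    """Level 1: median of the ring around the text box, or None if too small."""
    h_img, w_img = img.shape[:2]
    minx, miny, maxx, maxy = bbox
    pad_x = max(3, int((maxx - minx) * ratio))
    pad_y = max(3, int((maxy - miny) * ratio))
    ex1, ey1 = max(0, minx - pad_x), max(0, miny - pad_y)
    ex2, ey2 = min(w_img, maxx + pad_x), min(h_img, maxy + pad_y)
    region = img[ey1:ey2, ex1:ex2]
    mask = np.ones(region.shape[:2], dtype=bool)
    # Mask out the original text box so only the surrounding ring remains.
    mask[miny - ey1:maxy - ey1, minx - ex1:maxx - ex1] = False
    valid = region[mask]
    if len(valid) < min_pixels:
        return None  # caller would fall back to the second-level estimate
    return np.median(valid, axis=0)


def rgb_to_lab(rgb):
    """sRGB values in [0, 255] -> CIE LAB under the D65 illuminant."""
    c = np.asarray(rgb, dtype=np.float64) / 255.0
    # Piecewise sRGB decoding to linear light.
    lin = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    t = (lin @ RGB_TO_XYZ.T) / D65_WHITE
    d = 6.0 / 29.0
    f = np.where(t > d ** 3, np.cbrt(t), t / (3 * d ** 2) + 4.0 / 29.0)
    lightness = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([lightness, a, b], axis=-1)


def delta_e(lab, lab_ref):
    """CIE76 color difference (an assumption; the claims name no formula)."""
    return np.linalg.norm(lab - lab_ref, axis=-1)


# Synthetic example: a dark text box on a uniform light background.
img = np.full((100, 200, 3), (200, 180, 160), dtype=np.uint8)
img[30:70, 50:150] = (10, 10, 10)
bg = ring_background(img, (50, 30, 150, 70))
diff = delta_e(rgb_to_lab(img[30:70, 50:150]), rgb_to_lab(bg))
```

Here diff holds one ΔE value per pixel of the text box. In the claimed method such a map would be computed over the shrunk region and combined with the Sobel gradient before thresholding.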
Description
Method, device, equipment and medium for determining Chinese character color in image
Technical Field
The present invention relates to the field of image processing technologies, and in particular to a method, a device, equipment and a medium for determining the color of a Chinese character in an image.
Background
In the field of product image (commodity graph) processing, accurately calculating the font colors in product images is important for scenarios such as product display, brand image maintenance and advertisement design. Conventional methods for calculating font colors in product images have several shortcomings. Some methods determine the font color simply by averaging the pixels in the text region, without considering the interference of the background color; when the background color is similar to the font color, or when the background has a complex texture, the accuracy of the calculated font color is very low. Other methods introduce color space conversion but do not combine it with multi-dimensional information such as gradient features, so they cannot effectively distinguish font edges from background noise, and the calculated font color deviates considerably in product images with blurred edges or semi-transparent effects. In addition, the prior art does not adapt to different types of product images (such as product images with complex patterned backgrounds or photographed under different lighting conditions), and therefore struggles to meet the practical need for accurate font color calculation, which affects the efficiency and quality of product-related design and display work.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method, a device, equipment and a medium for determining the color of text in an image, which accurately calculate the font colors in product images (commodity graphs) by combining several techniques, including color space conversion, gradient feature extraction, K-Means cluster analysis and robust background color estimation. The method effectively eliminates background color interference, adapts to product images of different types and background complexity, accurately distinguishes font regions from background regions, and calculates font colors that meet practical requirements, thereby satisfying the accuracy demands of product display, brand design, advertisement production and similar fields and improving the efficiency and quality of related work.
In a first aspect, the present invention provides a method for determining text color in an image, including the steps of:
Step 1, converting an input image into a numpy array of RGB channels, and acquiring the height H_img and the width W_img of the image;
Step 2, locating the text regions of the input image by parsing an OCR result containing text position information, wherein each text item in the OCR result corresponds to a list of quadrilateral vertex coordinates, determining an initial bounding box of each text region by extracting the minimum and maximum values of the vertex coordinates, and clamping the coordinates of the initial bounding box to the height H_img and the width W_img to obtain a target bounding box of each text region;
Step 3, for each target bounding box, determining the background color of the text region using a three-level estimation strategy;
Step 4, determining the font color of the text based on the background color in combination with color difference and gradient features.
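Steps 1 and 2 above can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the helper name target_bbox, the synthetic image and the sample quadrilateral are assumptions, while the min/max clamping follows the formulas recited in claim 2.

```python
import numpy as np


def target_bbox(quad, w_img, h_img):
    """Clamp the axis-aligned box around one OCR quadrilateral to the image."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    minx = max(0, int(min(xs)))
    miny = max(0, int(min(ys)))
    maxx = min(w_img, int(max(xs)))
    maxy = min(h_img, int(max(ys)))
    return minx, miny, maxx, maxy


# Step 1: stand-in for a decoded image held as an (H, W, 3) RGB array.
image = np.zeros((100, 200, 3), dtype=np.uint8)
h_img, w_img = image.shape[:2]

# Step 2: a hypothetical OCR quadrilateral that partly leaves the image.
quad = [(-4.2, 10.7), (150.9, 8.1), (230.0, 40.5), (-1.0, 42.9)]
bbox = target_bbox(quad, w_img, h_img)
print(bbox)  # (0, 8, 200, 42)
```

Clamping keeps later slicing such as image[miny:maxy, minx:maxx] inside the array even when the OCR engine reports vertices slightly outside the image.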
In a second aspect, the present invention provides a device for determining text color in an image, including: a format conversion module, configured to convert an input image into a numpy array of RGB channels and acquire the height H_img and the width W_img of the image; a bounding box extraction module, configured to locate the text regions of the input image by parsing an OCR result containing text position information, wherein each text item in the OCR result corresponds to a list of quadrilateral vertex coordinates, determine an initial bounding box of each text region by extracting the minimum and maximum values of the vertex coordinates, and clamp the coordinates of the initial bounding box to the height H_img and the width W_img to obtain a target bounding box of each text region; a background color estimation module, configured to determine, for each target bounding box, the background color of the text region using a three-level estimation strategy; and a font color calculation module, configured to determine the font color of the text by combining the color difference and the gradient characteristic bas