CN-121982134-A - Image processing method, device, equipment and storage medium


Abstract

The disclosure provides an image processing method, an image processing apparatus, an image processing device and a storage medium, and relates to the field of image processing. The method includes: acquiring an image to be processed that contains text; performing multidimensional recognition processing on the text in the image to be processed to obtain text noise features of the image; acquiring a text editing prompt; and performing image denoising redrawing according to the text editing prompt and the text noise features to generate a target image in which the text has been edited. With this method, features of the text in the image to be processed are extracted comprehensively from multiple dimensions to obtain the text noise features. The text noise features serve as the basis for eliminating noise pixels, and the acquired text editing prompt serves as the basis for generating the redrawn content, so that a target image with the edited text is generated.
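The four steps above can be wired together as in the following minimal Python sketch. It is illustrative only: the stage callables, the `EditMode` enum, the preset elimination prompt text and the `prefer_cloud` flag are hypothetical names introduced here, and the rule for choosing between the on-cloud and end-side replacement models is an assumption, since the claims do not say how that choice is made. The routing follows claims 7 and 8: a replacement instruction uses a user-supplied prompt and a replacement model, while an elimination instruction uses a preset prompt and the end-side elimination model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, List, Optional

Image = List[List[float]]  # toy grayscale image: rows of pixel values
# Hypothetical model signature: (image, text noise features, prompt) -> edited image
Model = Callable[[Image, Any, str], Image]

# Hypothetical preset elimination prompt; per claim 9 it describes output quality,
# not text content.
PRESET_ELIMINATION_PROMPT = "clean background, no text, high quality"


class EditMode(Enum):
    REPLACE = "replace"      # text replacement instruction
    ELIMINATE = "eliminate"  # text elimination instruction


@dataclass
class Stages:
    recognize: Callable[[Image], Any]  # multidimensional recognition -> noise features
    cloud_replace: Model               # on-cloud replacement model
    edge_replace: Model                # end-side replacement model
    edge_eliminate: Model              # end-side elimination model


def select_prompt(mode: EditMode, user_prompt: Optional[str]) -> str:
    """Claim 8: user-supplied prompt for replacement, preset prompt for elimination."""
    if mode is EditMode.REPLACE:
        if not user_prompt:
            raise ValueError("text replacement requires a user prompt")
        return user_prompt
    return PRESET_ELIMINATION_PROMPT


def edit_text(image: Image, mode: EditMode, stages: Stages,
              user_prompt: Optional[str] = None,
              prefer_cloud: bool = True) -> Image:
    # Step 1 (acquiring the image) is the `image` argument itself.
    noise_features = stages.recognize(image)   # step 2: text noise features
    prompt = select_prompt(mode, user_prompt)  # step 3: text editing prompt
    # Claim 7: replacement -> on-cloud or end-side replacement model,
    # elimination -> end-side elimination model. prefer_cloud is an assumption.
    if mode is EditMode.REPLACE:
        model = stages.cloud_replace if prefer_cloud else stages.edge_replace
    else:
        model = stages.edge_eliminate
    return model(image, noise_features, prompt)  # step 4: denoising redraw
```

Injecting the stages as callables keeps the control flow testable without any model weights; in a real system each callable would presumably wrap a trained network.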

Inventors

LIANG YUNHAO
SHI JINJIN

Assignees

北京小米移动软件有限公司

Dates

Publication Date: 20260505
Application Date: 20241029

Claims (14)

1. An image processing method, comprising: acquiring an image to be processed, wherein the image to be processed comprises text; performing multidimensional recognition processing on the text in the image to be processed to obtain text noise features of the image to be processed; acquiring a text editing prompt; and performing image denoising redrawing according to the text editing prompt and the text noise features to generate a target image in which the text has been edited.
2. The method according to claim 1, wherein performing multidimensional recognition processing on the text in the image to be processed to obtain the text noise features of the image to be processed comprises: performing text recognition on the image to be processed to obtain text features of the image to be processed; performing position recognition on the text in the image to be processed to obtain a region segmentation feature map of the image to be processed; performing edge detection on the text in the image to be processed to obtain a text edge feature map of the image to be processed; and fusing the text features, the region segmentation feature map and the text edge feature map to obtain the text noise features of the image to be processed.
3. The method according to claim 1, wherein performing image denoising redrawing according to the text editing prompt and the text noise features to generate the target image in which the text has been edited comprises: inputting the text editing prompt and the text noise features into an image generation model, wherein the image generation model is obtained by fine-tuning an initial Stable Diffusion model; performing, by the image generation model, semantic understanding of the text editing prompt to determine semantic features of the text edit and a target text to be edited within the text; and processing, by the image generation model, based on the text noise features and the semantic features, so as to perform denoising redrawing on the region of the image to be processed corresponding to the target text and generate a target image in which the target text has been edited.
4. The method according to claim 3, further comprising: performing morphological detection on the text in the image to be processed to obtain text morphological features of the image to be processed; inputting the text morphological features into the image generation model; and using, by the image generation model, the text morphological features as guide information for the denoising redrawing, so as to generate the target image in which the target text has been edited.
5. The method according to claim 4, wherein performing morphological detection on the text in the image to be processed to obtain the text morphological features of the image to be processed comprises: performing text contour extraction on the text in the image to be processed to obtain a text contour extraction result; expanding the text contour extraction result to obtain a contour expansion result; and encoding the contour expansion result to obtain the text morphological features.
6. The method according to claim 3, wherein the image generation model comprises an on-cloud replacement model, an end-side replacement model and an end-side elimination model; the on-cloud replacement model is obtained by fine-tuning the initial Stable Diffusion model with a second sample image containing text; the end-side replacement model is obtained by compression-encoding and quantizing the on-cloud replacement model; and the end-side elimination model is obtained by distilling the initial Stable Diffusion model.
7. The method according to claim 6, further comprising: in response to receiving a text replacement instruction, invoking the on-cloud replacement model or the end-side replacement model to generate the target image; and in response to receiving a text elimination instruction, invoking the end-side elimination model to generate the target image.
8. The method according to claim 1, wherein acquiring the text editing prompt comprises: in response to receiving a text replacement instruction, acquiring a text replacement prompt input by a user as the text editing prompt; and in response to receiving a text elimination instruction, acquiring a preset text elimination prompt as the text editing prompt.
9. The method according to claim 1 or 8, further comprising: in response to the text editing prompt being a text replacement prompt, generating a target image in which the text has been replaced, wherein the text replacement prompt indicates the text information to be replaced in the image to be processed; and in response to the text editing prompt being a text elimination prompt, generating a target image in which the text has been eliminated, wherein the text elimination prompt indicates quality description information for the target image.
10. The method according to claim 1, further comprising: displaying a plurality of text regions identified in the image to be processed; and in response to a selection operation and an editing operation on a target text region among the plurality of text regions, displaying a target image in which the text in the target text region has been edited.
11. The method according to claim 2, wherein the text features are output by a trained text feature extraction model and the text noise features are output by a trained feature fusion model; and wherein the method further comprises: acquiring a first sample image containing text, a sample prompt, and a label image matching the first sample image and the sample prompt; performing text recognition on the first sample image using an initial text feature extraction model to obtain sample text features of the first sample image; performing position recognition on the text in the first sample image to obtain a sample region segmentation feature map of the first sample image; performing edge detection on the text in the first sample image to obtain a sample text edge feature map of the first sample image; fusing the sample text features, the sample region segmentation feature map and the sample text edge feature map using an initial feature fusion model to obtain sample text noise features of the first sample image; performing image denoising redrawing according to the sample prompt and the sample text noise features to generate a predicted image; and constructing a loss function from the predicted image and the label image, and training the initial text feature extraction model and the initial feature fusion model with the loss function to obtain the trained text feature extraction model and the trained feature fusion model.
12. An image processing apparatus, comprising: an acquisition unit configured to acquire an image to be processed, wherein the image to be processed comprises text; a noise feature generation unit configured to perform multidimensional recognition processing on the text in the image to be processed to obtain text noise features of the image to be processed, wherein the acquisition unit is further configured to acquire a text editing prompt; and an image generation unit configured to perform image denoising redrawing according to the text editing prompt and the text noise features to generate a target image in which the text has been edited.
13. An electronic device, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to implement the steps of the method of any one of claims 1-11.
14. A non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of a mobile terminal, cause the mobile terminal to perform an image processing method, the method comprising: acquiring an image to be processed, wherein the image to be processed comprises text; performing multidimensional recognition processing on the text in the image to be processed to obtain text noise features of the image to be processed; acquiring a text editing prompt; and performing image denoising redrawing according to the text editing prompt and the text noise features to generate a target image in which the text has been edited.
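Claim 5 names three steps (contour extraction, expansion, encoding) without fixing the operators used for each. The sketch below is one plausible reading under stated assumptions: the input is a binary text mask, the contour is taken to be the text pixels that touch the background, "expanding" is interpreted as a square-window dilation, and block average pooling is a hypothetical stand-in for the encoder, which in practice would more likely be a learned network. All function names are illustrative.

```python
from typing import List

Mask = List[List[int]]  # 1 = text pixel, 0 = background


def extract_contour(mask: Mask) -> Mask:
    """Keep text pixels that touch background (4-neighbourhood) or the image border."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    out[y][x] = 1
                    break
    return out


def dilate(mask: Mask, radius: int) -> Mask:
    """Square-window dilation (assumed meaning of 'expanding' the contour)."""
    h, w = len(mask), len(mask[0])
    return [
        [
            int(any(
                mask[ny][nx]
                for ny in range(max(0, y - radius), min(h, y + radius + 1))
                for nx in range(max(0, x - radius), min(w, x + radius + 1))
            ))
            for x in range(w)
        ]
        for y in range(h)
    ]


def encode(mask: Mask, block: int) -> List[List[float]]:
    """Average-pool into block x block cells (hypothetical stand-in for an encoder)."""
    h, w = len(mask), len(mask[0])
    grid = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            cells = [mask[y][x] for y in range(by, min(h, by + block))
                     for x in range(bx, min(w, bx + block))]
            row.append(sum(cells) / len(cells))
        grid.append(row)
    return grid


def text_morphology_features(mask: Mask, radius: int = 1,
                             block: int = 2) -> List[List[float]]:
    # contour extraction -> contour expansion -> encoding, in the order of claim 5
    return encode(dilate(extract_contour(mask), radius), block)
```

Dilating the contour rather than the filled glyph keeps the guide signal concentrated around stroke boundaries, which is one reason a design like this might be chosen; the disclosure itself does not state the rationale.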

Description

Technical Field

The present disclosure relates to the field of image processing, and in particular to an image processing method, apparatus, device and storage medium.

Background

With the rapid development of computer technology and image processing technology, image processing needs have become increasingly diverse, and editing text in images is an important one of them. Editing text in images can greatly improve the working efficiency of users in many fields and provide users with a more diverse and personalized image editing experience.

It should be noted that the information disclosed in the above Background section is only intended to enhance understanding of the background of the present disclosure, and may therefore include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

The present disclosure aims to provide an image processing method, an image processing apparatus, an image processing device and a storage medium.

According to a first aspect of the embodiments of the present disclosure, an image processing method is provided, including: acquiring an image to be processed, wherein the image to be processed includes text; performing multidimensional recognition processing on the text in the image to be processed to obtain text noise features of the image to be processed; acquiring a text editing prompt; and performing image denoising redrawing according to the text editing prompt and the text noise features to generate a target image in which the text has been edited.
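The denoising redraw in the first aspect changes the text region while preserving the rest of the image. A common way to achieve this in diffusion-based inpainting is to blend, at every denoising step, the model's prediction inside the mask with the original image re-noised to the current noise level outside it. The sketch below is a simplified, dependency-free illustration of that idea, not the disclosed model: `denoise` is a hypothetical callable standing in for the prompt-conditioned image generation model, the binary `mask` is assumed to be derived from the text noise features, and the linear noise schedule is an assumption.

```python
import random
from typing import Callable, List

Image = List[List[float]]
Mask = List[List[int]]  # 1 = text region to redraw, 0 = keep original
# Hypothetical denoiser: (current image, target noise level in [0, 1], prompt) -> image
Denoiser = Callable[[Image, float, str], Image]


def masked_redraw(original: Image, mask: Mask, prompt: str,
                  denoise: Denoiser, steps: int = 10, seed: int = 0) -> Image:
    """Redraw only where mask == 1, keeping the rest anchored to the original.

    Toy version of inpainting-style mask blending: after each step the known
    region is replaced by the original re-noised to the step's noise level.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    rng = random.Random(seed)
    h, w = len(original), len(original[0])
    # The editable region starts as pure noise; the rest starts as the original.
    x = [[rng.gauss(0.0, 1.0) if mask[i][j] else original[i][j]
          for j in range(w)] for i in range(h)]
    for step in range(steps):
        level = 1.0 - (step + 1) / steps  # target noise level, falls linearly to 0
        pred = denoise(x, level, prompt)
        x = [
            [
                pred[i][j] if mask[i][j]
                else (1.0 - level) * original[i][j] + level * rng.gauss(0.0, 1.0)
                for j in range(w)
            ]
            for i in range(h)
        ]
    return x
```

Because the noise level reaches zero on the last step, pixels outside the mask come back equal to the original, which is the property the region restriction is meant to guarantee.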
In some embodiments, performing multidimensional recognition processing on the text in the image to be processed to obtain the text noise features of the image to be processed includes: performing text recognition on the image to be processed to obtain text features of the image to be processed; performing position recognition on the text in the image to be processed to obtain a region segmentation feature map of the image to be processed; performing edge detection on the text in the image to be processed to obtain a text edge feature map of the image to be processed; and fusing the text features, the region segmentation feature map and the text edge feature map to obtain the text noise features of the image to be processed.

In some embodiments, performing image denoising redrawing according to the text editing prompt and the text noise features to generate the target image in which the text has been edited includes: inputting the text editing prompt and the text noise features into an image generation model, wherein the image generation model is obtained by fine-tuning an initial Stable Diffusion model; performing, by the image generation model, semantic understanding of the text editing prompt to determine semantic features of the text edit and a target text to be edited within the text; and processing, by the image generation model, based on the text noise features and the semantic features, so as to perform denoising redrawing on the region of the image to be processed corresponding to the target text and generate a target image in which the target text has been edited.
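The fusion step described above (claim 2) can be pictured as stacking the per-pixel maps as channels and mixing them with a 1x1 convolution. In the sketch below the text features are assumed to arrive as a list of hypothetical channel maps from a recognizer, edge detection uses central-difference gradient magnitude as a simple stand-in for whatever detector the disclosure uses, and the fixed `weights` stand in for the trained feature fusion model of claim 11.

```python
from typing import List, Sequence

Map = List[List[float]]


def edge_map(gray: Map) -> Map:
    """Gradient magnitude from central differences; border pixels are left at 0."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (gray[y][x + 1] - gray[y][x - 1]) / 2.0
            gy = (gray[y + 1][x] - gray[y - 1][x]) / 2.0
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


def fuse(channels: Sequence[Map], weights: Sequence[float], bias: float = 0.0) -> Map:
    """Per-pixel weighted sum over stacked channels, i.e. a single 1x1 convolution."""
    if len(channels) != len(weights):
        raise ValueError("one weight per channel is required")
    h, w = len(channels[0]), len(channels[0][0])
    return [[bias + sum(wt * ch[y][x] for wt, ch in zip(weights, channels))
             for x in range(w)] for y in range(h)]


def text_noise_features(text_feats: List[Map], seg_map: Map, gray: Map,
                        weights: Sequence[float]) -> Map:
    # text features + region segmentation map + text edge map, fused into one map
    return fuse(list(text_feats) + [seg_map, edge_map(gray)], weights)
```

In a trained system the fusion would more likely be a learned network with many output channels; a single weighted sum is used here only to make the data flow concrete.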
In some embodiments, the image processing method further includes: performing morphological detection on the text in the image to be processed to obtain text morphological features of the image to be processed; inputting the text morphological features into the image generation model; and using, by the image generation model, the text morphological features as guide information for the denoising redrawing, so as to generate the target image in which the target text has been edited.

In some embodiments, performing morphological detection on the text in the image to be processed to obtain the text morphological features of the image to be processed includes: performing text contour extraction on the text in the image to be processed to obtain a text contour extraction result; expanding the text contour extraction result to obtain a contour expansion result; and encoding the contour expansion result to obtain the text morphological features.

In some embodiments, the image generation model includes an on-cloud replacement model, an end-side replacement model and an end-side elimination model, wherein the on-cloud replacement model is obtained by fine-tuning the initial Stable Diffusion model with a second sample image containing text, the end-side replacement model is obtained by compression-encoding and quantizing the on-cloud replacement model, and the end-side elimination model is obtained by distilling the initial Stable Diffusion model.

In some embodiments, the image processing method further comprises invoking the on-cloud replacement model or the end-side replacement model to generate the target image in response to receivin