CN-116994266-B - Word processing method, word processing device, electronic equipment and storage medium

CN116994266B

Abstract

Embodiments of the present disclosure provide a word processing method, a word processing device, electronic equipment, and a storage medium. The method comprises: obtaining a first image that includes characters to be processed; and inputting the first image into a pre-trained target stroke order determination model to obtain the target stroke order corresponding to the characters to be processed. With this technical scheme, the position and order of each stroke of a character can be obtained accurately, which greatly reduces stroke breakage, uneven stroke edges, and missing or redundant strokes in the generated characters, and improves the accuracy of the generated characters.

Inventors

LIU WEI
LIU FANGYUE

Assignees

北京字跳网络技术有限公司

Dates

Publication Date: 20260505
Application Date: 20220418

Claims (10)

1. A word processing method, comprising: acquiring a first image comprising characters to be processed; training a target stroke order determination model by combining a spatial attention mechanism and a channel attention mechanism; and inputting the first image into the pre-trained target stroke order determination model to obtain a target stroke order corresponding to the characters to be processed; wherein the method further comprises using the target stroke order determination model as a loss model of a style feature fusion model to be trained, so as to obtain a target style feature fusion model by training; and wherein the target style feature fusion model is used for fusing at least two font styles, and is determined based on a first loss value obtained through stroke loss processing by the target stroke order determination model, a reconstruction loss determined by a reconstruction loss function, and a style loss value determined by a style coding loss function.
2. The method as recited in claim 1, further comprising: acquiring at least one first training sample, wherein the first training sample comprises a sample character image and a theoretical character stroke order corresponding to the sample character image; for each first training sample, inputting the sample character image in the current first training sample into a stroke order determination model to be trained to obtain a predicted stroke order; determining a loss value based on the predicted stroke order and the theoretical character stroke order in the current first training sample, and correcting model parameters of the stroke order determination model to be trained based on the loss value; and taking convergence of a loss function of the stroke order determination model to be trained as a training target, to obtain the target stroke order determination model.
3. The method according to claim 2, wherein inputting the sample character image in the current first training sample into the stroke order determination model to be trained to obtain the predicted stroke order comprises: inputting the sample character image into a convolution layer to obtain a first feature to be processed; performing feature extraction on the first feature to be processed through the channel attention mechanism and the spatial attention mechanism to obtain a second feature to be processed; inputting the second feature to be processed into a recurrent neural network unit to obtain a feature sequence corresponding to each stroke order position; and processing each feature sequence with a classifier to obtain the predicted stroke order.
4. The method of claim 1, wherein training the target style feature fusion model comprises: determining at least one training sample, wherein the training sample comprises a character image to be trained and a reference character image; for each training sample, inputting the character image to be processed and the reference character image in the current training sample into a style feature fusion model to be trained to obtain an actual output character image corresponding to the character image to be processed; performing stroke loss processing on the actual output character image and the character image to be processed based on the target stroke order determination model to obtain a first loss value; determining a reconstruction loss between the actual output character image and the character image to be trained based on a reconstruction loss function; determining a style loss value between the actual output character image and a fused character image based on a style coding loss function, wherein the fused character image is determined based on the font styles of the character image to be trained and the reference character image; correcting model parameters of the style feature fusion model to be trained based on the first loss value, the reconstruction loss, and the style loss value; and taking convergence of a loss function of the style feature fusion model to be trained as a training target, and training to obtain the target style feature fusion model.
5. The method of claim 4, wherein the target style feature fusion model comprises a style feature extraction sub-model, a stroke feature extraction sub-model, a content extraction sub-model, and a coding sub-model; the style feature extraction sub-model is used for extracting a reference style from the reference character image; the stroke feature extraction sub-model is used for extracting stroke features of the characters to be processed; the content extraction sub-model is used for extracting content features of the characters to be trained, wherein the content features comprise the character content and the character style to be processed; and the coding sub-model is used for coding the reference style, the stroke features, and the content features to obtain the actual output character image.
6. The method as recited in claim 1, further comprising: receiving a target reference style character image and a target style conversion character image; and outputting at least one display character image based on the character content and the converted character style of the target style conversion character image and on the reference character style of the target reference style character image, so that a target display character image is determined based on a trigger operation.
7. The method as recited in claim 6, further comprising: editing characters in real time based on the target style feature fusion model corresponding to the target display character image, or generating a character package corresponding to the target display character image.
8. A word processing apparatus, comprising: a first image acquisition module, configured to acquire a first image comprising characters to be processed; a stroke order determination model training module, configured to train a target stroke order determination model by combining a spatial attention mechanism and a channel attention mechanism; and a target stroke order determination module, configured to input the first image into the pre-trained target stroke order determination model to obtain a target stroke order corresponding to the characters to be processed; wherein the apparatus is further configured to use the target stroke order determination model as a loss model of a style feature fusion model to be trained, so as to obtain a target style feature fusion model by training; and wherein the target style feature fusion model is used for fusing at least two font styles, and is determined based on a first loss value obtained through stroke loss processing by the target stroke order determination model, a reconstruction loss determined by a reconstruction loss function, and a style loss value determined by a style coding loss function.
9. An electronic device, comprising: one or more processors; and storage means for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the word processing method of any of claims 1-7.
10. A storage medium containing computer-executable instructions which, when executed by a computer processor, perform the word processing method of any of claims 1-7.
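
Claims 1 and 4 name three training signals for the style feature fusion model (a stroke loss computed through the target stroke order determination model, a reconstruction loss, and a style loss) but do not state how each is computed or how they are combined. The following is a minimal sketch under assumptions that are not taken from the claims: cross-entropy for the stroke loss, mean absolute error for the reconstruction loss, mean squared error between style codes for the style loss, and a weighted sum with hypothetical weights `w_stroke`, `w_recon`, and `w_style`. All function names are illustrative.

```python
import math


def stroke_loss(pred_probs, ref_strokes):
    """Assumed form: cross-entropy between the per-position stroke-class
    probabilities predicted for the generated image and the reference
    stroke-class indices for the input image."""
    eps = 1e-12  # avoids log(0)
    total = sum(math.log(p[s] + eps) for p, s in zip(pred_probs, ref_strokes))
    return -total / len(ref_strokes)


def reconstruction_loss(output_pixels, target_pixels):
    """Assumed form: mean absolute error over flattened pixel values."""
    diffs = (abs(o - t) for o, t in zip(output_pixels, target_pixels))
    return sum(diffs) / len(output_pixels)


def style_loss(output_code, fused_code):
    """Assumed form: mean squared error between style code vectors."""
    sq = ((o - f) ** 2 for o, f in zip(output_code, fused_code))
    return sum(sq) / len(output_code)


def total_loss(stroke, recon, style, w_stroke=1.0, w_recon=1.0, w_style=1.0):
    """Assumed combination: weighted sum of the three loss terms."""
    return w_stroke * stroke + w_recon * recon + w_style * style


if __name__ == "__main__":
    # Toy values: two stroke positions with three stroke classes each.
    probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
    loss = total_loss(
        stroke_loss(probs, [0, 1]),
        reconstruction_loss([0.0, 0.5, 1.0], [0.0, 1.0, 1.0]),
        style_loss([0.2, 0.4], [0.0, 0.4]),
    )
    print(loss)
```

In this sketch the stroke order determination model would be kept fixed while the fusion model is trained, so that the stroke term penalizes generated characters whose strokes the fixed model cannot recognize in the expected order; that reading is an interpretation of "loss model" in claim 1.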

Description

Word processing method, word processing device, electronic equipment and storage medium

Technical Field

Embodiments of the present disclosure relate to the technical field of artificial intelligence, and in particular to a word processing method, a word processing device, electronic equipment, and a storage medium.

Background

Research on generating fonts with artificial intelligence (AI) has been developing steadily. Generating fonts in this way both satisfies users' demand for a variety of fonts and improves the production efficiency of designers.

When such models are actually used to generate characters, existing style transfer or image translation techniques are good at modifying the texture of an image but poor at modifying its structural information. In character generation, however, the structural layout of the strokes (the frame structure of a character) is precisely an important point of distinction between fonts. Fonts obtained with the prior art therefore often show problems such as broken strokes, uneven stroke edges, and missing or redundant strokes, so that the automatically generated characters differ from what the user expects and the error rate is high.

Disclosure of Invention

The present disclosure provides a word processing method, a device, electronic equipment, and a storage medium that can accurately obtain the position and order of each stroke of a character, greatly reduce stroke breakage, uneven stroke edges, and missing or redundant strokes in the generated characters, and improve the accuracy of the generated characters.
In a first aspect, an embodiment of the present disclosure provides a word processing method, including: acquiring a first image comprising characters to be processed; training a target stroke order determination model by combining a spatial attention mechanism and a channel attention mechanism; and inputting the first image into the pre-trained target stroke order determination model to obtain a target stroke order corresponding to the characters to be processed.

In a second aspect, an embodiment of the present disclosure further provides a word processing apparatus, including: a first image acquisition module, configured to acquire a first image comprising characters to be processed; a stroke order determination model training module, configured to train a target stroke order determination model by combining a spatial attention mechanism and a channel attention mechanism; and a target stroke order determination module, configured to input the first image into the pre-trained target stroke order determination model to obtain a target stroke order corresponding to the characters to be processed.

In a third aspect, an embodiment of the present disclosure further provides an electronic device, including: one or more processors; and storage means for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the word processing method described in any of the embodiments of the present disclosure.

In a fourth aspect, an embodiment of the present disclosure further provides a storage medium containing computer-executable instructions that, when executed by a computer processor, perform the word processing method described in any of the embodiments of the present disclosure.
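
The stroke order determination model of the first aspect combines a channel attention mechanism with a spatial attention mechanism, but the text here does not give their internal structure. The sketch below only illustrates the general idea on a small feature map stored as nested lists (channels x height x width). It is a parameter-free simplification: the gates are a sigmoid of plain averages, whereas practical attention modules typically learn these gates with small networks. The order (channel first, then spatial) and all function names are assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """Scale each channel by a gate derived from its global average.

    feat is a list of C channels, each an H x W list of lists.
    """
    out = []
    for channel in feat:
        height, width = len(channel), len(channel[0])
        mean = sum(sum(row) for row in channel) / (height * width)
        gate = sigmoid(mean)  # one scalar gate per channel
        out.append([[v * gate for v in row] for row in channel])
    return out


def spatial_attention(feat):
    """Scale each spatial location by a gate derived from the mean
    of that location across all channels."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    gates = [
        [sigmoid(sum(feat[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]
    return [
        [[feat[k][i][j] * gates[i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]


def attend(feat):
    """Assumed order: channel attention followed by spatial attention."""
    return spatial_attention(channel_attention(feat))


if __name__ == "__main__":
    # Two channels of a 2 x 2 feature map: one active, one silent.
    feature_map = [
        [[1.0, 1.0], [1.0, 1.0]],
        [[0.0, 0.0], [0.0, 0.0]],
    ]
    print(attend(feature_map))
```

Channel gating reweights which feature channels matter, while spatial gating reweights where in the image to look. The latter is the part most directly related to locating individual strokes.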
According to the technical scheme of the embodiments of the present disclosure, a first image comprising characters to be processed is first obtained, and the first image is then input into a pre-trained target stroke order determination model that includes a spatial attention mechanism and a channel attention mechanism, so as to obtain the target stroke order corresponding to the characters to be processed. Introducing the two mechanisms into the stroke order determination model allows the position and order of each stroke of a character to be obtained accurately, which greatly reduces stroke breakage, uneven stroke edges, and missing or redundant strokes in the generated characters, and improves the accuracy of the generated characters.

Drawings

The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers are used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.

FIG. 1 is a schematic flow chart of a word processing method according to an embodiment of the disclosure;

FIG. 2 is a schematic diagram of a stroke order determination model provided by an embodiment of the present disclosure;

FIG.