CN-121980027-A - Text processing method, apparatus, device, medium, and program product
Abstract
The application provides a text processing method applicable to the technical fields of big data and artificial intelligence. The method comprises: obtaining a plurality of style features of a text to be processed under different style dimensions; splicing the style features to obtain a total style feature of the text; identifying semantic features of the text; performing self-attention fusion on the semantic features and the total style feature to obtain a fusion feature of the text; determining a text category and a priority of the text based on the fusion feature, the priority being associated with the text category; and processing the text according to the processing flow corresponding to the text category and the processing level indicated by the priority. The application also provides a text processing apparatus, a device, a storage medium, and a program product.
Inventors
- TI JINGMIAO
Assignees
- Industrial and Commercial Bank of China Limited (中国工商银行股份有限公司)
Dates
- Publication Date
- 20260505
- Application Date
- 20250630
Claims (12)
- 1. A text processing method, the method comprising: acquiring a plurality of style features of a text to be processed under different style dimensions, and splicing the plurality of style features to obtain a total style feature of the text to be processed; identifying semantic features of the text to be processed, and performing self-attention fusion on the semantic features and the total style feature to obtain a fusion feature of the text to be processed; determining a text category and a priority of the text to be processed based on the fusion feature, wherein the priority is associated with the text category; and processing the text to be processed according to the processing flow corresponding to the text category and the processing level indicated by the priority.
- 2. The text processing method according to claim 1, wherein the splicing the plurality of style features to obtain the total style feature of the text to be processed includes: acquiring authorization from a user to extract identity information, extracting the identity information after the authorization is acquired, and encoding the identity information to obtain an identity feature of the user, wherein the text to be processed is provided by the user; and splicing the identity feature with the spliced feature of the plurality of style features to obtain the total style feature, wherein the priority is associated with the identity feature.
- 3. The text processing method according to claim 2, wherein the splicing the plurality of style features to obtain the total style feature of the text to be processed further includes: determining a repeat-behavior feature of the text to be processed according to the number of times the user repeatedly provides the text to be processed within a preset time window; and splicing the repeat-behavior feature, the identity feature, and the spliced feature of the plurality of style features to obtain the total style feature, wherein the priority is associated with the repeat-behavior feature.
- 4. The text processing method according to any one of claims 1 to 3, wherein the acquiring a plurality of style features of a text to be processed under different style dimensions, and the splicing the plurality of style features, includes: acquiring style features of the text to be processed in at least two of a formality dimension, an emotion tendency dimension, a sentence complexity dimension, and a vocabulary selection dimension, and splicing the style features in the at least two dimensions to obtain the spliced feature of the plurality of style features.
- 5. The method of claim 4, wherein the acquiring style features of the text to be processed in at least two of a formality dimension, an emotion tendency dimension, a sentence complexity dimension, and a vocabulary selection dimension comprises: determining a formality score of each word of the text to be processed, and determining the formality feature corresponding to the text to be processed in the formality dimension according to the formality score of each word; determining the emotion polarity of the text to be processed, and encoding the emotion polarity to obtain the emotion polarity feature corresponding to the text to be processed in the emotion tendency dimension; determining the average sentence length, the dependency tree depth, and the clause ratio of the text to be processed according to the dependency relationships between the sentences of the text to be processed, and determining the syntactic complexity feature corresponding to the text to be processed in the sentence complexity dimension according to the average length, the dependency tree depth, and the clause ratio; and determining the vocabulary richness feature corresponding to the text to be processed in the vocabulary selection dimension according to the number of distinct words and the total number of words in the text to be processed.
- 6. The text processing method of claim 1, wherein the identifying semantic features of the text to be processed comprises: performing global encoding on the text to be processed, and determining a global semantic representation and a sentence-level semantic representation of the text to be processed according to the encoding result; and splicing the global semantic representation and the sentence-level semantic representation to obtain the semantic features of the text to be processed.
- 7. The text processing method of claim 1, wherein the determining the text category and priority of the text to be processed based on the fusion feature comprises: inputting the fusion features into a trained model, wherein the model comprises a first network layer for text classification and a second network layer for priority prediction; determining a probability value of each preset category of the text to be processed through the first network layer, and determining the text category according to the probability value; and determining a predicted value of the priority of the text to be processed through the second network layer, and rounding the predicted value to obtain the priority.
- 8. The text processing method according to claim 1, wherein the method further comprises: performing data cleaning on the text to be processed before the plurality of style features of the text to be processed are acquired.
- 9. A text classification device, the device comprising: a feature splicing module, used for acquiring a plurality of style features of a text to be processed under different style dimensions, and splicing the plurality of style features to obtain a total style feature of the text to be processed; a feature fusion module, used for identifying semantic features of the text to be processed, and performing self-attention fusion on the semantic features and the total style feature to obtain a fusion feature of the text to be processed; a text classification module, used for determining a text category and a priority of the text to be processed based on the fusion feature, the priority being associated with the text category; and a text processing module, used for processing the text to be processed according to the processing flow corresponding to the text category and the processing level indicated by the priority.
- 10. An electronic device, comprising: one or more processors; and a memory for storing one or more computer programs, characterized in that the one or more processors execute the one or more computer programs to implement the steps of the method according to any one of claims 1-8.
- 11. A computer-readable storage medium, on which a computer program or instructions are stored, which, when executed by a processor, carry out the steps of the method according to any one of claims 1-8.
- 12. A computer program product comprising a computer program or instructions which, when executed by a processor, implement the steps of the method according to any one of claims 1 to 8.
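The two-output decision step recited in claim 7 can be illustrated with a minimal sketch: a classification head that turns the fusion feature into per-category probabilities, and a priority head whose scalar prediction is rounded. The category names, the single dense layer per head, the softmax, and all weights below are hypothetical choices made for illustration; the claims do not specify the layer types or trained parameters.

```python
import math

# Hypothetical preset categories; the source does not enumerate them.
CATEGORIES = ["complaint", "inquiry", "suggestion"]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, bias, x):
    """Apply one dense layer; weights is a list of rows, bias a list."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def classify_and_prioritize(fused, cls_w, cls_b, pri_w, pri_b):
    # First network layer (assumed dense + softmax): probability per category.
    probs = softmax(linear(cls_w, cls_b, fused))
    category = CATEGORIES[probs.index(max(probs))]
    # Second network layer (assumed dense): real-valued priority prediction.
    predicted = linear([pri_w], [pri_b], fused)[0]
    # Round the prediction to obtain the discrete priority.
    priority = round(predicted)
    return category, probs, priority

# Toy fusion feature and hand-picked (untrained) weights.
fused = [0.2, 1.0, -0.5]
cls_w = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
cls_b = [0.0, 0.0, 0.0]
pri_w = [1.0, 2.0, 0.0]
pri_b = 0.1

category, probs, priority = classify_and_prioritize(fused, cls_w, cls_b, pri_w, pri_b)
print(category, priority)
```

Python's built-in `round` rounds exact halves to the nearest even integer; a deployment that needs a different tie rule would have to state it, since the claim only says the predicted value is rounded.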
Description
Text processing method, apparatus, device, medium, and program product
Technical Field
The present application relates to the technical fields of big data and artificial intelligence, in particular to natural language processing, and more particularly to a text processing method, apparatus, device, medium, and program product.
Background
With the rapid development of the Internet and social media, the number of user messages received by online service platforms has grown explosively. How to classify these messages quickly and accurately, so that each can be routed by category to the staff responsible for the corresponding field, has become a key problem for improving service efficiency and user satisfaction.
Disclosure of Invention
In view of the foregoing, the present application provides a text processing method, apparatus, device, medium, and program product that improve the accuracy and processing efficiency of text classification.
According to a first aspect of the application, a text processing method is provided, comprising: obtaining a plurality of style features of a text to be processed under different style dimensions; splicing the plurality of style features to obtain a total style feature of the text to be processed; identifying semantic features of the text to be processed; performing self-attention fusion on the semantic features and the total style feature to obtain a fusion feature of the text to be processed; determining a text category and a priority of the text to be processed based on the fusion feature, the priority being associated with the text category; and processing the text to be processed according to the processing flow corresponding to the text category and the processing level indicated by the priority.
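One possible reading of the self-attention fusion step is sketched below. It treats the semantic feature and the total style feature as a two-token sequence, applies single-head scaled dot-product self-attention, and mean-pools the result. The identity query/key/value projections, the equal vector lengths, and the mean pooling are assumptions made to keep the sketch short; the source does not describe the attention configuration.

```python
import math

def softmax(xs):
    """Normalize attention scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity Q/K/V projections, so each token attends
    directly over the raw token vectors.
    """
    d = len(tokens[0])
    attended = []
    for q in tokens:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in tokens])
        attended.append([sum(w * v[i] for w, v in zip(weights, tokens))
                         for i in range(d)])
    return attended

def fuse(semantic, style):
    """Fuse a semantic vector and a style vector into one fusion feature."""
    if len(semantic) != len(style):
        raise ValueError("this sketch assumes both vectors share one dimension")
    sem_out, sty_out = self_attention([semantic, style])
    # Mean-pool the two attended tokens (pooling choice is an assumption).
    return [(a + b) / 2 for a, b in zip(sem_out, sty_out)]

# Toy inputs standing in for real encoder outputs.
semantic = [1.0, 0.0, 0.0, 0.0]
style = [0.0, 1.0, 0.0, 0.0]
fused = fuse(semantic, style)
print(fused)
```

In practice the two feature vectors would rarely have the same length, so a learned projection into a shared dimension would normally precede the attention step.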
According to an embodiment of the application, splicing the plurality of style features to obtain the total style feature of the text to be processed comprises: obtaining authorization from a user to extract identity information, extracting the identity information after the authorization is obtained, and encoding the identity information to obtain an identity feature of the user; and splicing the identity feature with the spliced feature of the plurality of style features to obtain the total style feature, wherein the priority is associated with the identity feature.
According to an embodiment of the application, splicing the plurality of style features to obtain the total style feature of the text to be processed further comprises: determining a repeat-behavior feature of the text to be processed according to the number of times the user repeatedly provides the text to be processed within a preset time window; and splicing the repeat-behavior feature, the identity feature, and the spliced feature of the plurality of style features to obtain the total style feature, wherein the priority is associated with the repeat-behavior feature.
According to an embodiment of the application, acquiring the plurality of style features of the text to be processed under different style dimensions comprises: acquiring style features of the text to be processed in at least two of a formality dimension, an emotion tendency dimension, a sentence complexity dimension, and a vocabulary selection dimension; and splicing the style features in the at least two dimensions to obtain the spliced feature of the plurality of style features.
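The splicing described in these embodiments can be sketched as plain vector concatenation. The example below computes style features in two dimensions (formality and vocabulary selection), a count of repeated submissions inside a time window, and a one-hot identity feature, then concatenates them. The formality lexicon, the customer tiers, averaging the per-word scores, using a distinct-to-total word ratio for richness, and returning zeros when authorization is absent are all hypothetical choices, not details taken from the source.

```python
from datetime import datetime, timedelta

# Hypothetical per-word formality scores in [0, 1].
FORMALITY_LEXICON = {"hereby": 1.0, "request": 0.8, "please": 0.6, "hey": 0.1}
# Hypothetical identity categories.
TIERS = ["standard", "premium", "vip"]

def formality_feature(words, default=0.5):
    """Mean per-word formality score (averaging is an assumption)."""
    if not words:
        return [default]
    return [sum(FORMALITY_LEXICON.get(w, default) for w in words) / len(words)]

def richness_feature(words):
    """Distinct-word count divided by total word count (assumed ratio form)."""
    return [len(set(words)) / len(words)] if words else [0.0]

def repeat_feature(submission_times, now, window):
    """Number of submissions of the same text inside the preset window."""
    return [float(sum(1 for t in submission_times if now - window <= t <= now))]

def identity_feature(tier, authorized):
    """One-hot tier encoding, produced only after user authorization."""
    vec = [0.0] * len(TIERS)
    if authorized:
        vec[TIERS.index(tier)] = 1.0
    return vec

def total_style_feature(text, tier, authorized, submission_times, now, window):
    words = text.lower().split()
    # Splice the style features from two dimensions.
    style = formality_feature(words) + richness_feature(words)
    # Splice repeat-behavior, identity, and style features into one vector.
    return (repeat_feature(submission_times, now, window)
            + identity_feature(tier, authorized)
            + style)

now = datetime(2025, 6, 30, 12, 0)
times = [now - timedelta(minutes=10), now - timedelta(minutes=50),
         now - timedelta(hours=3)]
total = total_style_feature("please please help", "vip", True,
                            times, now, timedelta(hours=1))
print(total)
```

Concatenation keeps every feature in a fixed position, so a downstream model can learn separate weights for the repeat count, the identity encoding, and each style dimension.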
According to an embodiment of the application, acquiring style features of the text to be processed in at least two of the formality dimension, the emotion tendency dimension, the sentence complexity dimension, and the vocabulary selection dimension comprises: determining a formality score of each word of the text to be processed, and determining the formality feature corresponding to the formality dimension according to the formality score of each word; determining the emotion polarity of the text to be processed, and encoding the emotion polarity to obtain the emotion polarity feature corresponding to the emotion tendency dimension; determining the average sentence length, the dependency tree depth, and the clause ratio of the text to be processed according to the dependency relationships among the sentences of the text to be processed, and determining the syntactic complexity feature corresponding to the sentence complexity dimension according to the average length, the dependency tree depth, and the clause ratio; and determining the vocabulary richness feature corresponding to the vocab