CN-117076601-B - Text processing method, device, electronic equipment and storage medium
Abstract
Embodiments of the disclosure provide a text processing method, a device, electronic equipment and a storage medium. A text to be processed is obtained, and feature extraction is performed on it to obtain a semantic feature space, the semantic feature space comprising semantic feature vectors of at least two text segments forming the text to be processed. Scene feature spaces corresponding to at least two text processing scenes are obtained, the scene feature spaces comprising semantic feature vectors of processing content keywords corresponding to the at least two text processing scenes. A target text processing scene is determined according to the spatial features of the semantic feature space and the spatial features of the at least two scene feature spaces, wherein the spatial features represent distribution features of the semantic feature vectors in the feature space. By converting the semantic features of the text to be processed into a semantic feature space and comparing it with the scene feature spaces at the feature-space level, the text processing scene is detected accurately.
Inventors
- QIN YANGYANG
- DENG SONG
- WANG JINGWEI
Assignees
- 抖音视界有限公司
Dates
- Publication Date: 20260512
- Application Date: 20230804
Claims (13)
- 1. A text processing method, comprising: acquiring a text to be processed, and performing feature extraction on the text to be processed to obtain a semantic feature space, wherein the semantic feature space comprises semantic feature vectors of at least two text segments forming the text to be processed; acquiring scene feature spaces corresponding to at least two text processing scenes, wherein the scene feature spaces comprise semantic feature vectors of processing content keywords corresponding to the at least two text processing scenes; obtaining spatial features of the semantic feature space and spatial features of the scene feature space, wherein the spatial features of the semantic feature space represent distribution features of the similarity between each semantic feature vector in the semantic feature space and the semantic feature vectors in the scene feature space, and the spatial features of the scene feature space represent distribution features of the similarity between each current semantic feature vector in the scene feature space and the other semantic feature vectors; and performing a population mean hypothesis test on the spatial features of the semantic feature space and the spatial features of the scene feature space to obtain, at a preset confidence level, the target text processing scene to which the spatial features of the semantic feature space belong.
- 2. The method according to claim 1, wherein the current semantic feature vector is a semantic feature vector acquired in sequence from the scene feature space, the other semantic feature vectors are the semantic feature vectors in the scene feature space other than the current semantic feature vector, and the distribution feature is a distribution statistics result of the similarities corresponding to the current semantic feature vector.
- 3. The method according to claim 2, further comprising: sequentially acquiring the cosine similarity between each semantic feature vector and the other semantic feature vectors in the scene feature space; obtaining a similarity measurement matrix from the cosine similarities between each semantic feature vector and the other semantic feature vectors; dividing the similarity measurement matrix into two feature regions along its diagonal; for either feature region, counting all cosine similarities along the rows or the columns to obtain the distribution feature of the different cosine similarities in the similarity measurement matrix; and obtaining the spatial features of the scene feature space according to the similarity measurement matrix.
- 4. The method according to claim 1, wherein performing feature extraction on the text to be processed to obtain a semantic feature space comprises: performing sliding division on the text to be processed based on a target window length to obtain at least two text segments; performing feature extraction on the at least two text segments to obtain corresponding first semantic feature vectors; and obtaining the semantic feature space based on the at least two first semantic feature vectors.
- 5. The method according to claim 4, wherein performing sliding division on the text to be processed based on the target window length to obtain at least two text segments comprises: obtaining a target overlap rate according to the average distance between sentence-break identifiers in the text to be processed; and performing sliding division on the plain text corresponding to the text to be processed, based on the target overlap rate and the target window length, to obtain the at least two text segments, wherein the plain text is the text to be processed with the sentence-break identifiers removed.
- 6. The method according to claim 4, wherein the text to be processed comprises a main text and a title text, the title text representing the main content of the main text; performing sliding division on the text to be processed based on the target window length to obtain at least two text segments comprises: performing sliding division on the main text based on the target window length to obtain the at least two text segments; and obtaining the semantic feature space based on the at least two first semantic feature vectors comprises: acquiring a second semantic feature vector corresponding to the title text; and obtaining the semantic feature space based on the at least two first semantic feature vectors corresponding to the main text and the second semantic feature vector corresponding to the title text.
- 7. The method according to claim 1, wherein acquiring a text to be processed and performing feature extraction on the text to be processed to obtain a semantic feature space comprises: generating a summary text corresponding to the text to be processed by using a pre-trained generation model; and obtaining, according to the summary text, the semantic feature vectors of the text segments corresponding to the text to be processed.
- 8. The method according to claim 1, wherein acquiring scene feature spaces corresponding to at least two text processing scenes comprises: acquiring expert experience data corresponding to the at least two text processing scenes, wherein the expert experience data characterizes text content distribution rules for the text processing scenes; generating, according to the expert experience data, a third semantic feature vector representing the processing content keywords; and obtaining the scene feature space according to the third semantic feature vector.
- 9. The method according to claim 8, further comprising: acquiring annotated sample data corresponding to the text processing scene, wherein the annotated sample data comprises a sample text and annotation information corresponding to the sample text, and the annotation information marks the processing content keywords in the sample text; and generating a fourth semantic feature vector according to the annotated sample data; wherein obtaining the scene feature space according to the third semantic feature vector comprises: obtaining the scene feature space based on the third semantic feature vector and the fourth semantic feature vector.
- 10. A text processing apparatus, comprising: a first extraction module configured to acquire a text to be processed and perform feature extraction on the text to be processed to obtain a semantic feature space, wherein the semantic feature space comprises semantic feature vectors of at least two text segments forming the text to be processed; a second extraction module configured to acquire scene feature spaces corresponding to at least two text processing scenes, wherein the scene feature spaces comprise semantic feature vectors of processing content keywords corresponding to the at least two text processing scenes; and a processing module configured to obtain spatial features of the semantic feature space and spatial features of the scene feature space, wherein the spatial features of the semantic feature space represent distribution features of the similarity between each semantic feature vector in the semantic feature space and the semantic feature vectors in the scene feature space, and the spatial features of the scene feature space represent distribution features of the similarity between each current semantic feature vector in the scene feature space and the other semantic feature vectors, and to perform a population mean hypothesis test on the spatial features of the semantic feature space and the spatial features of the scene feature space to obtain, at a preset confidence level, the target text processing scene to which the spatial features of the semantic feature space belong.
- 11. An electronic device, comprising a processor and a memory; wherein the memory stores computer-executable instructions; and the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the text processing method of any one of claims 1 to 9.
- 12. A computer readable storage medium having stored therein computer executable instructions which, when executed by a processor, implement the text processing method of any of claims 1 to 9.
- 13. A computer program product comprising a computer program which, when executed by a processor, implements the text processing method according to any one of claims 1 to 9.
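The sliding division of claims 4 and 5 can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the set of punctuation marks treated as sentence-break identifiers, window lengths measured in characters, and in particular the hypothetical `target_overlap_rate` mapping, since the claims state only that the overlap rate is obtained from the average distance between sentence-break identifiers, not how.

```python
import re

# Punctuation treated as sentence-break identifiers (an assumption for this sketch).
BREAKS = re.compile(r"[.!?;。！？；]")


def average_break_distance(text: str) -> float:
    """Mean number of characters between consecutive sentence-break marks."""
    pieces = [p for p in BREAKS.split(text) if p.strip()]
    if not pieces:
        return 0.0
    return sum(len(p) for p in pieces) / len(pieces)


def target_overlap_rate(avg_distance: float, window: int, cap: float = 0.5) -> float:
    """Hypothetical mapping: overlap by roughly one average sentence, capped at `cap`."""
    if window <= 0:
        raise ValueError("window must be positive")
    return min(cap, avg_distance / window)


def sliding_segments(text: str, window: int, overlap: float) -> list[str]:
    """Remove break marks, then cut the plain text into overlapping windows."""
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    plain = BREAKS.sub("", text)
    # Each window starts `step` characters after the previous one.
    step = max(1, round(window * (1 - overlap)))
    segments = []
    for start in range(0, len(plain), step):
        segments.append(plain[start:start + window])
        if start + window >= len(plain):
            break  # the last window already reaches the end of the text
    return segments
```

For example, `sliding_segments("abcdefghij", 4, 0.5)` yields four 4-character windows, each sharing two characters with its neighbour. Each segment would then be passed to a feature extractor to obtain the first semantic feature vectors of claim 4.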
Description
Text processing method, device, electronic equipment and storage medium
Technical Field
Embodiments of the disclosure relate to the technical field of the internet, and in particular to a text processing method, a text processing device, electronic equipment and a storage medium.
Background
With the popularization of the internet and the rapid development of digital technology, a large amount of text data is being created, and processing and using this text data enables functions such as risk detection and accurate information pushing. In the prior art, text data is usually processed by a pre-trained text processing model. However, when text data must be processed by the model that corresponds to the text processing scene to which it belongs, that scene can be determined only after the text has been fully read and understood on the basis of manual experience; only then can the matching text processing model and scheme be selected. Because the text processing scene cannot be identified, prior-art schemes suffer from low text data processing efficiency and poor processing results.
Disclosure of Invention
Embodiments of the disclosure provide a text processing method, a text processing device, electronic equipment and a storage medium, so as to solve the prior-art problem that a text processing scene cannot be identified efficiently and accurately.
In a first aspect, an embodiment of the present disclosure provides a text processing method, including: obtaining a text to be processed, and performing feature extraction on the text to be processed to obtain a semantic feature space, wherein the semantic feature space comprises semantic feature vectors of at least two text segments forming the text to be processed; obtaining scene feature spaces corresponding to at least two text processing scenes, wherein the scene feature spaces comprise semantic feature vectors of processing content keywords corresponding to the at least two text processing scenes; and determining a target text processing scene according to the spatial features of the semantic feature space and the spatial features of the at least two scene feature spaces, wherein the spatial features represent distribution features of the semantic feature vectors in the feature space.
In a second aspect, an embodiment of the present disclosure provides a text processing apparatus, including: a first extraction module, configured to obtain a text to be processed and perform feature extraction on the text to be processed to obtain a semantic feature space, wherein the semantic feature space comprises semantic feature vectors of at least two text segments forming the text to be processed; a second extraction module, configured to obtain scene feature spaces corresponding to at least two text processing scenes, wherein the scene feature spaces comprise semantic feature vectors of processing content keywords corresponding to the at least two text processing scenes; and a processing module, configured to determine a target text processing scene according to the spatial features of the semantic feature space and the spatial features of the at least two scene feature spaces, wherein the spatial features represent distribution features of the semantic feature vectors in the feature space.
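The scene-determination step can be sketched in Python as follows, using the cosine similarities of claim 3 and the mean hypothesis test of claim 1. This is an illustrative sketch, not the patented implementation: the two-sample z-test, the 0.95 default confidence, the rule of returning the non-rejected scene with the smallest |z|, and all function names are assumptions. The vectors are assumed to be non-zero, with at least three keyword vectors per scene so that a sample variance exists.

```python
import math
from itertools import combinations
from statistics import NormalDist, fmean, variance


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def scene_similarities(keywords):
    """Similarities of each keyword vector to the others.

    The similarity matrix is symmetric, so one triangle beside the
    diagonal (every unordered pair once) carries all the information.
    """
    return [cosine(u, v) for u, v in combinations(keywords, 2)]


def text_similarities(segments, keywords):
    """Similarities of every text-segment vector to every keyword vector."""
    return [cosine(s, k) for s in segments for k in keywords]


def z_statistic(xs, ys):
    """Two-sample z statistic for the difference of means (assumed test)."""
    se = math.sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    diff = fmean(xs) - fmean(ys)
    if se == 0.0:
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def detect_scene(segments, scenes, confidence=0.95):
    """Return the best scene whose mean similarity is not rejected, else None."""
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    best_name, best_z = None, critical
    for name, keywords in scenes.items():
        z = abs(z_statistic(text_similarities(segments, keywords),
                            scene_similarities(keywords)))
        if z <= best_z:  # equal means not rejected, and closest so far
            best_name, best_z = name, z
    return best_name
```

Here `scenes` maps a scene name to its list of keyword vectors, and `segments` holds the text-segment vectors. With samples as small as those in a toy example, a t-test would usually be the more appropriate choice; the z-test is used only to keep the sketch within the standard library.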
In a third aspect, an embodiment of the present disclosure provides an electronic device, including a processor and a memory. The memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the text processing method described in the first aspect and the various possible designs of the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the text processing method described in the first aspect and the various possible designs of the first aspect.
In a fifth aspect, embodiments of the present disclosure provide a computer program product comprising a computer program which, when executed by a processor, implements the text processing method described in the first aspect and the various possible designs of the first aspect.
According to the text processing method, the device, the electronic equipment and the storage medium, a text to be processed is obtained, feature extraction is carried out on the text to be processed, a semantic feature space is obtained, the semantic feature space comprises semantic feature vectors of at least two text segments forming the text to be processed, scene feature spaces corresponding to at least two text processing scenes are obtained, the scene feature space