CN-122020376-A - Entity state identification method, device, and medium based on multi-modal time-series fusion


Abstract

The application relates to the technical field of electric digital data processing, and in particular to an entity state identification method, device, and medium based on multi-modal time-series fusion. The method comprises: obtaining a description text of a target entity and numerical state-indicator time-series data of the target entity; encoding the description text with a pre-trained language model to generate a text semantic feature vector; concatenating the text semantic feature vector with the numerical state-indicator time-series data to obtain a multi-modal time-series feature sequence; inputting the multi-modal time-series feature sequence into a temporal convolutional network; performing gated attention fusion on the temporal fusion features output by the temporal convolutional network and the text semantic feature vector to generate a comprehensive state feature vector; and outputting a state-level classification and a state-trend prediction result for the target entity based on the comprehensive state feature vector. The application can improve the accuracy of entity state identification.

Inventors

ZHANG JINGYA
HU JIACHEN
DUAN LIGE
HAN JIANNING

Assignees

Hangzhou Yunxin Zhice Technology Co., Ltd. (杭州云信智策科技有限公司)

Dates

Publication Date: 20260512
Application Date: 20260127

Claims (10)

1. An entity state identification method based on multi-modal time-series fusion, characterized by comprising the following steps: acquiring a description text of a target entity and numerical state-indicator time-series data of the target entity; encoding the description text of the target entity with a pre-trained language model to generate a text semantic feature vector; concatenating the text semantic feature vector with the numerical state-indicator time-series data to obtain a multi-modal time-series feature sequence; inputting the multi-modal time-series feature sequence into a temporal convolutional network, and performing gated attention fusion on the temporal fusion features output by the temporal convolutional network and the text semantic feature vector to generate a comprehensive state feature vector; and outputting a state-level classification and a state-trend prediction result for the target entity based on the comprehensive state feature vector.
2. The entity state identification method based on multi-modal time-series fusion according to claim 1, wherein the temporal convolutional network comprises at least two parallel branches that use different combinations of dilation factors, and the temporal fusion features output by the temporal convolutional network comprise the temporal fusion features output by each parallel branch.
3. The entity state identification method based on multi-modal time-series fusion according to claim 2, wherein the temporal convolutional network comprises a first branch and a second branch, and the geometric mean of the dilation factors of the first branch is greater than the geometric mean of the dilation factors of the second branch by more than a preset multiple.
4. The entity state identification method based on multi-modal time-series fusion according to claim 2, wherein performing gated attention fusion on the temporal fusion features output by the temporal convolutional network and the text semantic feature vector to generate a comprehensive state feature vector comprises: performing temporal global pooling on the temporal fusion features output by each parallel branch to obtain a pooled feature for each branch; concatenating the pooled feature of each branch with the text semantic feature vector to obtain features to be fused; and obtaining a weight for each component of the features to be fused by using a gated attention mechanism, computing a weighted sum of the components according to the weights, and outputting the comprehensive state feature vector.
5. The entity state identification method based on multi-modal time-series fusion according to claim 1, wherein outputting the state-level classification and state-trend prediction result for the target entity based on the comprehensive state feature vector comprises: inputting the comprehensive state feature vector into a multi-task learning framework, wherein the multi-task learning framework comprises a first fully connected layer, a first activation function, a second fully connected layer, and a second activation function; outputting a classification probability distribution over the current risk level through the first fully connected layer and the first activation function; and outputting a predicted probability that the risk level will increase within a specified future time window through the second fully connected layer and the second activation function.
6. The entity state identification method based on multi-modal time-series fusion according to claim 1, wherein concatenating the text semantic feature vector with the numerical state-indicator time-series data to obtain a multi-modal time-series feature sequence comprises: replicating and expanding the text semantic feature vector so that it is aligned with the numerical state-indicator time-series data in the time dimension; concatenating the replicated text semantic feature vector with the numerical state-indicator data of the corresponding time step to form a multi-modal feature for each time step; and forming the multi-modal time-series feature sequence from the multi-modal features of all time steps.
7. The entity state identification method based on multi-modal time-series fusion according to claim 4, wherein after obtaining the pooled feature of each branch and before concatenating the pooled feature of each branch with the text semantic feature vector, the method further comprises: projecting the pooled feature output by each parallel branch and the text semantic feature vector to a unified feature dimension by using a linear transformation layer; and wherein concatenating the pooled feature of each branch with the text semantic feature vector to obtain the features to be fused comprises: concatenating the projected feature corresponding to the pooled feature of each branch with the projected feature corresponding to the text semantic feature vector to obtain the features to be fused.
8. The entity state identification method based on multi-modal time-series fusion according to claim 1, wherein the entity is a financial application and the pre-trained language model is FinBERT.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the entity state identification method based on multi-modal time-series fusion according to any one of claims 1 to 8.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the entity state identification method based on multi-modal time-series fusion according to any one of claims 1 to 8.
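
As an illustration of the fusion recited in claims 4 and 7, the following is a minimal Python sketch, not the applicant's implementation. It uses plain lists so that it runs without third-party libraries. Several details are assumptions made only for this example, because the claims do not fix them: mean pooling is used as the temporal global pooling, the gate scores each projected component with a hypothetical learned vector `gate_vec` and normalizes the scores with a softmax, and all weights are random placeholders rather than trained parameters.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def global_mean_pool(seq: Matrix) -> Vector:
    """Average a (T x C) branch output over the time axis (mean pooling is an assumption)."""
    steps = len(seq)
    return [sum(row[c] for row in seq) / steps for c in range(len(seq[0]))]


def linear(weight: Matrix, x: Vector) -> Vector:
    """Bias-free linear projection: weight (d_out x d_in) applied to x (d_in)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_attention_fusion(branch_outputs: List[Matrix], text_vec: Vector,
                           projections: List[Matrix],
                           gate_vec: Vector) -> Tuple[Vector, Vector]:
    """Return (comprehensive state feature vector, per-component weights)."""
    # Temporal global pooling of each parallel branch output (claim 4).
    pooled = [global_mean_pool(branch) for branch in branch_outputs]
    # Project every component, including the text vector, to one dimension (claim 7).
    components = [linear(w, x) for w, x in zip(projections, pooled + [text_vec])]
    # Hypothetical gate: score = gate_vec . tanh(component), softmax across components.
    scores = [sum(g * math.tanh(c) for g, c in zip(gate_vec, comp))
              for comp in components]
    weights = softmax(scores)
    # The weighted sum of the components is the fused vector.
    fused = [sum(w * comp[i] for w, comp in zip(weights, components))
             for i in range(len(gate_vec))]
    return fused, weights


rng = random.Random(0)


def rand_matrix(rows: int, cols: int) -> Matrix:
    """Random placeholder weights or features in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


branch_outputs = [rand_matrix(6, 4), rand_matrix(6, 4)]  # two branches, T=6, C=4
text_vec = [rng.uniform(-1.0, 1.0) for _ in range(5)]    # hypothetical 5-dim text vector
projections = [rand_matrix(3, 4), rand_matrix(3, 4), rand_matrix(3, 5)]
gate_vec = [rng.uniform(-1.0, 1.0) for _ in range(3)]

fused, weights = gated_attention_fusion(branch_outputs, text_vec, projections, gate_vec)
print(len(fused), [round(w, 3) for w in weights])
```

Because the weights come from a softmax, they are positive and sum to one, so the fused vector is a convex combination of the projected branch and text components.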

Description

Entity state identification method, device, and medium based on multi-modal time-series fusion

Technical Field

The invention relates to the technical field of electric digital data processing, and in particular to an entity state identification method, device, and medium based on multi-modal time-series fusion.

Background

In current digital management, accurately and promptly identifying and predicting the states of various entities (such as financial applications and software systems) is key to risk prevention and control. The state information of such an entity is usually contained in two different types of data: text describing the inherent attributes or functions of the entity (such as an application description or a function introduction), and numerical state-indicator data generated while the entity runs and recorded in time order (such as the number of interface call failures or the operation frequency).

Existing entity state identification methods mainly rely on either the description text alone or the numerical state-indicator data alone. For example, for a financial risk identification task, a domain-specific pre-trained model with a relatively small parameter count (such as FinBERT) is used to identify the risk state from the description text of a financial application. Compared with a pre-trained large language model based on the Transformer architecture, the domain-specific pre-trained model, after fine-tuning on a financial corpus, captures risk semantics in the text more accurately. However, this approach does not consider the time-series data generated while the financial application runs, and therefore cannot capture changes in the application's risk state in time from the entity's dynamic data.
Alternatively, a model such as a recurrent neural network (RNN), a long short-term memory network (LSTM), or a standard temporal convolutional network (TCN) is used to analyze the numerical state-indicator data. This approach captures temporal dynamics but ignores the entity's text description, so it cannot account for the intrinsic relationship between the entity's inherent attributes and its current behavior. Methods that rely on the description text alone or the numerical state-indicator data alone therefore identify entity states inaccurately, and how to improve the accuracy of entity state identification is a problem to be solved.

Disclosure of Invention

The invention aims to provide an entity state identification method, device, and medium based on multi-modal time-series fusion so as to improve the accuracy of entity state identification. According to a first aspect of the present invention, there is provided an entity state identification method based on multi-modal time-series fusion, the method comprising the following steps:

Acquiring a description text of a target entity and numerical state-indicator time-series data of the target entity.
Encoding the description text of the target entity with a pre-trained language model to generate a text semantic feature vector.
Concatenating the text semantic feature vector with the numerical state-indicator time-series data to obtain a multi-modal time-series feature sequence.
Inputting the multi-modal time-series feature sequence into a temporal convolutional network, and performing gated attention fusion on the temporal fusion features output by the temporal convolutional network and the text semantic feature vector to generate a comprehensive state feature vector.
Outputting a state-level classification and a state-trend prediction result for the target entity based on the comprehensive state feature vector.
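
The splicing (concatenation) step above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `splice_text_with_series`, the feature ordering (text features first, then the indicators), and the sample values are all hypothetical. The static text vector is copied to every time step so that it is aligned with the numerical series in the time dimension, as claim 6 describes.

```python
from typing import List, Sequence


def splice_text_with_series(text_vec: Sequence[float],
                            series: Sequence[Sequence[float]]) -> List[List[float]]:
    """Copy the text vector to every time step and join it with that step's indicators.

    text_vec: text semantic feature vector of length d_text.
    series:   T rows, each holding d_num numerical state indicators.
    Returns T rows of length d_text + d_num (text features first; this order is assumed).
    """
    if not series:
        raise ValueError("series must contain at least one time step")
    width = len(series[0])
    if any(len(row) != width for row in series):
        raise ValueError("every time step must have the same number of indicators")
    # Copy-and-expand: the same static text vector is repeated at every time step.
    return [list(text_vec) + list(row) for row in series]


text_vec = [0.1, 0.2, 0.3]  # hypothetical 3-dim text embedding
series = [                  # hypothetical indicators: failure count, operation frequency
    [5.0, 12.0],
    [7.0, 9.0],
    [6.0, 15.0],
    [9.0, 20.0],
]
fused_sequence = splice_text_with_series(text_vec, series)
print(len(fused_sequence), len(fused_sequence[0]))
```

The resulting T-by-(d_text + d_num) sequence is what the temporal convolutional network would receive as input; a real text embedding would have hundreds of dimensions rather than three.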
Further, the temporal convolutional network comprises at least two parallel branches that use different combinations of dilation factors, and the temporal fusion features output by the temporal convolutional network comprise the temporal fusion features output by each parallel branch.

Further, the temporal convolutional network comprises a first branch and a second branch, wherein the geometric mean of the dilation factors of the first branch is greater than the geometric mean of the dilation factors of the second branch by more than a preset multiple.

Further, performing gated attention fusion on the temporal fusion features output by the temporal convolutional network and the text semantic feature vector to generate a comprehensive state feature vector comprises: performing temporal global pooling on the temporal fusion features output by each parallel branch to obtain a pooled feature for each branch; and concatenating the pooled feature of each branch with the text semantic feature vector to obtain the fe