CN-121980391-A - Representation vector optimization method, system, equipment and medium for sequence data prediction

Abstract

The invention discloses a representation vector optimization method, system, equipment and medium for sequence data prediction, and relates to the technical field of representation vector optimization. The invention introduces a hidden variable for each variable to encode the hidden state of the original, highly uncertain observed variable, estimates the probability distribution obeyed by the hidden variable with a neural network, and takes the parameters of that distribution as a more robust representation vector of the variable. A gating mechanism adaptively screens out, from the variable sequence, the invariant feature codes most stably associated with the prediction target, thereby reducing uncertainty. In addition, the asynchronous cyclic optimization strategy of the invention continuously acquires the information gain provided by the node variables and the information variables during information propagation, thereby enhancing the representation information of the node variables and the information variables under information coupling.

Inventors

TAO RAN
WANG JINGYU
LI LING

Assignees

成都工业学院

Dates

Publication Date: 20260505
Application Date: 20260327

Claims (10)

1. A representation vector optimization method for sequence data prediction, the method comprising: step S1, acquiring an information diffusion process of information, performing hidden state coding on each node variable in the information diffusion process, and constructing a node coding vector sequence; obtaining a first logarithmic strength of the information diffusion process by combining a feed-forward network with an information invariant feature vector and an information variation feature vector, constructing a prediction function for predicting the future diffusion scale of the information and the occurrence time of a future diffusion event by using the first logarithmic strength, constructing a first objective function with the goal of minimizing the prediction error of the prediction function, and solving the first objective function by using an optimization algorithm to obtain an optimized information representation vector; step S2, acquiring a node evolution process of a node, performing hidden state coding on each information variable in the node evolution process, and constructing an information coding vector sequence; constructing a prediction function for predicting the sending time of a future information forwarding event by using a second logarithmic strength, constructing a second objective function with the goal of minimizing the prediction error of the prediction function, and solving the second objective function by using an optimization algorithm to obtain an optimized node representation vector; and step S3, repeating steps S1 to S2 until the first objective function and the second objective function converge, to obtain the information representation vector and the node representation vector.
2. The representation vector optimization method for sequence data prediction according to claim 1, wherein after constructing the node coding vector sequence, the method further comprises: encoding the information forwarding time of each node coding vector in the node coding vector sequence, adding the resulting time code to the corresponding node coding vector in the node coding vector sequence, and inputting the summed vector into an attention mechanism to obtain a node historical intensity code; and acquiring the node intensity code of the node coding vector sequence at the current time, and taking the current-time node intensity code and the node historical intensity code as the information variation feature vector.
3. The representation vector optimization method for sequence data prediction according to claim 1, wherein after constructing the information coding vector sequence, the method further comprises: encoding the event occurrence time of each information coding vector in the information coding vector sequence, adding the resulting time code to the corresponding information coding vector in the information coding vector sequence, and inputting the summed vector into an attention mechanism to obtain an information historical intensity code; and acquiring the information intensity code of the information coding vector sequence at the current time, and taking the current-time information intensity code and the information historical intensity code as the node change feature vector.
4. The representation vector optimization method for sequence data prediction according to claim 1, wherein acquiring the information diffusion process of the information, performing hidden state coding on each node variable in the information diffusion process, and constructing the node coding vector sequence specifically comprises: acquiring the information diffusion process of the information, and calculating an influence factor of each node variable in the information cascade network during the information diffusion process; and calculating the node coding vector of each node by using the influence factor of that node and the node representation vector obtained by node optimization.
5. The representation vector optimization method for sequence data prediction according to claim 1, wherein the information invariant feature of each node coding vector is extracted by using the node representation vector obtained by node optimization in combination with a gating mechanism to obtain the information invariant feature vector, which specifically comprises: acquiring the node representation vector obtained by node optimization, and encoding the current information representation vector with an encoding function to obtain an encoded current information representation vector; computing, by means of a function of the encoded current information representation vector and each node coding vector, the vector weight of each node coding vector; and weighting each node coding vector by its corresponding vector weight, and multiplying all the weighted vector features element by element through the Hadamard product to obtain the information invariant feature vector.
6. The representation vector optimization method for sequence data prediction according to claim 1, wherein the node invariant feature of each information coding vector is extracted by using the information representation vector obtained by information optimization in combination with a gating mechanism to obtain the node invariant feature vector, which specifically comprises: acquiring the information representation vector obtained by information optimization, and encoding the current node representation vector with an encoding function to obtain an encoded current node representation vector; computing, by means of a function of the encoded current node representation vector and each information coding vector, the vector weight of each information coding vector; and weighting each information coding vector by its corresponding vector weight, and multiplying all the weighted vector features element by element through the Hadamard product to obtain the node invariant feature vector.
7. A representation vector optimization system for sequence data prediction, characterized in that the system is used to implement the representation vector optimization method for sequence data prediction according to any one of claims 1 to 6, the system comprising: an information representation vector optimization module, configured to acquire an information diffusion process of information, perform hidden state coding on each node variable in the information diffusion process, and construct a node coding vector sequence; to construct, based on the first logarithmic strength, a prediction function for predicting the future diffusion scale of the information and the occurrence time of a future diffusion event, construct a first objective function with the goal of minimizing the prediction error of the prediction function, and solve the first objective function by using an optimization algorithm to obtain an optimized information representation vector; and a node representation vector optimization module, configured to acquire a node evolution process of a node, perform hidden state coding on each information variable in the node evolution process, and construct an information coding vector sequence; to obtain a second logarithmic strength of the node evolution process by combining a feed-forward network with a node invariant feature vector and a node change feature vector, construct, based on the second logarithmic strength, a prediction function for predicting the sending time of a future information forwarding event, construct a second objective function with the goal of minimizing the prediction error of the prediction function, and solve the second objective function by using an optimization algorithm to obtain an optimized node representation vector.
8. A computer device comprising a processor and a system memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 6 when executing the computer program.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 6.
10. A computer program product containing instructions which, when executed by a cluster of computer devices, cause the cluster of computer devices to perform the method of any of claims 1 to 6.
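
As an informal illustration of two mechanisms recited in the claims above, the following Python sketch shows (a) the gated weighting and Hadamard combination of claims 5 and 6 and (b) the alternating loop of step S3 in claim 1. It is a minimal sketch under stated assumptions, not the claimed implementation: the claims do not name the weighting function, so a sigmoid of a dot product is assumed here; the function names `invariant_feature`, `alternate_until_converged`, `toy_s1` and `toy_s2` are hypothetical; and the two toy objectives are simple quadratics standing in for the first and second objective functions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def invariant_feature(encoded_query, coding_vectors):
    """Weight each coding vector, then combine all of them element by element."""
    result = [1.0] * len(encoded_query)
    for vec in coding_vectors:
        # ASSUMPTION: weight = sigmoid(dot(query, vec)); the claims leave
        # the weighting function unnamed.
        weight = sigmoid(sum(q * v for q, v in zip(encoded_query, vec)))
        weighted = [weight * v for v in vec]
        # Hadamard (element-wise) product with the running result.
        result = [r * w for r, w in zip(result, weighted)]
    return result


def alternate_until_converged(step_s1, step_s2, info, node,
                              tol=1e-12, max_rounds=1000):
    """Repeat S1 then S2 until both losses stop changing (step S3)."""
    prev1 = prev2 = float("inf")
    for rounds in range(1, max_rounds + 1):
        info, loss1 = step_s1(info, node)   # S1 uses the latest node state
        node, loss2 = step_s2(node, info)   # S2 uses the updated info state
        if abs(prev1 - loss1) < tol and abs(prev2 - loss2) < tol:
            break
        prev1, prev2 = loss1, loss2
    return info, node, rounds


# Toy stand-ins (hypothetical): each step takes one gradient step on a
# quadratic loss whose target depends on the other representation.
def toy_s1(info, node, lr=0.25):
    target = 0.5 * node + 1.0
    info -= lr * 2.0 * (info - target)
    return info, (info - target) ** 2


def toy_s2(node, info, lr=0.25):
    target = 0.5 * info + 1.0
    node -= lr * 2.0 * (node - target)
    return node, (node - target) ** 2


if __name__ == "__main__":
    # With a zero query every weight is sigmoid(0) = 0.5.
    print(invariant_feature([0.0, 0.0], [[2.0, 4.0], [1.0, 3.0]]))  # [0.5, 3.0]
    info, node, rounds = alternate_until_converged(toy_s1, toy_s2, 0.0, 0.0)
    print(round(info, 3), round(node, 3))  # both approach the fixed point 2.0
```

In this toy setting the two scalar states stand in for the information and node representation vectors; each step sees the other's most recent value, which is the coupling that the alternating loop is meant to exploit.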

Description

Representation vector optimization method, system, equipment and medium for sequence data prediction

Technical Field

The invention relates to the technical field of representation vector optimization, and in particular to a representation vector optimization method, system, equipment and medium for sequence data prediction.

Background

Sequence data prediction aims to predict the future state or trend of an entity (such as a social network user, a financial product, or a sensor) from the series of historical behavior data that the entity generates. The technology is widely applied in key fields such as public opinion monitoring, recommendation systems, and risk early warning. However, with the development of Internet of Things technology, sensor devices of all kinds generate massive sequence data. The data scale keeps growing, and the data exhibit marked non-stationarity, increased uncertainty, and complex diffusion, all of which pose great challenges to sequence data prediction. In practice, the generation and evolution rules of different sequence data are complex. Existing techniques capture the rich information that sequence data carry across different dimensions poorly and lose information, which limits the generalization capability of prediction models. Moreover, as information continues to propagate, the distribution characteristics of information variables and node variables change, a phenomenon known as concept drift.
Therefore, regarding the coupled diffusion of heterogeneous variables and the uncertainty of cascade sequences in an information cascade network, information cascade sequence prediction faces the following challenge: reducing the uncertainty of the hidden states corresponding to the information variables and node variables in the information diffusion point process and the node evolution point process, so that the information variables and node variables can be encoded accurately and the generalization capability of the prediction model improved.
For example, one class of prior art, deterministic neural network embedding methods, essentially generates a single deterministic, point-estimated feature vector for each entity and cannot characterize the uncertainty inherent in real data (such as fluctuations in node influence or the randomness of information popularity). Faced with noisy data, sparse data, or dynamically changing environments, such models treat noise and signal alike, so their generalization capability and robustness drop significantly. Another class of prior art uses an aggregation mechanism (such as simple average pooling or an attention mechanism), which treats all historical elements in a sequence equally and screens them by "correlation" rather than by "stability" or "causality". As a result, the model learns a mixed, outdated statistical pattern, is easily disturbed by changing noise patterns, and cannot extract the core features that have a stable causal association with the prediction target, so its long-term generalization capability in a dynamic environment is insufficient.
Accordingly, the present invention provides a representation vector optimization method, system, equipment and medium for sequence data prediction to address the above problems.
Disclosure of Invention

The invention aims to provide a representation vector optimization method, system, equipment and medium for sequence data prediction. The invention introduces a hidden variable for each variable to encode the hidden state of the original, highly uncertain observed variable, estimates the probability distribution obeyed by the hidden variable with a neural network, and takes the parameters of that distribution as a more robust representation vector of the variable; it uses a gating mechanism to adaptively screen out, from the variable sequence, the invariant feature codes most stably associated with the prediction target, thereby reducing the uncertainty of the variable; and it uses an asynchronous cyclic optimization strategy to continuously acquire the information gain provided by the node variables and the information variables during information propagation, enhancing the representation information of the node variables and the information variables under information coupling.

The invention is realized by the following technical scheme: a representation vector optimization method for sequence data prediction, the method comprising: step S1, acquiring an information diffusion process of information, performing hidden state coding on each node variable in the information diffusion process, and constructing a node coding vector sequence; the