
CN-120493938-B - Method for generating hidden state in cyclic neural network for language processing

CN120493938B

Abstract

The invention relates to the technical field of recurrent neural networks, and in particular to a method for generating hidden states in a recurrent neural network for language processing. The method comprises step S1, input embedding: discrete words are converted into continuous vector representations, providing a computable semantic basis for the model so that the abstract meaning of each word takes a numerical form; an embedding layer is constructed through pre-training or random initialization and maps each word to a fixed-dimension vector space, with pre-trained embeddings capturing semantic associations from large-scale corpus statistics. The importance of each historical position is determined by comparing a query vector with the key vectors generated from all historical hidden states: each historical hidden state is converted into a key vector and a value vector, the dot-product score of the query vector with each key vector is calculated, and the attention weights are obtained after scaling and normalization. These weights reflect the strength of association between the current word and the respective historical positions.
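The attention-weight computation summarized in the abstract (keys from historical hidden states, scaled dot-product scores against a query, softmax normalization) can be sketched as follows. This is an illustrative NumPy example, not the patent's reference implementation; all shapes and the projection matrix `W_k` are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                   # hidden/key dimension (illustrative)
H = rng.normal(size=(5, d))             # five historical hidden states
W_k = rng.normal(size=(d, d))           # learnable key projection (assumed shape)
q = rng.normal(size=d)                  # current query vector

K = H @ W_k                             # one key vector per historical position
scores = K @ q / np.sqrt(d)             # scaled dot-product scores
weights = np.exp(scores - scores.max())
weights /= weights.sum()                # softmax: normalize to a probability distribution
```

The resulting `weights` sum to one, so each entry can be read as the relative importance of that historical position for the current word.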

Inventors

  • WU WENMIN
  • LI JIA
  • QI LEI
  • ZHANG YU
  • WANG ZHE
  • LIU YI
  • YAN XIYU
  • WANG SHUO

Assignees

  • Harbin University of Commerce (哈尔滨商业大学)

Dates

Publication Date
2026-05-08
Application Date
2025-04-29

Claims (7)

  1. A method for generating hidden states in a recurrent neural network for language processing, comprising the following steps:
     S1, input embedding: converting discrete words into continuous vector representations, providing a computable semantic basis for the model by turning the abstract meaning of each word into numerical form; constructing an embedding layer through pre-training or random initialization and mapping each word into a fixed-dimension vector space, wherein pre-trained embeddings capture semantic associations from large-scale corpus statistics, randomly initialized embeddings require a trade-off between expressive capacity and computational cost, the embedding vectors are gradually adjusted through training on task data, and the original text is converted into a high-dimensional numerical sequence suitable for neural network processing;
     S2, generating the current query: constructing a "search focus" vector that serves as the starting point of the attention mechanism, wherein the query vector is built from the current time-step input or from the hidden state of the previous moment; the input vector or hidden state is projected into a new vector space through a linear transformation realized by a learnable weight matrix whose parameters are optimized during training, and the resulting query vector contains the local characteristics of the current word and may inherit information accumulated in the historical hidden states;
     S3, calculating attention weights: calculating attention weights is the key link in dynamically screening historical information; the model compares the current query vector with the key vectors of all historical positions to quantify the importance of each historical element, wherein the key vectors are generated by linear transformation of the historical hidden states, similarity is measured by dot-product attention scores, and the scores are scaled and converted into a probability distribution by a normalization function, establishing global associations directly without depending on sequential order or distance;
     S4, generating a context vector: the context vector is the output of the attention mechanism, produced by weighted fusion of historical information; the attention weights are multiplied by the corresponding value vectors, which are obtained by linear transformation of the historical hidden states, and summed, so that the context vector selectively retains the features highly relevant to the current query;
     S5, updating the hidden state: hidden-state updating is the core operation of the recurrent neural network, fusing the current input, the historical state, and the context information through a nonlinear transformation; after the attention mechanism is introduced, the context vector serves as an additional input, the three parts of information are linearly combined through independent weight matrices, and the result is compressed to a bounded range by an activation function; and
     S6, iterative transfer: iterative transfer is the core mechanism by which the recurrent neural network processes sequence data; the updated hidden state serves as the initial state of the next time step and continues to participate in subsequent attention calculation and state updating; for long sequences, computational efficiency can be optimized by limiting the length of the history window or adopting a sparse attention strategy; the transfer of hidden states forms a longitudinal flow of information, and the attention mechanism enhances cross-step information interaction.
  2. The method for generating hidden states in a recurrent neural network for language processing of claim 1, wherein an embedding method is selected prior to input embedding, and word vectors trained on a corpus are used to provide semantic prior knowledge.
  3. The method for generating hidden states in a recurrent neural network for language processing as claimed in claim 1, wherein generating the current query comprises: S21, based on the previous hidden state: generating the query from the hidden state of the previous time step, q_t = W_q · h_{t-1}, capturing the progressive logic of the sequence; S22, based on the current input: using the embedding of the current word as the query, q_t = W_q · x_t, emphasizing the current input context, where q_t is the query vector of the t-th time step; S23, wherein W_q is a learnable matrix.
  4. The method for generating hidden states in a recurrent neural network for language processing as claimed in claim 1, wherein calculating the attention weights comprises: S31, key vectors k_i = W_k · h_i: mapping the historical hidden states into a key space for matching against the query; S32, value vectors v_i = W_v · h_i: mapping the historical hidden states into a value space for information aggregation; S33, wherein W_k and W_v are learnable matrices that respectively control the semantic projection directions of the keys and the values; the attention scores are calculated as e_{t,i} = (q_t · k_i) / sqrt(d_k), wherein sqrt(d_k) is a scaling factor; the scores are normalized into a probability distribution as α_{t,i} = exp(e_{t,i}) / Σ_j exp(e_{t,j}).
  5. The method for generating hidden states in a recurrent neural network for language processing as claimed in claim 1, wherein a comprehensive context vector is obtained by weighted summation of the attention weights with the corresponding value vectors.
  6. The method for generating hidden states in a recurrent neural network for language processing as claimed in claim 1, wherein, in the hidden-state update formula of a standard RNN, a context vector is added alongside the current input and the previous hidden state, and the new hidden state fuses these three pieces of information through a nonlinear activation function.
  7. The method for generating hidden states in a recurrent neural network for language processing as claimed in claim 1, wherein the newly generated hidden state is passed to the next time step and participates in subsequent attention calculations as part of the history information.
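One attention-augmented recurrent step covering claims 1 through 7 can be sketched as below. This is a minimal illustrative NumPy sketch under assumed shapes and parameter names (W_q, W_k, W_v, W_x, W_h, W_c, b), not the patent's reference implementation: the query comes from the previous hidden state (claim 3, S21), keys and values from the history (claim 4), the context from the weighted sum (claim 5), the new state from a tanh fusion of input, previous state, and context (claim 6), and the state is appended to the history for the next step (claim 7).

```python
import numpy as np

def attention_rnn_step(x_t, h_prev, history, params):
    """One recurrent step with dot-product attention over historical states."""
    W_q, W_k, W_v, W_x, W_h, W_c, b = params
    d_k = W_k.shape[1]
    q = W_q @ h_prev                         # S2: query from the previous hidden state
    if history:
        H = np.stack(history)
        K, V = H @ W_k, H @ W_v              # S3: keys and values from history
        scores = K @ q / np.sqrt(d_k)        # scaled dot-product attention scores
        w = np.exp(scores - scores.max())
        w /= w.sum()                         # softmax attention weights
        c = w @ V                            # S4: weighted-sum context vector
    else:
        c = np.zeros_like(h_prev)            # no history yet at the first step
    # S5: fuse current input, previous state, and context through tanh
    h_t = np.tanh(W_x @ x_t + W_h @ h_prev + W_c @ c + b)
    history.append(h_t)                      # S6/S7: pass the state to the next step
    return h_t

rng = np.random.default_rng(1)
d = 4
params = tuple(rng.normal(size=(d, d)) for _ in range(6)) + (np.zeros(d),)
history, h = [], np.zeros(d)
for _ in range(3):                           # iterate over a toy three-step sequence
    h = attention_rnn_step(rng.normal(size=d), h, history, params)
```

A history-window limit or sparse attention (claim 1, S6) would simply restrict which entries of `history` are stacked before the score computation.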

Description

Method for generating hidden states in a recurrent neural network for language processing

Technical Field

The invention relates to the technical field of recurrent neural networks, in particular to a method for generating hidden states in a recurrent neural network for language processing.

Background

In a recurrent neural network for language processing, standard RNN hidden-state generation is a key process. Exploiting the characteristics of sequence data, the network processes input in temporal order: at each time step it combines the current input with the hidden state of the previous moment and generates the hidden state of the current moment through a nonlinear transformation. Initially, the hidden state is usually set to a specific value, such as an all-zero or random vector; as the input sequence advances, the hidden state is continuously updated, gradually accumulating and fusing all preceding input information, so that long-term dependencies in the sequence can be captured and effective feature representations are provided for subsequent language-processing tasks such as text generation and sentiment analysis. A model's understanding of semantics is mainly based on learned word vectors and sequence information, and for words with high flexibility and context dependency this understanding is often inaccurate. The referent of "this", for example, depends entirely on context, and the model may simply infer the reference from word co-occurrence or local context, making it difficult to understand the semantic logic and grammatical structure of the text deeply enough to determine the accurate referent.
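The standard RNN hidden-state update described in the background, h_t = tanh(W_x · x_t + W_h · h_{t-1} + b) starting from an all-zero state, can be sketched as follows. This is an illustrative NumPy example with assumed dimensions, not code from the patent.

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_h = 3, 4                        # input and hidden dimensions (illustrative)
W_x = rng.normal(size=(d_h, d_in))      # input-to-hidden weights
W_h = rng.normal(size=(d_h, d_h))       # hidden-to-hidden (recurrent) weights
b = np.zeros(d_h)

h = np.zeros(d_h)                       # initial hidden state: all-zero vector
for x_t in rng.normal(size=(5, d_in)):  # process a toy 5-step sequence in order
    h = np.tanh(W_x @ x_t + W_h @ h + b)
```

Because each new state depends on the previous one, information from every earlier input accumulates in `h`, which is the long-term-dependency behavior the background section describes.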
Therefore, the invention provides a method for generating hidden states in a recurrent neural network for language processing, which determines the importance of each historical position by comparing a query vector with the key vectors generated from all historical hidden states. Specifically, each historical hidden state is converted into a key vector and a value vector, the dot-product score of the query vector with each key vector is calculated, and the attention weights are obtained after scaling and normalization. These weights reflect the strength of association between the current word and the various historical positions. In the sentence "The government implemented policies to reduce carbon emissions. This initiative received widespread support.", when processing "This" the model gives a higher score to the key vector corresponding to "policies" and thus generates a greater attention weight.

Disclosure of Invention

Semantic understanding of such words is often inaccurate: the referent of the word depends entirely on context, and the model may simply infer the reference from word co-occurrence or the local context, making it difficult to understand the semantic logic and grammatical structure of the text deeply enough to determine the accurate referent. Aiming at these defects of the prior art, the invention provides a method for generating hidden states in a recurrent neural network for language processing, thereby solving the technical problems described in the background art.
In order to achieve the above purpose, the invention is realized by the following technical scheme: a method for generating hidden states in a recurrent neural network for language processing, comprising the steps of: S1, input embedding: converting discrete words into continuous vector representations, providing a computable semantic basis for the model by turning the abstract meaning of each word into numerical form; constructing an embedding layer through pre-training or random initialization and mapping each word into a fixed-dimension vector space, wherein pre-trained embeddings capture semantic associations from large-scale corpus statistics, randomly initialized embeddings require a trade-off between expressive capacity and computational cost, the embedding vectors are gradually adjusted through training on task data, and the original text is converted into a high-dimensional numerical sequence suitable for neural network processing; S2, generating the current query: constructing a "search focus" vector that serves as the starting point of the attention mechanism, wherein the query vector is built from the current time-step input or from the hidden state of the previous moment; the input vector or hidden state is projected into a new vector space through a linear transformation realized by a learnable weight matrix whose parameters are optimized during training, and the resulting query vector contains the local characteristics of the current word and may inherit information accumulated in the historical hidden states; S3, calculating attention weights: calculating attention weights is a key link in dynamically screening historical information