CN-121981116-A - Large language model natural language processing method based on Fourier kernel attention
Abstract
The invention discloses a natural language processing method for a large language model based on Fourier kernel attention. The method comprises: preprocessing natural language sequence data to be processed to obtain a normalized input token sequence; inputting the token sequence into a large language model integrated with a Fourier kernel attention layer; in the attention layer of that model, adopting a Fourier kernel attention mechanism that projects the initialized input and partitions it into a plurality of attention heads, performs the Fourier kernel attention calculation on each attention head, concatenates the multi-head attention outputs, and linearly projects them into the final multi-head Fourier kernel attention; and inputting the multi-head Fourier kernel attention into the subsequent network layers for processing and outputting the final natural language processing result. The method uses the improved large language model to realize efficient computation while maintaining semantic expressive capability in natural language processing scenarios.
Inventors
- WANG PENG
- ZHOU CONGHUA
- LIU ZHIFENG
- SHEN XIANGJUN
Assignees
- Jiangsu University
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-01-26
Claims (9)
- 1. A large language model natural language processing method based on Fourier kernel attention, characterized by comprising the following steps: step 1, acquiring natural language sequence data to be processed, and preprocessing it to obtain a normalized input token sequence; step 2, inputting the input token sequence into a large language model integrated with a Fourier kernel attention layer; the large language model integrated with the Fourier kernel attention layer comprises a large language model attention layer and subsequent network layers; the large language model attention layer adopts a Fourier kernel attention mechanism, which projects the initialized input and partitions it into a plurality of attention heads, performs the Fourier kernel attention calculation for each attention head, and concatenates the multi-head attention outputs and linearly projects them into the final multi-head Fourier kernel attention; and inputting the multi-head Fourier kernel attention into the subsequent network layers for processing, and outputting the final natural language processing result.
- 2. The large language model natural language processing method based on Fourier kernel attention as set forth in claim 1, wherein the method for constructing the large language model integrated with the Fourier kernel attention mechanism comprises: based on a large language model pre-trained with the Transformer architecture, replacing the attention layer of the large language model with a Fourier kernel attention module; converting the attention calculation from the time domain to the frequency domain by the Fourier kernel attention mechanism; and using the two-dimensional fast Fourier transform and kernel function approximation to realize calculation with \(O(Nd\log N)\) complexity.
- 3. The large language model natural language processing method based on Fourier kernel attention as recited in claim 1, wherein the data processing in step 2 is as follows: S2.1, initializing the input projection and partitioning the multi-head matrices; S2.2, carrying out the Fourier kernel attention calculation on each attention head, wherein a kernelized attention weight is constructed in the frequency domain; S2.3, carrying out multi-head attention fusion to obtain the multi-head Fourier kernel attention; S2.4, transmitting the multi-head Fourier kernel attention output to the subsequent network layers of the large language model for processing, and outputting the final natural language processing result.
- 4. The large language model natural language processing method based on Fourier kernel attention as set forth in claim 1, wherein the process of constructing the kernelized attention weight in the frequency domain in S2.2 is as follows: respectively constructing circulant matrices \(C_Q\) and \(C_K\) from \(Q\) and \(K\), and transforming \(C_Q\) and \(C_K\) into the frequency domain as \(\hat{Q}=\mathcal{F}(C_Q)\) and \(\hat{K}=\mathcal{F}(C_K)\), wherein \(\mathcal{F}(C_Q)\) and \(\mathcal{F}(C_K)\) respectively denote performing the Fourier transform on \(C_Q\) and \(C_K\); and calculating the kernelized attention weight in the frequency domain: \(\hat{A}=\hat{\kappa}(\hat{Q},\hat{K})\), wherein \(\mathcal{F}(\cdot)\) denotes performing the Fourier transform and \(\hat{\kappa}\) is the representation of the kernel function in the frequency domain.
- 5. The large language model natural language processing method based on Fourier kernel attention as in claim 4, wherein the high-efficiency computation is performed by the two-dimensional fast Fourier transform, the time complexity of the fast Fourier transform being \(O(n\log n)\); \(F\) is a Fourier matrix, noted as \(F_{jk}=\omega_n^{jk},\ j,k=0,1,\dots,n-1\), wherein the Fourier matrix \(F\) is an \(n\times n\) complex matrix, \(n\) is the length of the signal or the number of sampling points, and the elements of \(F\) are constructed from powers of the unit root \(\omega_n=e^{-2\pi i/n}\); \(F\) is used for computing the Fourier transform that converts a time-domain signal into a frequency-domain representation, and \(F^{-1}\) represents the inverse Fourier transform.
- 6. The large language model natural language processing method based on Fourier kernel attention as set forth in claim 4, wherein the kernel function is a Gaussian kernel, and the Gaussian kernel in the frequency domain is: \(\hat{\kappa}(\hat{Q},\hat{K})=\exp\!\big(-\big(q\mathbf{1}_k^{\top}+\mathbf{1}_q k^{\top}-2\,\mathcal{F}^{-1}(\hat{Q}^{*}\odot\hat{K})\big)/(2\sigma^{2})\big)\), wherein \(\odot\) denotes element-by-element multiplication; \(q_j\) and \(k_j\) respectively represent the j-th rows of the \(Q\) and \(K\) matrices, from which the squared norms \(q=(\|q_j\|^2)_j\) and \(k=(\|k_j\|^2)_j\) are calculated; \(\mathbf{1}_q\) and \(\mathbf{1}_k\), introduced for maintaining dimension consistency, are vectors whose element values are all 1; \(\hat{Q}^{*}\) is the conjugate of \(Q\) after the Fourier transform; \(\hat{K}\) is the Fourier transform of \(K\); the feature dimension of the \(K\) matrix is taken as \(d_k=64\) and \(\sigma^{2}=d_k\); and \(\mathcal{F}^{-1}\) represents the inverse Fourier transform.
- 7. The method of claim 1, wherein the attention of each head is calculated independently; because \(V\) lies in the real number domain, the frequency-domain attention weight needs to be converted back to the real number domain by the inverse Fourier transform, and the Fourier kernel attention of each head is: \(\mathrm{head}=\mathcal{F}^{-1}\!\big(\mathrm{diag}(\hat{\kappa})\,\mathcal{F}(V)\big)\); to accelerate the calculation, \(\hat{V}=\mathcal{F}(V)\) is calculated first and then multiplied by the diagonal matrix \(\mathrm{diag}(\hat{\kappa})\), namely \(\mathrm{head}=\mathcal{F}^{-1}(\hat{\kappa}\odot\hat{V})\), wherein \(\odot\) denotes element-by-element multiplication; by virtue of the special nature of diagonal-matrix multiplication, \(\mathrm{diag}(\hat{\kappa})\) is obtained by performing the Fourier transform on the kernelized attention weight (a code sketch of this computation follows the claims).
- 8. The large language model natural language processing method based on Fourier kernel attention as in claim 7, wherein the obtained \(h\) head attentions are concatenated and linearly projected by \(W^{O}\) into the final multi-head Fourier kernel attention \(\mathrm{MultiHead}\): \(\mathrm{MultiHead}=\mathrm{Concat}(\mathrm{head}_1,\dots,\mathrm{head}_h)\,W^{O}\), wherein \(\mathrm{head}_i\) is the Fourier kernel attention of each head calculated in step S2.2, and \(W^{O}\) is also a learnable parameter.
- 9. The large language model natural language processing method based on Fourier kernel attention as recited in claim 1, wherein the preprocessing includes but is not limited to text cleaning, tokenization, formatting, truncation, or chunking.
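The minimal numpy sketch below illustrates one self-consistent reading of claims 4-7: the circulant construction of claim 4 is realised implicitly by FFTs along the sequence axis, the Gaussian kernel weight of claim 6 is formed from squared norms and a frequency-domain cross term, and claim 7's diagonal-matrix trick applies the weight to V with only FFTs and element-wise products. The function name, the max-shift stabilisation, and the reduction of the kernel to a single circulant generating vector are our assumptions, not the patent's verbatim formulas.

```python
import numpy as np

def fourier_kernel_attention_head(Q, K, V, sigma2=None):
    """One head of Fourier kernel attention (hypothetical reading of claims 4-7)."""
    n, d = Q.shape
    if sigma2 is None:
        sigma2 = d  # claim 6 takes the feature dimension of K (e.g. d_k = 64)

    # Claim 4: move Q and K into the frequency domain along the sequence axis,
    # which realises the circulant-matrix construction implicitly.
    Q_hat = np.fft.fft(Q, axis=0)
    K_hat = np.fft.fft(K, axis=0)

    # Circular cross-correlation c[tau] = sum_t <Q[t, :], K[t + tau, :]>,
    # obtained as F^{-1}(conj(Q_hat) ⊙ K_hat) summed over the feature axis.
    c = np.fft.ifft(np.conj(Q_hat) * K_hat, axis=0).real.sum(axis=1)

    # Claim 6: Gaussian kernel from squared norms and the cross term; the
    # max-shift is a numerical-stability choice we add, not part of the claims.
    z = -((Q ** 2).sum() + (K ** 2).sum() - 2.0 * c) / (2.0 * sigma2)
    kappa = np.exp(z - z.max())

    # Claim 7: apply the circulant weight to V via the diagonal trick,
    # head = F^{-1}( diag(F(kappa)) · F(V) ), costing O(n d log n) overall.
    V_hat = np.fft.fft(V, axis=0)
    return np.fft.ifft(np.fft.fft(kappa)[:, None] * V_hat, axis=0).real
```

For Q, K, V drawn as random real (n, d) arrays, the call returns a real (n, d) array while touching only FFTs and element-wise products, which is where the per-head \(O(nd\log n)\) cost comes from.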
Description
Large language model natural language processing method based on Fourier kernel attention
Technical Field
The invention relates to the technical field of artificial intelligence and large language models, and in particular to an attention mechanism for improving the long-sequence processing efficiency of a large language model, which can be widely applied to natural language processing scenarios such as intelligent dialogue systems, long-document analysis, code generation and completion, and knowledge question answering.
Background
With the wide application of large language models in the field of natural language processing, the self-attention mechanism, as their core component, shows strong expressive power. However, the conventional self-attention mechanism has the problem that its time complexity grows quadratically with the sequence length, i.e. \(O(N^2 d)\), where \(N\) is the sequence length and \(d\) is the feature dimension. This computational bottleneck severely limits the usefulness and extensibility of large language models on long-sequence tasks such as processing long documents, multi-round conversations, and code analysis. Current attention optimization methods for large language models mainly comprise linear attention, local attention, sparse attention, and the like. Although these methods improve computational efficiency, they tend to trade model accuracy for it and suffer significant performance degradation, particularly on tasks that require long-range dependency understanding. Therefore, in natural language processing scenarios such as intelligent dialogue systems, long-document analysis, code generation and completion, and knowledge question answering, how to realize efficient computation while maintaining semantic expressive capability has become a key problem for making large language models practical.
Disclosure of Invention
In order to overcome the defects of the prior art, the application provides a large language model natural language processing method based on Fourier kernel attention. The method first uses a Fourier kernel attention mechanism (Fourier-Attention) to convert the attention calculation from the time domain to the frequency domain; by means of the two-dimensional fast Fourier transform (FFT2) and kernel function approximation, the time complexity is reduced from the \(O(N^2 d)\) of ordinary self-attention down to \(O(Nd\log N)\). A large language model integrated with the Fourier kernel attention module is thereby constructed; using this large language model can solve the problem of high computational complexity in existing natural language processing scenarios and achieve a good balance between accuracy and efficiency.
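As an illustration (ours, not patent text) of where the claimed reduction comes from, the sketch below builds the claim-5 Fourier matrix from the unit root \(\omega_n=e^{-2\pi i/n}\) and checks it against numpy's FFT: the dense product \(Fx\) costs \(O(n^2)\), while np.fft.fft evaluates the identical product in \(O(n\log n)\).

```python
import numpy as np

def fourier_matrix(n):
    """The n x n DFT matrix F[j, k] = omega^(j*k), omega = exp(-2*pi*i/n).
    A dense multiply F @ x costs O(n^2); np.fft.fft computes the same
    product in O(n log n), which is the source of the claimed speedup."""
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.exp(-2j * np.pi / n) ** (j * k)

x = np.random.randn(8)
assert np.allclose(fourier_matrix(8) @ x, np.fft.fft(x))
```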
The technical scheme adopted by the invention is as follows. A large language model natural language processing method based on Fourier kernel attention comprises the following steps: step 1, acquiring natural language sequence data to be processed, and preprocessing it to obtain a normalized input token sequence; step 2, inputting the input token sequence into a large language model integrated with a Fourier kernel attention layer. The large language model integrated with the Fourier kernel attention layer comprises a large language model attention layer and subsequent network layers; the large language model attention layer adopts a Fourier kernel attention mechanism, which projects the initialized input and partitions it into a plurality of attention heads, performs the Fourier kernel attention calculation for each attention head, and concatenates the multi-head attention outputs and linearly projects them into the final multi-head Fourier kernel attention; the multi-head Fourier kernel attention is then input into the subsequent network layers for processing, and the final natural language processing result is output.
Further, the method for constructing the large language model integrated with the Fourier kernel attention mechanism comprises the following steps: based on a large language model pre-trained with the Transformer architecture, replacing the large language model attention layer in the model with a Fourier kernel attention module, converting the attention calculation from the time domain to the frequency domain by the Fourier kernel attention mechanism, and using the two-dimensional fast Fourier transform and kernel function approximation to realize calculation with \(O(Nd\log N)\) complexity.
Further, the data processing in step 2 is as follows (a minimal code sketch of these steps is given below): S2.1, initializing the input projection and partitioning the multi-head matrices; S2.2, carrying out the Fourier kernel attention calculation on each attention head, wherein a kernelized attention weight is constructed in the frequency domain; S2.3, carrying out multi-head attention fusion to obtain the multi-head Fourier kernel attention; S2.4, transmitting the multi-head Fourier kernel attention output to the subsequent network layers of the large language model for processing, and outputting the final natural language processing result.
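Continuing the numpy sketch given after the claims, steps S2.1-S2.3 and the claim-8 projection might be wired together as follows; the weight names W_Q, W_K, W_V, W_O and the even split of the model dimension across heads are illustrative assumptions, not notation from the patent.

```python
def multi_head_fourier_kernel_attention(X, W_Q, W_K, W_V, W_O, h):
    """S2.1-S2.3 plus the claim-8 output projection, reusing
    fourier_kernel_attention_head from the sketch after the claims."""
    n, d_model = X.shape
    d_head = d_model // h
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V               # S2.1: input projections
    heads = []
    for i in range(h):                                 # S2.2: per-head attention
        s = slice(i * d_head, (i + 1) * d_head)
        heads.append(fourier_kernel_attention_head(Q[:, s], K[:, s], V[:, s]))
    fused = np.concatenate(heads, axis=1)              # S2.3: multi-head fusion
    return fused @ W_O                                 # claim 8: W^O projection
```

The fused output would then be passed to the model's subsequent network layers and decoded into the final natural language processing result (S2.4).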