CN-121996781-A - Self-adaptive control method and system for large language model and computer storage medium


Abstract

The application discloses an adaptive control method for a large language model. The method comprises: performing multi-modal feature extraction on an input business dialogue to obtain multi-dimensional dialogue state features; inputting the multi-dimensional dialogue state features into a multi-layer perceptron that serves as a fusion network and outputs a control gain matrix, wherein the control gain matrix comprises a gain weight for each dimension of the dialogue state features; performing strategy mapping through a hierarchical mapping table based on the values of the gain weight elements of the control gain matrix, so that each gain weight element is mapped to a corresponding natural language control instruction; injecting the natural language control instructions into a structured prompt word template to construct an enhanced prompt word; sending the enhanced prompt word to the large language model; and outputting a reply to the business dialogue through the large language model. The method enables quantitative adjustment and adaptive control of the answers generated by the large language model, and improves the accuracy, controllability and adaptability of dialogue responses in vertical business scenarios.

Inventors

WANG SHUANG
ZHENG CHAO
QIN DONGMEI
ZHANG MENG
WANG WANG
CHEN CONGCONG
CHEN YAOJIA
CAI JIAN

Assignees

金卡智能集团股份有限公司
易联云计算(杭州)有限责任公司

Dates

Publication Date: 2026-05-08
Application Date: 2026-04-10

Claims (10)

1. An adaptive control method for a large language model, the adaptive control method comprising: performing multi-modal feature extraction based on an input business dialogue to obtain multi-dimensional dialogue state features; inputting the multi-dimensional dialogue state features into a multi-layer perceptron, and using the multi-layer perceptron as a fusion network to output a control gain matrix, wherein the control gain matrix comprises a gain weight corresponding to each dimension of the dialogue state features; performing strategy mapping through a hierarchical mapping table based on the value of each gain weight of the control gain matrix, and mapping each gain weight to a corresponding natural language control instruction; injecting the natural language control instruction into a structured prompt word template to construct an enhanced prompt word comprising a system variable, a knowledge base search result and the natural language control instruction, wherein the system variable is a dynamic parameter set for recording and maintaining the dialogue context and logic state in real time, and the knowledge base search result is obtained by searching a knowledge base with the content of the input business dialogue; and sending the enhanced prompt word and the input business dialogue to the large language model, and outputting a reply to the business dialogue through the large language model.
2. The adaptive control method of a large language model according to claim 1, characterized in that the adaptive control method further comprises: calculating an expected reply standard for the current round, based on the control gain matrix, through dynamic weighted interpolation between a preset ideal state and a preset bottom-line state; acquiring an actual state vector, wherein the actual state vector is obtained by vectorizing the business dialogue reply output by the large language model and converting the vectorized data through an evaluation coordinate system, and the evaluation coordinate system is a three-dimensional space mapping matrix obtained by performing semantic dimension extraction on historical expert high-quality reply texts through an orthogonal semantic decomposition technique; and calculating the deviation between the expected reply standard and the actual state vector, and, when the deviation is greater than a preset threshold, adding a deviation correction instruction to the enhanced prompt word of the next round of the business dialogue, so that the reply output by the large language model in the next round converges.
3. The adaptive control method of a large language model according to claim 2, wherein the adaptive control method further comprises training the multi-layer perceptron, the training comprising: constructing a training sample pair, wherein the training sample pair comprises a historical expert high-quality reply text and a reference reply text generated by the original large language model under the same business dialogue context as the historical expert high-quality reply text; mapping the historical expert high-quality reply text and the reference reply text to a target state vector and an initial state vector, respectively, through the evaluation coordinate system; taking the initial state vector as a starting point and the control gain matrix as an adjustment proportion, correcting the initial state vector toward the target state vector to obtain a simulated corrected state vector; and calculating an error between the simulated corrected state vector and the target state vector, and updating the weight parameters of the multi-layer perceptron through a back-propagation algorithm.
4. The adaptive control method of a large language model according to claim 2, wherein performing semantic dimension extraction on the historical expert high-quality reply texts through the orthogonal semantic decomposition technique to obtain the evaluation coordinate system comprises: using a BERT model as an encoder to vectorize the historical expert high-quality reply texts; and applying a principal component analysis algorithm to reduce the dimensionality of the vectorized historical expert high-quality reply texts, finding the three projection directions with the largest data variance to obtain a three-dimensional space mapping matrix, wherein the three-dimensional space mapping matrix is used as the evaluation coordinate system.
5. The adaptive control method of a large language model according to claim 4, further comprising defining an index for each axis of the evaluation coordinate system, comprising: taking the historical expert high-quality reply texts as samples, and calculating the projection value of every sample on each axis of the evaluation coordinate system; constructing a feature set for each axis of the evaluation coordinate system, wherein each feature set comprises a preset number of samples with the largest projection values on the corresponding axis; and determining, for each feature set, the N high-frequency feature words that rank highest by occurrence frequency, and calling a large language model to summarize the high-frequency feature words of each feature set to obtain index definitions corresponding to the three dimensions of the evaluation coordinate system, wherein N is a natural number greater than or equal to 1.
6. The adaptive control method of a large language model according to claim 1, wherein the multi-dimensional dialogue state features include a business feature, the adaptive control method further comprising constructing a business feature extraction model by which the business feature is extracted, the constructing of the business feature extraction model comprising: converting a standard business flow chart into a topological graph, wherein the topological graph comprises a set of business nodes and a set of jump logic between the nodes; executing random walks on the topological graph to simulate user operation behavior and generate node sequences; training with a Skip-Gram algorithm on the node sequences to obtain an initial vector for each business node, so that business nodes adjacent on the topological graph have similar vector representations; and, taking the initial vectors as input, aggregating neighborhood information on the topological graph through a pre-trained graph neural network (GCN) and outputting an enhanced business feature vector for each business node, thereby obtaining the business feature extraction model.
7. The adaptive control method of a large language model according to claim 1, wherein the multi-dimensional dialogue state features include a user feature, the adaptive control method further comprising extracting the user feature by: acquiring a DistilBERT language model pre-trained on a general corpus as a base model; collecting historical customer service dialogue logs, wherein the historical customer service dialogue logs comprise emotion labels and intention labels, the emotion labels at least comprise the categories anger, anxiety or calm, and the intention labels represent the user's dialogue goals; and performing supervised fine-tuning of the DistilBERT language model on the historical customer service dialogue logs, optimizing with a cross-entropy loss function during fine-tuning, removing the top-layer classifier of the DistilBERT language model after fine-tuning is finished, and retaining the output of the last Transformer encoder layer of the DistilBERT language model as the output user feature, wherein the cross-entropy loss function is expressed by the following formula: $L_{user} = -\sum y_{label} \log P(y_{pred})$; wherein $y_{label}$ is the sample distribution of the emotion labels and the intention labels, $P(y_{pred})$ is the predicted probability distribution output by the DistilBERT language model, and $L_{user}$ is the loss value of the cross-entropy loss function, the supervised fine-tuning of the DistilBERT language model being performed with the aim of minimizing the loss value of the cross-entropy loss function.
8. The adaptive control method of a large language model according to claim 1, wherein the multi-dimensional dialogue state features include a time series feature, the adaptive control method further comprising extracting the time series feature by: generating a time series feature vector from the current dialogue round and a preset maximum number of dialogue rounds in a multi-round business dialogue, using the sinusoidal positional encoding of the Transformer architecture, thereby obtaining the time series feature, which represents the position of the current dialogue round within the whole dialogue sequence.
9. An adaptive control system of a large language model, characterized in that the adaptive control system applies the adaptive control method of a large language model according to any one of claims 1 to 8.
10. A computer storage medium storing a processing program which, when executed, performs the method of adaptive control of a large language model according to any one of claims 1 to 8.
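
The closed-loop check of claim 2 can be illustrated with a minimal Python sketch. The claim does not give the interpolation or deviation formulas, so the sketch assumes a per-dimension linear interpolation between the bottom-line state and the ideal state, and a Euclidean distance as the deviation. The function names, the axis names and the wording of the correction instruction are hypothetical and are not taken from the application.

```python
import math


def expected_standard(ideal, bottom_line, gains):
    """Interpolate per dimension between the bottom-line and ideal states.

    Assumption: expected = bottom + g * (ideal - bottom), with each gain g
    clamped to [0, 1]. The application does not state the exact formula.
    """
    return [
        b + min(max(g, 0.0), 1.0) * (i - b)
        for i, b, g in zip(ideal, bottom_line, gains)
    ]


def deviation(expected, actual):
    """Euclidean distance between the expected standard and the actual state
    vector (the distance measure is an assumption)."""
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(expected, actual)))


def correction_instruction(expected, actual, threshold, axis_names):
    """Return a correction instruction for the next round, or None when the
    deviation does not exceed the preset threshold."""
    if deviation(expected, actual) <= threshold:
        return None
    gaps = [e - a for e, a in zip(expected, actual)]
    # Pick the axis with the largest absolute gap (illustrative heuristic).
    worst = max(range(len(gaps)), key=lambda k: abs(gaps[k]))
    direction = "increase" if gaps[worst] > 0 else "decrease"
    return f"In the next reply, {direction} {axis_names[worst]}."


if __name__ == "__main__":
    axes = ("empathy", "professionalism", "conciseness")  # hypothetical names
    target = expected_standard([1.0, 1.0, 1.0], [0.2, 0.4, 0.0], [0.5, 1.0, 0.0])
    actual = [0.1, 0.9, 0.0]  # stand-in for the projected reply vector
    print(target, deviation(target, actual))
    print(correction_instruction(target, actual, 0.3, axes))
```

In this sketch the actual state vector is passed in directly; in the claimed method it would come from vectorizing the model's reply and projecting it through the evaluation coordinate system.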

Description

Self-adaptive control method and system for large language model and computer storage medium

Technical Field

The present application relates to the field of large language model control, and in particular to an adaptive control method and system for a large language model, and a computer storage medium.

Background

The prior art generally combines a large language model with a retrieval-augmented generation (RAG) architecture, and relies on static prompt words, keyword-based rule engines or offline reinforcement learning from human feedback (RLHF) to control the generation behavior of the large language model. This approach has several shortcomings. First, the evaluation indexes are subjective and mutually coupled: they rely on manual, subjective definition, the dimensions conflict seriously with one another (for example, empathy and focus), and there is no objective quantitative standard that can be decoupled mathematically. Second, the control strategy is static and rigid: it depends on static prompt words or fixed rules and cannot adapt to real-time changes in user emotion and business flow. Finally, there is no effective closed-loop correction mechanism: the control is open-loop, so deviations in model generation cannot be sensed in real time or corrected online. As a result, accurate, transparent and adaptive control of model generation behavior is difficult to achieve in complex and changeable vertical business scenarios.

Disclosure of Invention

The embodiments of the application provide an adaptive control method and system for a large language model, and a computer storage medium.
In a first aspect, the present application provides an adaptive control method for a large language model, the method comprising: performing multi-modal feature extraction based on an input business dialogue to obtain multi-dimensional dialogue state features; inputting the multi-dimensional dialogue state features into a multi-layer perceptron, and using the multi-layer perceptron as a fusion network to output a control gain matrix, wherein the control gain matrix comprises a gain weight corresponding to each dimension of the dialogue state features; performing strategy mapping through a hierarchical mapping table based on the values of the gain weight elements of the control gain matrix, and mapping each gain weight element to a corresponding natural language control instruction; injecting the natural language control instructions into a structured prompt word template to construct an enhanced prompt word comprising a system variable, a knowledge base search result and the natural language control instructions, wherein the system variable is a dynamic parameter set for recording and maintaining the dialogue context and logic state in real time, and the knowledge base search result is obtained by searching the knowledge base with the content of the input business dialogue; and sending the enhanced prompt word and the input business dialogue to the large language model, and outputting a reply to the business dialogue through the large language model.
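
The strategy mapping and prompt construction steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the gain weights are taken as given (standing in for the multi-layer perceptron output), and the mapping table thresholds, the instruction texts, the template layout and all function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical hierarchical mapping table: for each feature dimension, gain
# thresholds split the gain range into levels, and each level has one
# natural language control instruction. All values are illustrative.
MAPPING_TABLE = {
    "user": ([0.3, 0.7], [
        "Keep a neutral, efficient tone.",
        "Briefly acknowledge the user's feelings before answering.",
        "Lead with empathy and reassure the user before giving steps.",
    ]),
    "business": ([0.5], [
        "Answer the question directly.",
        "Follow the standard business process step by step.",
    ]),
    "timing": ([0.5], [
        "Explore the user's needs where useful.",
        "Be concise and move toward resolving the issue.",
    ]),
}

# Hypothetical structured prompt word template with the three parts named in
# the text: system variables, knowledge base results, control instructions.
PROMPT_TEMPLATE = (
    "[System variables]\n{system_variables}\n\n"
    "[Knowledge base results]\n{knowledge}\n\n"
    "[Control instructions]\n{instructions}\n"
)


def map_gains_to_instructions(gains):
    """Map each gain weight to the instruction of the level it falls into."""
    instructions = []
    for name, gain in gains.items():
        thresholds, levels = MAPPING_TABLE[name]
        # bisect_right counts the thresholds that are <= gain.
        instructions.append(levels[bisect_right(thresholds, gain)])
    return instructions


def build_enhanced_prompt(gains, system_variables, knowledge_results):
    """Inject the mapped instructions into the structured template."""
    return PROMPT_TEMPLATE.format(
        system_variables="\n".join(f"{k}: {v}" for k, v in system_variables.items()),
        knowledge="\n".join(f"- {item}" for item in knowledge_results),
        instructions="\n".join(f"- {t}" for t in map_gains_to_instructions(gains)),
    )


if __name__ == "__main__":
    gains = {"user": 0.85, "business": 0.2, "timing": 0.6}  # stand-in MLP output
    print(build_enhanced_prompt(
        gains,
        {"round": 3, "stage": "billing"},
        ["Invoices are issued on the first day of each month."],
    ))
```

A table of thresholds keeps the mapping from numeric gain to instruction explicit and easy to audit, which matches the transparency goal stated in the background; the enhanced prompt word would then be sent to the model together with the user's input.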
In one embodiment, the adaptive control method further includes: calculating an expected reply standard for the current round, based on the control gain matrix, through dynamic weighted interpolation between a preset ideal state and a preset bottom-line state; obtaining an actual state vector, wherein the actual state vector is obtained by vectorizing the business dialogue reply output by the large language model and converting the vectorized data through an evaluation coordinate system, and the evaluation coordinate system is a three-dimensional space mapping matrix obtained by performing semantic dimension extraction on historical expert high-quality reply texts through an orthogonal semantic decomposition technique; and calculating the deviation between the expected reply standard and the actual state vector, and, when the deviation is greater than a preset threshold, adding a deviation correction instruction to the enhanced prompt word of the next round of the business dialogue, so that the reply output by the large language model in the next round converges.
In one embodiment, the adaptive control method further includes training the multi-layer perceptron, including: constructing a training sample pair, wherein the training sample pair comprises a historical expert high-quality reply text and a reference reply text generated by the original large language model under the same business dialogue context as the historical expert high-quality reply text; mapping the historical expert high-quality reply text and the reference reply text to a target state vector and an initial state vector, respectively, through the evaluation coordinate system; taking the initial state vector as a starting point, taking the control gain matri