CN-122021663-A - Training method and device for intention discrimination model and computer equipment

CN122021663A

Abstract

The invention discloses a training method, a training device, and computer equipment for an intention discrimination model. The method comprises: screening dialogue samples and extracting key variables to construct an instruction fine-tuning dataset; improving the model architecture and training the improved architecture to obtain a role-playing model; using the role-playing model to perform counterfactual dialogue generation based on modified key variables, thereby obtaining dialogue content; inputting the dialogue content into the intention discrimination model, mining low-confidence samples and boundary-confusion samples; and feeding variant data of the mined samples back into the training process of the intention discrimination model, so as to establish a dynamic game relationship between the role-playing model and the intention discrimination model. By implementing the method, more diverse and challenging training data are generated through the dynamic game mechanism and interventions on key variables, rare-scenario and intention-boundary samples are systematically expanded, and the performance and reliability of the intention discrimination model in practical applications are markedly improved.
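The "improved model architecture" summarized above is, per claims 2 and 9, a controllable-variable embedding layer added on top of a Transformer Decoder, independent of the text encoding. A minimal, dependency-free sketch of that idea is given below; the dimensions, variable names, zero initialization, and additive combination are illustrative assumptions, not taken from the patent, and a real implementation would use trainable parameters in a deep-learning framework.

```python
# Sketch: each key-variable value owns its own vector; the vectors of the
# currently active values are summed, and the sum is added to every token
# embedding before the decoder blocks, leaving the text encoding untouched.

class VariableEmbedding:
    def __init__(self, dim, values):
        self.dim = dim
        # one vector per controllable-variable value (zero-initialized stand-in)
        self.table = {v: [0.0] * dim for v in values}

    def __call__(self, active_values):
        # sum the vectors of the currently active variable values
        out = [0.0] * self.dim
        for v in active_values:
            out = [a + b for a, b in zip(out, self.table[v])]
        return out

def inject_variables(token_embeddings, var_vector):
    # add the variable vector at every position; text embeddings stay separate
    return [[t + v for t, v in zip(tok, var_vector)] for tok in token_embeddings]

emb = VariableEmbedding(dim=2, values=["mood=angry", "scene=collection"])
emb.table["mood=angry"] = [1.0, 0.0]  # pretend-learned vector
injected = inject_variables([[0.5, 0.5], [1.0, 1.0]], emb(["mood=angry"]))
```

Keeping this embedding separate from the text encoder is what lets externally specified variables steer the generated dialogue path without retraining the language backbone.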

Inventors

  • HUANG LINING
  • GAO PENG
  • XU XINGBIAO
  • YUAN LAN
  • CHE CAIDE
  • QIAN LEI

Assignees

  • 杭州摸象大数据科技有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-04-10

Claims (10)

  1. A method for training an intention discrimination model, characterized by comprising the following steps: screening dialogue samples, extracting key variables, and constructing an instruction fine-tuning dataset; improving the model architecture to support generating different dialogue paths according to externally specified variables, and training the improved architecture on the dataset to obtain a role-playing model; performing counterfactual dialogue generation with the role-playing model based on modified key variables, so as to obtain dialogue content; inputting the dialogue content into the intention discrimination model, mining low-confidence samples and boundary-confusion samples, and feeding variant data of the mined samples back into the training process of the intention discrimination model, so as to establish a dynamic game relationship between the role-playing model and the intention discrimination model.
  2. The method of claim 1, wherein screening dialogue samples, extracting key variables to build an instruction fine-tuning dataset, improving the model architecture to support generating different dialogue paths according to externally specified variables, and training the improved architecture on the dataset to obtain a role-playing model comprises: selecting, from desensitized historical dialogue texts, high-quality samples free of intention-discrimination errors, speech-recognition errors, and logical discontinuities; marking the key variables that affect the course of the conversation in the high-quality samples; converting the key variables into role-control prompts, using a general-purpose LLM to complete the user-side utterances according to the role-control prompts, and verifying the results to obtain a fine-tuning dataset; adding, on top of a Transformer Decoder architecture, a controllable-variable embedding layer independent of the text encoding to obtain an improved model architecture; and performing supervised fine-tuning of the improved architecture with the fine-tuning dataset, so that it generates corresponding replies according to the input role-control prompts, thereby obtaining the role-playing model.
  3. The method of claim 2, wherein the key variables include user attributes and business-scenario features.
  4. The method for training the intention discrimination model according to claim 1, wherein converting the key variables into role-control prompts, using the general-purpose LLM to complete the user-side utterances according to the role-control prompts, and verifying the results to obtain the fine-tuning dataset comprises: converting the key variables into structured role-control prompts; retaining the robot-side utterances and completing the user-side utterances with the general-purpose LLM according to the provided role-control prompts, so as to obtain the user's utterance content; and verifying the user's utterance content to obtain the fine-tuning dataset.
  5. The method for training an intention discrimination model according to claim 1, wherein generating counterfactual dialogues with the role-playing model based on the modified key variables to obtain dialogue content comprises: modifying the true values of the key variables and designing new role-control prompts; and using the role-playing model, in combination with the role-control prompts, to simulate a dialogue between the customer-service robot and the user, generating virtual replies that conform to the intervention settings based on the true values, so as to obtain the dialogue content.
  6. The method for training an intention discrimination model according to claim 5, wherein simulating the dialogue between the customer service and the user with the role-playing model in combination with the role-control prompts, and generating virtual replies conforming to the intervention settings based on the true values to obtain the dialogue content, comprises: when the robot initiates a dialogue, having the role-playing model generate corresponding user replies according to the true values, the robot-side utterance, and the role-control prompts, so as to obtain the dialogue content.
  7. The method for training an intention discrimination model according to claim 1, wherein inputting the dialogue content into the intention discrimination model, mining low-confidence samples and boundary-confusion samples, and feeding variant data of the mined samples back into the training process to establish a dynamic game relationship between the role-playing model and the intention discrimination model comprises: inputting the dialogue content into the intention discrimination model, which evaluates the classification result and the confidence of each dialogue in real time; screening, according to the model's output, samples whose confidence meets the requirements and boundary-confusion samples as seed data to obtain difficult samples; feeding the difficult samples back to the role-playing model to generate variant data; and, after the difficult samples and the variant data pass model verification, adding them to the training set and retraining the intention discrimination model with it, so as to establish the dynamic game relationship between the role-playing model and the intention discrimination model.
  8. A training device for an intention discrimination model, characterized by comprising: a role-playing-model training unit, configured to screen dialogue samples, extract key variables, construct an instruction fine-tuning dataset, improve the model architecture to support generating different dialogue paths according to externally specified variables, and train the improved architecture on the dataset to obtain a role-playing model; a dialogue-content generation unit, configured to perform counterfactual dialogue generation with the role-playing model based on the modified key variables to obtain dialogue content; and an iterative-optimization unit, configured to input the dialogue content into the intention discrimination model, mine low-confidence samples and boundary-confusion samples, and feed variant data of the mined samples back into the training process of the intention discrimination model, so as to establish a dynamic game relationship between the role-playing model and the intention discrimination model.
  9. The training device of claim 8, wherein the role-playing-model training unit comprises: a selection subunit, configured to select, from desensitized historical dialogue texts, high-quality samples free of intention-discrimination errors, speech-recognition errors, and logical discontinuities; a marking subunit, configured to mark the key variables that affect the course of the conversation in the high-quality samples; a dataset-forming subunit, configured to convert the key variables into role-control prompts, complete the user-side utterances with a general-purpose LLM according to the role-control prompts, and verify the results to obtain a fine-tuning dataset; an improvement subunit, configured to add a controllable-variable embedding layer independent of the text encoding on top of a Transformer Decoder architecture, so as to obtain an improved model architecture; and a fine-tuning training subunit, configured to perform supervised fine-tuning of the improved architecture with the fine-tuning dataset, so that it generates corresponding replies according to the input role-control prompts, thereby obtaining the role-playing model.
  10. A computer device, characterized in that it comprises a memory storing a computer program and a processor which, when executing the computer program, implements the method according to any one of claims 1-7.
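The hard-sample mining step recited in claims 1 and 7 can be sketched as follows. This is a hypothetical illustration, not from the patent: it assumes the intention discrimination model outputs a probability distribution over intent classes per dialogue, flags low-confidence samples when the top probability falls below a threshold, and flags boundary-confusion samples when the top two probabilities are nearly tied. The thresholds and all names are invented.

```python
# Mine low-confidence and boundary-confusion dialogues from classifier output.

def mine_hard_samples(predictions, conf_threshold=0.5, margin_threshold=0.1):
    """predictions: iterable of (dialogue_id, class_probabilities)."""
    low_confidence, boundary_confusion = [], []
    for dialogue_id, probs in predictions:
        top1, top2 = sorted(probs, reverse=True)[:2]
        if top1 < conf_threshold:
            low_confidence.append(dialogue_id)      # model is unsure overall
        elif top1 - top2 < margin_threshold:
            boundary_confusion.append(dialogue_id)  # two intents nearly tied
    return low_confidence, boundary_confusion

# toy predictions for three dialogues over three intent classes
preds = [
    ("d1", [0.90, 0.07, 0.03]),  # confident -> not mined
    ("d2", [0.40, 0.35, 0.25]),  # low confidence
    ("d3", [0.52, 0.45, 0.03]),  # boundary confusion
]
low_conf, boundary = mine_hard_samples(preds)
```

In the claimed loop, the mined ids would select the seed dialogues that are fed back to the role-playing model for variant generation.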

Description

Training method and device for intention discrimination model and computer equipment

Technical Field

The invention relates to artificial intelligence, and in particular to a training method, a training device, and computer equipment for an intention discrimination model.

Background

In modern intelligent outbound-call systems, accurate identification of user intent is critical to improving service quality and efficiency. The currently prevailing methods rely on neural-network-based language models or discriminative models to achieve this goal. However, in financial-service fields such as banking, the amount of data available for training these models is very limited, because customer call data involves personal privacy and is severely restricted. Against this challenge, using large language models to generate synthetic dialogue data is considered a viable solution: by having a large language model play a specific role to simulate a real dialogue, realistic multi-turn human-computer interaction samples can be generated. Although this approach shows great potential, the prior art still faces two major problems: difficulty covering long-tail and extreme scenarios, and the lack of an optimization mechanism targeting the model's weak points. In particular, generic generative models tend to reproduce common conversational patterns, ignoring rare situations that are critical to the business, such as complex complaints and extreme customer moods. Meanwhile, the unidirectional generation process cannot be adjusted according to the performance of the intention discrimination model, which means it keeps producing "simple samples" the model already identifies well, while failing to supply the difficult samples that are most valuable for improving the model. Therefore, a new approach is needed that generates more diverse and challenging training data through a dynamic game mechanism and interventions on key variables.
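At a high level, one round of the dynamic game between the role-playing model (generator) and the intention discrimination model (discriminator) can be outlined as below. This is a hypothetical control-flow sketch only; the callables are stand-ins supplied by the caller, and none of these names come from the patent.

```python
# One round of the generator-vs-discriminator game: generate counterfactual
# dialogues, score them with the intent model, and keep only the hard ones.

def game_round(generate_dialogue, score_dialogue, interventions, is_hard):
    hard_samples = []
    for intervention in interventions:
        dialogue = generate_dialogue(intervention)  # role-playing model
        confidence = score_dialogue(dialogue)       # intention model's confidence
        if is_hard(confidence):                     # low-confidence / boundary case
            hard_samples.append((dialogue, intervention))
    return hard_samples  # would be verified and added to the training set

# toy stand-ins that only demonstrate the control flow
hard = game_round(
    generate_dialogue=lambda iv: f"dialogue<{iv}>",
    score_dialogue=lambda d: 0.3 if "angry" in d else 0.9,
    interventions=["calm", "angry"],
    is_hard=lambda c: c < 0.5,
)
```

Iterating such rounds is what couples the generator to the discriminator's current weak points, instead of generating data in a single unidirectional pass.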
In particular, by applying "interventions" to the key causal variables in historical dialogues, "what-if" counterfactual dialogue paths can be generated, so that rare scenarios and intention-boundary samples are systematically expanded, and the performance and reliability of the intention discrimination model in practical applications are markedly improved.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a training method, a training device, and computer equipment for an intention discrimination model. To achieve this purpose, the invention adopts the following technical scheme. The method for training the intention discrimination model comprises the following steps: screening dialogue samples, extracting key variables, and constructing an instruction fine-tuning dataset; improving the model architecture to support generating different dialogue paths according to externally specified variables, and training the improved architecture on the dataset to obtain a role-playing model; performing counterfactual dialogue generation with the role-playing model based on modified key variables, so as to obtain dialogue content; inputting the dialogue content into the intention discrimination model, mining low-confidence samples and boundary-confusion samples, and feeding variant data of the mined samples back into the training process of the intention discrimination model, so as to establish a dynamic game relationship between the role-playing model and the intention discrimination model.
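The intervention step above can be sketched as follows: the real key variables extracted from a historical dialogue are overridden with counterfactual values, and the merged result is rendered as a structured role-control prompt for the role-playing model. The variable names and the prompt template below are illustrative assumptions, not from the patent.

```python
# Build a role-control prompt from real key variables plus counterfactual
# overrides ("interventions"); the overrides take precedence.

def build_role_prompt(key_variables, interventions):
    modified = {**key_variables, **interventions}  # interventions win
    lines = [f"- {name}: {value}" for name, value in sorted(modified.items())]
    return "Play a bank customer with the following attributes:\n" + "\n".join(lines)

real = {"mood": "calm", "intent": "ask about balance", "age_group": "30-40"}
counterfactual = {"mood": "angry", "intent": "complain about fees"}
prompt = build_role_prompt(real, counterfactual)
```

Only the intervened variables change; the remaining attributes keep their true values, which is what makes the generated path a "what-if" variant of a real dialogue rather than a fully synthetic one.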
Screening the dialogue samples, extracting the key variables, constructing the instruction fine-tuning dataset, improving the model architecture to support generating different dialogue paths according to externally specified variables, and training the improved architecture on the dataset to obtain the role-playing model comprises the following steps: selecting, from desensitized historical dialogue texts, high-quality samples free of intention-discrimination errors, speech-recognition errors, and logical discontinuities; marking the key variables that affect the course of the conversation in the high-quality samples; converting the key variables into role-control prompts, completing the user-side utterances with a general-purpose LLM according to the role-control prompts, and verifying the results to obtain a fine-tuning dataset; adding, on top of a Transformer Decoder architecture, a controllable-variable embedding layer independent of the text encoding to obtain an improved model architecture; and performing supervised fine-tuning of the improved architecture with the fine-tuning dataset, so that it generates corresponding replies according to the input role-control prompts, thereby obtaining the role-playing model. The key variables comprise user attributes and business-scenario features. The key variables are converted into role-control prompts,