CN-122019720-A - Dialogue processing method and device and computer equipment

CN 122019720 A

Abstract

The application provides a dialogue processing method, a dialogue processing apparatus, and computer equipment. The method comprises: determining first information based on current input information; and clipping the first information using a clipping model to obtain second information, where the second information represents the key information of the first information and serves as the input of a preset dialogue model to generate reply information corresponding to the current input information. The clipping model is obtained by reinforcement learning training with a preset multi-dimensional reward function, and the multi-dimensional reward function is used to update the parameters of the clipping model to be trained at least once. By clipping the first information with a clipping model trained using the multi-dimensional reward function and the feedback mechanism of reinforcement learning, the output second information retains the key information while achieving semantic integrity and length controllability.
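The claimed pipeline (determine first information, clip it to its key information, then feed the clipped text to the dialogue model) can be pictured with a minimal sketch. All names and the threshold are assumptions, and the clipping and dialogue models are stand-in stubs, not the trained models described in the application:

```python
# Hypothetical sketch of the claimed dialogue pipeline.

LENGTH_THRESHOLD = 200  # assumed character budget for the dialogue model input

def clip_model(text: str) -> str:
    """Stand-in for the trained clipping model: keep the tail of the text."""
    return text[-LENGTH_THRESHOLD:]

def dialogue_model(prompt: str) -> str:
    """Stand-in for the preset dialogue model."""
    return f"reply to: {prompt[:30]}..."

def process_turn(current_input: str, history: list[str]) -> str:
    # First information = history plus the current input (as in claim 3).
    first_info = "\n".join(history + [current_input])
    # Clip only when the first information exceeds the threshold (claim 2).
    if len(first_info) > LENGTH_THRESHOLD:
        second_info = clip_model(first_info)
    else:
        second_info = first_info
    reply = dialogue_model(second_info)
    # Append the new exchange as fresh historical dialogue information (claim 4).
    history.extend([current_input, reply])
    return reply
```

The conditional clip mirrors claim 2: short inputs pass through untouched, so the dialogue model only ever sees text within its length budget.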

Inventors

  • CHENG RUI

Assignees

  • Lenovo (Beijing) Co., Ltd. (联想(北京)有限公司)

Dates

Publication Date
2026-05-12
Application Date
2026-01-29

Claims (10)

  1. A dialogue processing method, comprising: determining first information based on current input information; and clipping the first information using a clipping model to obtain second information, wherein the second information represents key information of the first information, and the second information is used as input of a preset dialogue model to generate reply information corresponding to the current input information; wherein the clipping model is obtained by reinforcement learning training with a preset multi-dimensional reward function, the multi-dimensional reward function is used to update parameters of the clipping model to be trained at least once, and the multi-dimensional reward function comprises at least two of a first reward value, a second reward value, a third reward value, and a fourth reward value, the first reward value representing semantic integrity and semantic accuracy of the key information obtained by clipping, the second reward value representing length compliance of the key information obtained by clipping, the third reward value representing redundancy of the key information obtained by clipping, and the fourth reward value representing format compliance of the key information obtained by clipping.
  2. The method of claim 1, wherein clipping the first information using a clipping model to obtain second information comprises: clipping the first information using the clipping model to obtain the second information when the length of the first information is greater than a preset length threshold.
  3. The method of claim 1, wherein determining first information based on current input information comprises: when it is determined that at least one round of historical dialogue information exists, determining the first information based on the current input information and the at least one round of historical dialogue information.
  4. The method of claim 3, further comprising: generating new historical dialogue information based on the current input information, the reply information corresponding to the current input information, and the at least one round of historical dialogue information.
  5. The method of any one of claims 1 to 4, further comprising: training a first clipping model based on a training data set to obtain a second clipping model; and performing reinforcement learning training on the second clipping model based on the multi-dimensional reward function to obtain the clipping model.
  6. The method of claim 5, wherein performing reinforcement learning training on the second clipping model based on the multi-dimensional reward function to obtain the clipping model comprises: inputting a sample into the second clipping model to obtain predicted key information of the sample; determining a reward value of the sample based on the multi-dimensional reward function and the predicted key information of the sample; and updating parameters of the second clipping model at least once based on the reward value of the sample to obtain the clipping model.
  7. The method of claim 6, wherein determining a reward value of the sample based on the multi-dimensional reward function and the predicted key information of the sample comprises: determining the first reward value based on the predicted key information of the sample and the key information of the sample; determining the second reward value based on a length of the predicted key information of the sample; determining the third reward value based on the length of the predicted key information of the sample and a preset minimum length; determining the fourth reward value based on marking information in the predicted key information of the sample; and determining the reward value of the sample based on the first reward value, the second reward value, the third reward value, and the fourth reward value.
  8. The method of claim 7, wherein determining the second reward value based on a length of the predicted key information of the sample comprises: taking a preset value as the second reward value when the length of the predicted key information of the sample is greater than the length threshold; and when the length of the predicted key information of the sample is not greater than the length threshold, determining a target clipping proportion based on the length of the sample, and determining the second reward value based on the target clipping proportion and the length of the predicted key information of the sample.
  9. A dialogue processing apparatus, comprising: a determining module configured to determine first information based on current input information; and a clipping module configured to clip the first information using a clipping model to obtain second information, wherein the second information represents key information of the first information, and the second information is used as input of a preset dialogue model to generate reply information corresponding to the current input information; wherein the clipping model is obtained by reinforcement learning training with a preset multi-dimensional reward function, the multi-dimensional reward function is used to update parameters of the clipping model to be trained at least once, and the multi-dimensional reward function comprises at least two of a first reward value, a second reward value, a third reward value, and a fourth reward value, the first reward value representing semantic integrity and semantic accuracy of the key information obtained by clipping, the second reward value representing length compliance of the key information obtained by clipping, the third reward value representing redundancy of the key information obtained by clipping, and the fourth reward value representing format compliance of the key information obtained by clipping.
  10. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 8 when executing the computer program.
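The claims specify what each reward component measures but not how it is computed. The following sketch shows one hypothetical way the four components of claims 7 and 8 could be combined; all metrics, penalty values, and thresholds are assumptions chosen for illustration, not the patented formulas:

```python
# Illustrative composite reward in the spirit of claims 7 and 8.

LENGTH_THRESHOLD = 200   # assumed maximum allowed length of the key information
MIN_LENGTH = 20          # assumed preset minimum length (claim 7, third reward)

def semantic_reward(predicted: str, reference: str) -> float:
    # First reward: semantic integrity/accuracy, approximated by token overlap.
    p, r = set(predicted.split()), set(reference.split())
    return len(p & r) / max(len(r), 1)

def length_reward(predicted: str, sample_len: int) -> float:
    # Second reward (claim 8): a preset penalty when the prediction exceeds the
    # threshold, otherwise a score based on a target clipping proportion.
    if len(predicted) > LENGTH_THRESHOLD:
        return -1.0  # assumed preset value
    target_ratio = min(LENGTH_THRESHOLD / max(sample_len, 1), 1.0)
    actual_ratio = len(predicted) / max(sample_len, 1)
    return 1.0 - abs(target_ratio - actual_ratio)

def redundancy_reward(predicted: str) -> float:
    # Third reward: penalize padding far beyond the preset minimum length.
    return -max(len(predicted) - MIN_LENGTH, 0) / LENGTH_THRESHOLD

def format_reward(predicted: str) -> float:
    # Fourth reward: check for assumed markup tags around the key information.
    return 1.0 if predicted.startswith("<key>") and predicted.endswith("</key>") else 0.0

def total_reward(predicted: str, reference: str, sample_len: int) -> float:
    # Claim 1 requires at least two components; this sketch sums all four.
    return (semantic_reward(predicted, reference)
            + length_reward(predicted, sample_len)
            + redundancy_reward(predicted)
            + format_reward(predicted))
```

In practice each component would likely carry a tuned weight; an unweighted sum is used here only to keep the structure of the multi-dimensional reward visible.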

Description

Dialogue processing method and device and computer equipment

Technical Field

The present application relates to, but is not limited to, the field of computer technologies, and in particular to a dialogue processing method and apparatus and a computer device.

Background

In recent years, large language models (LLMs), with their strong natural language understanding and generation capabilities, have been widely used in fields such as intelligent customer service, dialogue systems, education, and medical care. In the related art, methods such as deleting early dialogue content or compressing the context are generally adopted to control the input length of LLMs, but these methods do not fully retain key information and easily cause progressive loss of core semantics, so that the accuracy of dialogue replies is significantly reduced.

Disclosure of Invention

The embodiments of the application provide a dialogue processing method, a dialogue processing apparatus, and computer equipment.
The technical scheme of the embodiments of the application is realized as follows. In a first aspect, an embodiment of the present application provides a dialogue processing method, comprising: determining first information based on current input information; and clipping the first information using a clipping model to obtain second information, wherein the second information represents key information of the first information, and the second information is used as input of a preset dialogue model to generate reply information corresponding to the current input information. The clipping model is obtained by reinforcement learning training with a preset multi-dimensional reward function, the multi-dimensional reward function is used to update parameters of the clipping model to be trained at least once, and the multi-dimensional reward function comprises at least two of a first reward value, a second reward value, a third reward value, and a fourth reward value, wherein the first reward value represents semantic integrity and semantic accuracy of the key information obtained by clipping, the second reward value represents length compliance of the key information obtained by clipping, the third reward value represents redundancy of the key information obtained by clipping, and the fourth reward value represents format compliance of the key information obtained by clipping.
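One toy way to picture the "update parameters at least once based on the reward value" step is a crude hill climb over a single parameter. The real method would use a reinforcement-learning policy update over model weights; everything below (the one-parameter model, the simplified reward, the accept/revert rule) is an illustrative assumption, not the patented training procedure:

```python
# Toy stand-in for the reinforcement-learning update: predict, score with a
# reward function, update the parameter, keep the change only if it helps.
import random

class ToyClippingModel:
    def __init__(self):
        self.keep_ratio = 0.9  # single trainable parameter for illustration

    def predict(self, sample: str) -> str:
        keep = max(1, int(len(sample) * self.keep_ratio))
        return sample[:keep]

def reward_fn(predicted: str, target_len: int) -> float:
    # Simplified reward: closer to the target length is better.
    return -abs(len(predicted) - target_len) / max(target_len, 1)

def train(model: ToyClippingModel, samples: list[str], target_len: int,
          steps: int = 50) -> ToyClippingModel:
    random.seed(0)
    for _ in range(steps):
        sample = random.choice(samples)
        baseline = reward_fn(model.predict(sample), target_len)
        # Perturb the parameter and keep the change only if the reward does
        # not drop (a crude hill climb standing in for a policy update).
        old = model.keep_ratio
        trial = min(max(old + random.uniform(-0.05, 0.05), 0.05), 1.0)
        model.keep_ratio = trial
        if reward_fn(model.predict(sample), target_len) < baseline:
            model.keep_ratio = old  # revert updates that worsen the reward
    return model
```

The accept/revert rule guarantees the reward never decreases on the training sample, which is the property the multi-dimensional reward is meant to drive in the actual clipping model.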
In a second aspect, an embodiment of the present application provides a dialogue processing apparatus, comprising: a determining module configured to determine first information based on current input information; and a clipping module configured to clip the first information using a clipping model to obtain second information, wherein the second information represents key information of the first information, and the second information is used as input of a preset dialogue model to generate reply information corresponding to the current input information. The clipping model is obtained by reinforcement learning training with a preset multi-dimensional reward function, the multi-dimensional reward function is used to update parameters of the clipping model to be trained at least once, and the multi-dimensional reward function comprises at least two of a first reward value, a second reward value, a third reward value, and a fourth reward value, wherein the first reward value represents semantic integrity and semantic accuracy of the key information obtained by clipping, the second reward value represents length compliance of the key information obtained by clipping, the third reward value represents redundancy of the key information obtained by clipping, and the fourth reward value represents format compliance of the key information obtained by clipping.

In a third aspect, embodiments of the present application provide a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing some or all of the steps of the above method when executing the computer program.

Drawings

FIG. 1 is a schematic diagram of a dialogue processing method according to an embodiment of the present application; FIG. 2 is a schematic diagram of a processing procedure of a dialogue processing method according to an embodiment of the present application; FIG. 3 is a schematic diagram of a training process of a clipping model provided by an embodiment of the present application; FIG. 4 is a schematic structural diagram of a dialogue processing device according to an embodiment of the present application. It should be noted that the terms "first" and "second" above are only used to distinguish between different schemes and do not indicate a degree of preference or priority in implementation.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application