CN-122020158-A - Method and device for determining sample data for a generative model

Abstract

The embodiments of this specification provide a method and a device for determining sample data for a generative model. In an intelligent question-answering scenario, the system can, in addition to answering a user's question, present further label questions for the user to select. These label questions may include questions generated by the generative model, which may be a natural language processing model such as a large language model. To optimize the generative model, the label questions pushed during online use can be annotated with relevance levels, and a preference data pair is constructed from at least one label question annotated as moderately relevant, forming a training sample for preference reinforcement learning. The resulting training samples can guide the output of the generative model so that it meets user needs more accurately and improves the user experience of the intelligent question-answering system.

Inventors

HE YUE

Assignees

支付宝(杭州)数字服务技术有限公司

Dates

Publication Date: 2026-05-12
Application Date: 2026-01-06

Claims (11)

1. A method of determining sample data for a generative model, the generative model being used to generate, based on a user question, label questions for further selection by the user, the method comprising: acquiring a first question input by a first user, and pushing a first answer and at least one label question for the first question; labeling the relevance level of each label question with respect to the first question to obtain a first label question having a moderate relevance level; constructing a first preference data pair based on the first label question, the first preference data pair including the first label question and a second label question corresponding to another relevance level; and determining a first training sample, wherein the first training sample comprises the first question, the first answer, and the first preference data pair and is used for performing preference reinforcement learning on the generative model, and wherein, in the preference reinforcement learning process, the first label question is taken as the dominant preference and the second label question is taken as the disadvantaged preference.
2. The method of claim 1, wherein the at least one label question is determined by: acquiring K+M candidate questions, wherein K first candidate questions for the first question are generated by the generative model based on the first question, and M second candidate questions with the highest matching degree are obtained by matching the first question against each question in a preset question set; and scoring each of the K+M candidate questions and selecting the at least one label question in descending order of score.
3. The method of claim 2, wherein the score of a single candidate question is obtained by weighting its historical click rate and its relevance to the first question, wherein the historical click rate is the ratio of the number of clicks on the candidate question to the number of times it was pushed during historical pushing, and the relevance of the candidate question to the first question is determined by the vector similarity of their semantic embedding vectors.
4. The method of claim 1, wherein said labeling the relevance level of each label question with respect to the first question comprises: in the case where the user selects one of the label questions, determining the relevance level of the user-selected label question as moderate; and in the case where the user inputs a new question, determining the new question input by the user as a label question having a moderate relevance level with respect to the first question.
5. The method of claim 4, wherein the relevance level of each label question not clicked by the user is annotated by at least one of: manual annotation, annotation by a pre-trained annotation model, and annotation by invoking a large language model, wherein the input data of the annotation model comprises the first question, the first answer, and a single label question to be annotated.
6. The method of claim 1, wherein the second label question is a label question, among the at least one label question, whose relevance level is annotated as another level, or a question of another relevance level generated by invoking a large language model based on the label question annotated as moderate.
7. The method of claim 1, wherein the first label question satisfies at least one of the following conditions: it is consistent with the subject to which the first question is directed; its word count is within a predetermined range.
8. The method of claim 1, wherein the other relevance levels include strong relevance and weak relevance.
9. An apparatus for determining sample data for a generative model, the generative model being used to generate, based on a user question, label questions for further selection by the user, the apparatus comprising: an acquisition unit configured to acquire a first question input by a first user, together with a first answer and at least one label question pushed for the first question; a labeling unit configured to label the relevance level of each label question with respect to the first question to obtain a first label question having a moderate relevance level; a construction unit configured to construct a first preference data pair based on the first label question, the first preference data pair including the first label question and a second label question corresponding to another relevance level; and a determining unit configured to determine a first training sample, wherein the first training sample comprises the first question, the first answer, and the first preference data pair and is used for performing preference reinforcement learning on the generative model, and wherein, in the preference reinforcement learning process, the first label question is taken as the dominant preference and the second label question is taken as the disadvantaged preference.
10. A computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-8.
11. A computing device comprising a memory and a processor, wherein the memory has executable code stored therein, which when executed by the processor, implements the method of any of claims 1-8.
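
The labeling and pairing steps recited in claims 1, 4, and 8 can be illustrated with a minimal Python sketch. This is not the applicant's implementation: the names `TrainingSample`, `label_relevance`, and `build_training_samples`, the string labels for the relevance levels, and the choice to pair every moderate label question with every non-moderate one are all assumptions made for illustration. The `annotate` callback stands in for whichever annotation route claim 5 allows (manual, annotation model, or large language model).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical string labels for the three relevance levels named in the claims.
MODERATE, STRONG, WEAK = "moderate", "strong", "weak"


@dataclass
class TrainingSample:
    question: str   # first question entered by the user
    answer: str     # first answer pushed for that question
    chosen: str     # moderately relevant label question (dominant preference)
    rejected: str   # label question at another level (disadvantaged preference)


def label_relevance(
    label_questions: List[str],
    annotate: Callable[[str], str],
    clicked: Optional[str] = None,
    new_question: Optional[str] = None,
) -> Dict[str, str]:
    """Assign a relevance level to every label question.

    A clicked label question, or a new question typed by the user, is treated
    as moderate; everything else is delegated to the annotate callback.
    """
    levels = {q: (MODERATE if q == clicked else annotate(q)) for q in label_questions}
    if new_question is not None:
        levels[new_question] = MODERATE
    return levels


def build_training_samples(
    question: str, answer: str, levels: Dict[str, str]
) -> List[TrainingSample]:
    """Pair each moderate label question with each non-moderate one (an assumption)."""
    chosen = [q for q, lv in levels.items() if lv == MODERATE]
    rejected = [q for q, lv in levels.items() if lv != MODERATE]
    return [TrainingSample(question, answer, c, r) for c in chosen for r in rejected]
```

If no label question is moderate, or none is at another level, the sketch returns an empty list; claim 6 describes generating a question of another relevance level with a large language model in that situation, which is not modeled here.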

Description

Method and device for determining sample data for a generative model

Technical Field

One or more embodiments of the present specification relate to the field of computer technology, and in particular to a method and device for determining sample data for a generative model.

Background

Intelligent question answering is an important research direction in artificial intelligence. It aims to achieve accurate semantic understanding and response in human-computer interaction through natural language processing (NLP) and knowledge reasoning. With breakthroughs in deep learning, intelligent question-answering systems have evolved from rule-based template matching into comprehensive systems that combine semantic representations, knowledge graphs, and generative models. Intelligent question answering is widely applied in intelligent customer service, educational tutoring, medical consultation, and other scenarios. By fine-tuning a pre-trained model (such as a large language model, LLM) on domain data, a system can adapt to the needs of a vertical industry and accurately analyze most questions.

However, in more specialized fields, the question entered by the user may be imprecise or not professionally worded, so that in addition to answering the current question it is desirable to further probe the user's intent or extend the user's question. For example, in the medical field, a user may ask "Does a fasting blood glucose test result of 7.9 indicate diabetes?" How to proactively extend user questions so that the question-answering session can continue, and thereby improve the user experience, is therefore an important technical problem worth studying.
Disclosure of Invention

One or more embodiments of the present specification describe a method and device for determining sample data for a generative model, to address one or more of the problems mentioned in the Background.

According to a first aspect, a method for determining sample data for a generative model is provided, the generative model being used to generate, based on a user question, label questions for further selection by the user. The method comprises: acquiring a first question input by a first user, together with a first answer and at least one label question pushed for the first question; labeling the relevance level of each label question with respect to the first question to obtain a first label question with a moderate relevance level; constructing a first preference data pair based on the first label question, the first preference data pair comprising the first label question and a second label question corresponding to another relevance level; and determining a first training sample, wherein the first training sample comprises the first question, the first answer, and the first preference data pair and is used for preference reinforcement learning on the generative model, with the first label question serving as the dominant preference and the second label question serving as the disadvantaged preference.

According to one embodiment, the at least one label question is determined as follows: K+M candidate questions are obtained, where K first candidate questions for the first question are generated by the generative model based on the first question, and M second candidate questions with the highest matching degree are obtained by matching the first question against each question in a preset question set; the K+M candidate questions are each scored, and the at least one label question is selected in descending order of score.
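
The candidate selection step can be sketched as follows, combining the K generated and M matched candidates and ranking them by a weighted score of historical click rate and embedding similarity, as recited in claims 2 and 3. This is a hedged illustration only: the dictionary fields (`text`, `vec`, `clicks`, `pushes`), the equal default weights, the use of cosine similarity as the vector similarity, and the `top_n` cutoff are assumptions, since the source does not specify them.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def click_rate(clicks: int, pushes: int) -> float:
    """Historical click rate: clicks divided by pushes (0.0 if never pushed)."""
    return clicks / pushes if pushes else 0.0


def score(candidate: Dict, question_vec: Sequence[float],
          w_click: float = 0.5, w_rel: float = 0.5) -> float:
    """Weighted sum of click rate and relevance; the weights are assumed values."""
    relevance = cosine_similarity(candidate["vec"], question_vec)
    return w_click * click_rate(candidate["clicks"], candidate["pushes"]) + w_rel * relevance


def select_label_questions(generated: List[Dict], matched: List[Dict],
                           question_vec: Sequence[float], top_n: int = 3) -> List[str]:
    """Rank the K generated plus M matched candidates and keep the top_n texts."""
    candidates = generated + matched
    ranked = sorted(candidates, key=lambda c: score(c, question_vec), reverse=True)
    return [c["text"] for c in ranked[:top_n]]
```

In practice the embedding vectors would come from a semantic embedding model and the click statistics from push logs; both are supplied here as plain inputs so the ranking logic can be read in isolation.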
According to a further embodiment, the score of a single candidate question is obtained by weighting its historical click rate and its relevance to the first question. The historical click rate is the ratio of the number of clicks on the candidate question to the number of times it was pushed during historical pushing, and the relevance of the candidate question to the first question is determined by the vector similarity of their semantic embedding vectors.

According to one embodiment, labeling the relevance level of each label question with respect to the first question comprises: when the user selects one of the label questions, determining the relevance level of the selected label question as moderate; and when the user inputs a new question, determining the new question as a label question with a moderate relevance level with respect to the first question.

According to a further embodiment, the relevance level of each label question not clicked by the user is annotated by at least one of manual annotation, annotation by a pre-trained annotation model, and annotation by invoking a large language model, wherein the input data of the annotation model comprises the first question, the first answer, and a single label question to be annotated.

According to one embodiment, the second tag question is a tag question of which a relevance level is marked as another level among the at least one ta