
KR-102962307-B1 - METHOD FOR GENERATING RESPONSES BY USING INTENTIONAL NEAR OUT-OF-DISTRIBUTION EXPLORATION OF ARTIFICIAL INTELLIGENCE MODEL AND COMPUTING DEVICE USING THE SAME


Abstract

The present invention relates to a method for generating a response using intentional near out-of-distribution search of an artificial intelligence model. The method comprises: (a) when a user input including at least one modality data is obtained, a computing device inputs the user input to a window module, which is an artificial intelligence model, so that the window module generates at least one target embedding vector corresponding to the user input, generates probability values for reference embedding vectors for at least some of the pre-trained reference data to be used to generate a response for the target embedding vectors, generates at least one first response by sampling, with reference to the probability values, a first group of reference embedding vectors that are in-distribution among the reference embedding vectors, and generates at least one second response by sampling, with reference to the probability values, a second group of reference embedding vectors that are near out-of-distribution among the remaining candidate group of reference embedding vectors excluding the first group of reference embedding vectors; and (b) the computing device inputs the first response and the second response to a verification module to cause the verification module to calculate verification scores for the first response and the second response, and determines a final response corresponding to the user input by reference to the verification scores.
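
As a rough illustration of the two-branch flow summarized above, the following Python sketch scores toy reference embeddings against a target embedding, samples one in-distribution candidate and one near out-of-distribution candidate, and lets a verifier pick the final response. Everything concrete here is an assumption made for illustration: the function names, the dot-product logits, the probability cutoff `p_in` used to define "in-distribution", and the threshold values. The near out-of-distribution rule borrows the cosine and norm thresholds recited in claim 2; the patent text does not disclose an implementation.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def split_candidates(target, refs, p_in=0.10, cos_min=0.5, norm_max=1.0):
    """Return (probs, in_dist_indices, near_ood_indices).

    Assumption: logit = dot(ref, target) = |ref| * |target| * cos(angle),
    so each logit can be decomposed into a norm part and a cosine part.
    """
    logits = [dot(r, target) for r in refs]
    probs = softmax(logits)
    t_norm = norm(target)
    in_dist, near_ood = [], []
    for i, r in enumerate(refs):
        if probs[i] >= p_in:  # hypothetical in-distribution rule
            in_dist.append(i)
            continue
        r_norm = norm(r)
        cosine = logits[i] / (r_norm * t_norm) if r_norm and t_norm else 0.0
        # Near out-of-distribution: semantically aligned but low magnitude.
        if cosine > cos_min and r_norm < norm_max:
            near_ood.append(i)
    return probs, in_dist, near_ood


def sample(indices, probs, rng):
    """Draw one index from a group, weighted by its probability values."""
    return rng.choices(indices, weights=[probs[i] for i in indices], k=1)[0]


def final_response(target, refs, responses, verify, rng):
    """Sample one response per non-empty group and keep the best-verified one."""
    probs, in_dist, near_ood = split_candidates(target, refs)
    picked = [sample(g, probs, rng) for g in (in_dist, near_ood) if g]
    return max((responses[i] for i in picked), key=verify, default=None)


# Toy data (hypothetical): four reference embeddings and canned responses.
TARGET = [1.0, 0.0]
REFS = [[3.0, 0.0], [0.8, 0.2], [-1.0, 0.0], [0.0, 2.0]]
RESPONSES = ["safe", "novel", "opposite", "unrelated"]
SCORES = {"safe": 0.6, "novel": 0.9}  # stand-in for the verification module

if __name__ == "__main__":
    verify = lambda r: SCORES.get(r, 0.0)
    print(final_response(TARGET, REFS, RESPONSES, verify, random.Random(0)))
```

In this toy data the second reference vector points almost the same way as the target but has a small norm, so its probability is low even though it is semantically close. That is the kind of candidate the second branch is meant to surface, while the anti-aligned and orthogonal vectors are rejected by the cosine threshold.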

Inventors

  • 고경석

Assignees

  • 주식회사 다성보험중개

Dates

Publication Date
20260508
Application Date
20251121
Priority Date
20250901

Claims (20)

  1. A method for generating a response using intentional near out-of-distribution search of an artificial intelligence model, the method comprising: (a) a step in which, when a user input including at least one modality data is obtained, a computing device inputs the user input to a window module, which is an artificial intelligence model, so that the window module generates at least one target embedding vector corresponding to the user input, generates probability values for reference embedding vectors for at least some of the pre-trained reference data to be used to generate a response for the target embedding vectors, samples, with reference to the probability values, first candidate group reference embedding vectors that are in-distribution among the reference embedding vectors to generate at least one first response, and samples, with reference to the probability values, second candidate group reference embedding vectors that are near out-of-distribution among the remaining candidate group reference embedding vectors excluding the first candidate group reference embedding vectors to generate at least one second response; and (b) a step in which the computing device inputs the first response and the second response to a verification module to cause the verification module to calculate verification scores for the first response and the second response, and determines a final response corresponding to the user input by reference to the verification scores.
  2. The method of claim 1, wherein, in step (a), in determining the second candidate group reference embedding vectors, the computing device causes the window module to refer to cosine values and norm values obtained by decomposing, with reference to the probability values, the logit values of the remaining candidate group reference embedding vectors, and to determine, as the second candidate group reference embedding vectors, at least one of (i) 2_1 remaining candidate group reference embedding vectors having a cosine value greater than a preset threshold cosine value and a norm value smaller than a preset threshold norm value among the remaining candidate group reference embedding vectors, and (ii) 2_2 remaining candidate group reference embedding vectors having a cosine value greater than the preset threshold cosine value and an embedding distribution with a density lower than a preset threshold density among the remaining candidate group reference embedding vectors.
  3. The method of claim 1, wherein, in step (a), when the user input is an input corresponding to a single modality, the computing device causes the window module to generate single target embedding vectors corresponding to the user input, to generate, as the probability values, first specific probability values of single reference embedding vectors for single reference data corresponding to the single modality among the reference data in order to generate a response to the single target embedding vectors, to generate at least one first specific response as the first response by sampling, with reference to the first specific probability values, first candidate group single reference embedding vectors that are in-distribution among the single reference embedding vectors, and to generate at least one second specific response as the second response by sampling, with reference to the first specific probability values, second candidate group single reference embedding vectors that are near out-of-distribution among the remaining candidate group single reference embedding vectors excluding the first candidate group single reference embedding vectors, and wherein, in step (b), the computing device inputs the first specific response and the second specific response to the verification module, causing the verification module to calculate, as the verification scores, specific verification scores for the first specific response and the second specific response, and determines, as the final response, a final specific response corresponding to the user input by referencing the specific verification scores.
  4. The method of claim 1, wherein, in step (a), when the user input is an input corresponding to multimodality, the multimodality including at least a first modality and a second modality, the computing device causes the window module to generate, as the target embedding vectors corresponding to the user input, integrated target embedding vectors in which first target embedding vectors corresponding to the first modality portion of the user input and second target embedding vectors corresponding to the second modality portion of the user input are integrated, to generate, as the probability values, second specific probability values for integrated reference embedding vectors corresponding to the first modality and the second modality among the reference data to be used to generate a response for the integrated target embedding vectors, to generate at least one first integrated response as the first response by sampling, with reference to the second specific probability values, first candidate group integrated reference embedding vectors that are in-distribution among the integrated reference embedding vectors, and to generate at least one second integrated response as the second response by sampling, with reference to the second specific probability values, second candidate group integrated reference embedding vectors that are near out-of-distribution among the remaining candidate group integrated reference embedding vectors excluding the first candidate group integrated reference embedding vectors, and wherein, in step (b), the computing device inputs the first integrated response and the second integrated response to the verification module, causing the verification module to calculate, as the verification scores, integrated verification scores for the first integrated response and the second integrated response, and determines, as the final response, a final integrated response corresponding to the user input by referencing the integrated verification scores.
  5. The method of claim 1, wherein the modality data is any one of text data, image data, audio data, video data, sensor data, structured data, motion data, tactile data, spatial data, and synthetic data.
  6. The method of claim 1, further comprising, prior to step (a): (a0) a step in which, when the user input includes user request information, the computing device inputs the user request information and indirect information related to the user request information into a psychological analysis module, thereby causing the psychological analysis module to analyze a psychological signal, question intensity, and question difficulty with reference to the user request information and the indirect information, to calculate a psychological score for the psychological signal, an intensity score for the question intensity, and a difficulty score for the question difficulty, and to determine whether a total score, obtained by summing the psychological score, the intensity score, and the difficulty score after assigning a predetermined weight to each of them, exceeds a preset threshold score, wherein, if it is determined that the total score exceeds the preset threshold score, in step (b) the computing device performs a loop process comprising: (i) a first sub-process of selecting, by reference to the verification scores, responses satisfying a preset score condition among the first response and the second response as responses to be re-inputted; (ii) a second sub-process of inputting the responses to be re-inputted together with preset improvement request information to the window module so that the window module generates re-output responses in which the preset improvement request information is reflected in the responses to be re-inputted; and (iii) a third sub-process of inputting the re-output responses to the verification module so that the verification module calculates re-output verification scores for the re-output responses, and determines the final response using the re-output verification scores as the verification scores.
  7. The method of claim 6, wherein the computing device determines the number of repetitions of the loop process by referring to the total score that exceeds the preset threshold score, and repeats the loop process the determined number of times.
  8. In paragraph 6, The above psychological signal includes a language signal, a behavioral signal, and a platform signal, wherein (i) the language signal is determined by reference to at least some of the uncertainty-related keywords, novelty-related keywords, command intensity-related keywords, and urgency-related keywords included in the user request information, (ii) the behavioral signal is determined according to at least some of the typing interval information when the user request information is input, the time information taken from the start to the completion of the user request information, the number of times the user request information was modified while completing the user request information, and the number of times the user request information was reused, which are included in the indirect information, and (iii) the platform signal is determined according to at least some of the information on checking hint views and checking example views included in the indirect information. The strength of the above question is determined according to keywords related to the degree of strength of the change request included in the above user request information, and A method in which the difficulty level of the above question is determined according to technical field-related keywords included in the above user request information.
  9. The method of claim 6, wherein the user request information is of at least one data type among text data and voice data.
  10. The method of claim 1, wherein, in step (b), the computing device determines, as the final response, the response having the highest verification score among the verification scores.
  11. In paragraph 1, Prior to the above (a) step, (a1) The computing device inputs the user input to a psychoanalysis module to cause the psychoanalysis module to determine whether the user input contains high-risk intention information, and if it is determined that the high-risk intention information is included, In step (a) above, The computing device inputs the user input to the window module, causing the window module to generate at least one target embedding vector corresponding to the user input, generates probability values of the reference embedding vectors for at least some of the pre-trained reference data to be used to generate a response to the target embedding vectors, and generates at least one first response by sampling the first candidate reference embedding vectors within the distribution among the reference embedding vectors using the probability values as a reference. In step (b) above, A method in which the computing device inputs the first response to the verification module to cause the verification module to calculate a first verification score for the first response, and determines the final response corresponding to the user input by reference to the first verification score.
  12. The method of claim 1, wherein the computing device inputs result information related to the final response into a reporting module, thereby causing the reporting module to generate a multidimensional report on the result information, the multidimensional report including at least some of text information on the result information, information quantifying the result information, and information visualizing the result information.
  13. The method of claim 1, wherein, in step (a), in determining the second candidate group reference embedding vectors, the computing device causes the window module to determine the second candidate group reference embedding vectors among the remaining candidate group reference embedding vectors by referring to the remaining candidate group reference embedding vectors and hidden vectors of a hidden state.
  14. The method of claim 13, wherein the computing device determines, as the second candidate group reference embedding vectors, remaining candidate group reference embedding vectors satisfying at least one of the following: (i) the cosine similarity calculated between the cosine value of the remaining candidate group reference embedding vectors and the cosine value of the hidden vectors falls within a preset threshold similarity range; (ii) the calculated norm value of the remaining candidate group reference embedding vectors is smaller than the preset threshold norm value; and (iii) the measured embedding distribution of the remaining candidate group reference embedding vectors has a density lower than the preset threshold density.
  15. A computing device that generates a response using intentional near out-of-distribution search of an artificial intelligence model, the computing device comprising: at least one memory for storing instructions; and at least one processor configured to execute the instructions, wherein the processor performs: (I) a process of, when a user input including at least one modality data is acquired, inputting the user input to a window module, which is an artificial intelligence model, thereby causing the window module to generate at least one target embedding vector corresponding to the user input, to generate probability values for reference embedding vectors for at least some of the pre-trained reference data to be used to generate a response for the target embedding vectors, to generate at least one first response by sampling, by reference to the probability values, a first group of reference embedding vectors that are in-distribution among the reference embedding vectors, and to generate at least one second response by sampling, by reference to the probability values, a second group of reference embedding vectors that are near out-of-distribution among the remaining candidate group of reference embedding vectors excluding the first group of reference embedding vectors; and (II) a process of inputting the first response and the second response into a verification module, causing the verification module to calculate verification scores for the first response and the second response, and determining a final response corresponding to the user input by referencing the verification scores.
  16. In paragraph 15, In the above (I) process, A computing device that, in determining the second candidate group reference embedding vectors, allows the window module to determine at least one of (i) 2_1 remaining candidate group reference embedding vectors having a cosine value greater than a preset threshold cosine value and a norm value smaller than a preset threshold norm value among the remaining candidate group reference embedding vectors, by referring to the cosine value and norm value obtained by decomposing the logit value of the remaining candidate group reference embedding vectors with reference to the probability values, and (ii) 2_2 remaining candidate group reference embedding vectors having a cosine value greater than the preset threshold cosine value and an embedding distribution with a density lower than a preset threshold density among the remaining candidate group reference embedding vectors, as the second candidate group reference embedding vectors.
  17. The computing device of claim 15, wherein, in process (I), when the user input is an input corresponding to a single modality, the processor causes the window module to generate single target embedding vectors corresponding to the user input, to generate, as the probability values, first specific probability values for single reference embedding vectors corresponding to single reference data that correspond to the single modality among the reference data in order to generate a response to the single target embedding vectors, to generate at least one first specific response as the first response by sampling, with reference to the first specific probability values, first candidate group single reference embedding vectors that are in-distribution among the single reference embedding vectors, and to generate at least one second specific response as the second response by sampling, with reference to the first specific probability values, second candidate group single reference embedding vectors that are near out-of-distribution among the remaining candidate group single reference embedding vectors excluding the first candidate group single reference embedding vectors, and wherein, in process (II), the processor inputs the first specific response and the second specific response to the verification module, causing the verification module to calculate, as the verification scores, specific verification scores for the first specific response and the second specific response, and determines, as the final response, a final specific response corresponding to the user input by reference to the specific verification scores.
  18. The computing device of claim 15, wherein, in process (I), when the user input is a multimodality input, the multimodality including at least a first modality and a second modality, the processor causes the window module to generate, as the target embedding vectors corresponding to the user input, integrated target embedding vectors in which first target embedding vectors corresponding to the first modality portion of the user input and second target embedding vectors corresponding to the second modality portion of the user input are integrated, to generate, as the probability values, second specific probability values for integrated reference embedding vectors corresponding to the first modality and the second modality among the reference data to be used to generate a response for the integrated target embedding vectors, to generate at least one first integrated response as the first response by sampling, with reference to the second specific probability values, first candidate group integrated reference embedding vectors that are in-distribution among the integrated reference embedding vectors, and to generate at least one second integrated response as the second response by sampling, with reference to the second specific probability values, second candidate group integrated reference embedding vectors that are near out-of-distribution among the remaining candidate group integrated reference embedding vectors excluding the first candidate group integrated reference embedding vectors, and wherein, in process (II), the processor inputs the first integrated response and the second integrated response to the verification module, causing the verification module to calculate, as the verification scores, integrated verification scores for the first integrated response and the second integrated response, and determines, as the final response, a final integrated response corresponding to the user input by referencing the integrated verification scores.
  19. The computing device of claim 15, wherein the modality data is any one of text data, image data, audio data, video data, sensor data, structured data, motion data, tactile data, spatial data, and synthetic data.
  20. In paragraph 15, Prior to the above (I) process, (I_0) The processor further performs a process in which, if the user input includes user request information, the user request information and indirect information related to the user request information are input to a psychoanalysis module, thereby causing the psychoanalysis module to analyze psychological signals, question intensity, and question difficulty based on the user request information and the indirect information, calculate a psychological score for the psychological signal, an intensity score for the question intensity, and a difficulty score for the question difficulty, and determine whether the sum of the psychological score, the intensity score, and the difficulty score, after assigning a predetermined weight to each of the psychological score, the intensity score, and the difficulty score, exceeds a preset threshold score. If it is determined that the above total score exceeds the above preset threshold score, The processor performs a loop process comprising, in determining the final response in the process (II), (i) a first sub-process for selecting responses satisfying a preset score condition among the first response and the second response as responses to be re-inputted by referring to the verification scores; (ii) a second sub-process for inputting the responses to be re-inputted together with preset improvement request information to the window module so that the window module generates re-output responses in which the preset improvement request information is reflected in the responses to be re-inputted; and (iii) a third sub-process for inputting the re-output responses to the verification module so that the verification module calculates re-output verification scores for the re-output responses. A computing device that performs a process of determining the final response using the above-mentioned re-output verification scores as the above-mentioned verification scores.
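
The gating and loop logic recited in claims 6, 7, and 20 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the integer 0 to 10 score scale, the weights, the rule that maps the excess over the threshold to a repetition count, and the `improve` and `verify` callables standing in for the window module and the verification module are all hypothetical.

```python
def total_score(psych, intensity, difficulty, weights=(4, 3, 3)):
    """Weighted sum of the three scores (each assumed to be an int in 0..10)."""
    w_p, w_i, w_d = weights
    return w_p * psych + w_i * intensity + w_d * difficulty


def loop_count(total, threshold=60, step=10, max_loops=3):
    """Assumed mapping from the total score to a number of repetitions.

    Claim 7 only says the count is determined from the total score; the
    "one extra loop per `step` points of excess, capped" rule is made up.
    """
    if total <= threshold:
        return 0  # threshold not exceeded: the loop process is skipped
    return min(max_loops, 1 + (total - threshold) // step)


def refine(responses, scores, improve, verify, loops, min_score):
    """Run the three sub-processes `loops` times and return (best, score).

    responses/scores: candidate responses with their verification scores.
    improve: stand-in for re-inputting a response with an improvement request.
    verify:  stand-in for the verification module.
    """
    for _ in range(loops):
        # (i) select the responses that satisfy the score condition
        selected = [r for r, s in zip(responses, scores) if s >= min_score]
        if not selected:
            break  # nothing qualifies; keep the previous round's results
        # (ii) regenerate each selected response with the improvement request
        responses = [improve(r) for r in selected]
        # (iii) score the re-output responses
        scores = [verify(r) for r in responses]
    best = max(range(len(responses)), key=lambda i: scores[i])
    return responses[best], scores[best]


if __name__ == "__main__":
    total = total_score(psych=8, intensity=6, difficulty=9)
    loops = loop_count(total)
    # Toy stand-ins: "improving" appends a marker, "verifying" uses the length.
    print(total, loops, refine(["ab", "abcd"], [2, 4], lambda r: r + "+", len, loops, 3))
```

Using integer scores keeps the threshold comparison and the repetition count free of floating-point rounding; a real system would choose its own scale and mapping.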

Description

Method for generating responses by using intentional near out-of-distribution search of an artificial intelligence model and computing device using the same

The present invention relates to a method for generating a response using intentional near out-of-distribution search of an artificial intelligence model and a computing device using the same. More specifically, the invention relates to a method, and a computing device using the same, for generating at least one target embedding vector corresponding to a user input, generating probability values for reference embedding vectors for at least some of the pre-trained reference data to be used to generate a response for the target embedding vectors, generating at least one first response by sampling a first group of reference embedding vectors that are in-distribution among the reference embedding vectors based on the probability values, generating at least one second response by sampling a second group of reference embedding vectors that are near out-of-distribution among the remaining candidate group of reference embedding vectors excluding the first group of reference embedding vectors based on the probability values, calculating a verification score for the first response and the second response, and determining a final response corresponding to the user input based on the calculated verification scores.

Most currently used artificial intelligence models focus on excluding out-of-distribution candidates from the probability distribution derived from the pre-trained data and on using only in-distribution candidates in order to improve response accuracy. As a result, it is difficult to expect creative responses from them.
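
To make the background concrete, the following is a minimal Python sketch of the conventional decoding controls this background section discusses: temperature scaling, Top-K filtering, and Top-P filtering. The function names and the toy probabilities are illustrative assumptions and are not taken from the specification.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature flattens them."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ranked(probs):
    """Indices sorted from most to least probable."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


def top_k(probs, k):
    """Keep only the k most probable candidates."""
    return ranked(probs)[:k]


def top_p(probs, p):
    """Keep the most probable candidates until their cumulative mass reaches p."""
    kept, cumulative = [], 0.0
    for i in ranked(probs):
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept


# Toy distribution over four candidates (hypothetical values).
PROBS = [0.5, 0.3, 0.15, 0.05]

if __name__ == "__main__":
    print(top_k(PROBS, 2))    # candidates kept by Top-K with K = 2
    print(top_p(PROBS, 0.7))  # candidates kept by Top-P with P = 70%
    print(softmax_with_temperature([2.0, 1.0, 0.0], temperature=2.0))
```

Both filters keep only the highest-probability candidates, so lower-ranked candidates are never sampled regardless of how semantically close they are.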
To address these problems, large language models, which are artificial intelligence models, use the Temperature method, which adjusts the probability distribution; the Top-K Sampling method, which samples only from the K data points with the highest probabilities; and the Top-P Sampling method, which adds the highest-ranked data points to the candidate set until their cumulative probability reaches P (e.g., 70%) and samples only from the selected data points. However, the Temperature method merely adjusts the probability distribution, so while widening the distribution may yield creative responses, it may also yield responses that are completely unrelated and semantically incorrect. Furthermore, because the Top-K Sampling and Top-P Sampling methods simply select data in descending order of probability, they produce responses similar to previously learned patterns, which makes creative responses difficult to expect.

Therefore, in order to resolve the aforementioned problems, there is a need for a way to generate highly creative responses that remain semantically similar to normal responses.

The drawings attached below for use in describing embodiments of the present invention are merely some of the embodiments of the present invention, and other drawings can be obtained based on these drawings without inventive work by a person skilled in the art to which the present invention pertains (hereinafter "person skilled in the art").

FIG. 1 is a diagram showing the schematic configuration of a computing device that generates a response using intentional near out-of-distribution search of an artificial intelligence model according to an embodiment of the present invention.
FIG. 2 is a schematic flowchart illustrating a method for generating a response using intentional near out-of-distribution search of an artificial intelligence model according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of a method for determining second candidate group reference embedding vectors that are near out-of-distribution among the remaining candidate group reference embedding vectors excluding the first candidate group reference embedding vectors according to an embodiment of the present invention.
FIGS. 4a, 4b, and 4c are drawings for explaining an example of a method for determining whether to perform a loop process and a method for performing the loop process according to an embodiment of the present invention.

The following detailed description of the invention refers to the accompanying drawings, which illustrate specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the invention differ from one another but need not be mutually exclusive. For example, specific shapes, structures, and characteristics described herein may be modified from one embodiment to another without departing from the spirit and scope of the invention. It should also be understood that the location or arrangement of individual components within each embodiment may