CN-116361638-B - Question and answer searching method, device and storage medium

Abstract

The application relates to artificial intelligence technology and provides a question and answer search method, a question and answer search device and a storage medium. The method comprises: obtaining an answer search request of a target object for a target question; determining text features of the target question; and inputting the text features into a question and answer search model to obtain a target answer. The question and answer search model is a model that is obtained by training on a target sample set and meets preset conditions. The target sample set comprises a sample set obtained by fusing a first sample set and a second sample set, where the first sample set is obtained by processing a historical dialogue data set and the second sample set is obtained by processing the historical dialogue data set and a marked file. The application can improve the diversity of the samples and the breadth of what the model learns from the training samples, which helps improve the accuracy of answer search.

Inventors

MAO YU
HUANG KAI
JIA QIANSEN
XU WEI
ZHANG WENFENG
NA YINGQUAN

Assignees

Merchants Union Consumer Finance Company Limited (招联消费金融有限公司)

Dates

Publication Date: 20260512
Application Date: 20221205

Claims (8)

1. A question-answer search method, comprising: acquiring an answer search request of a target object for a target question; determining text features of the target question; and inputting the text features into a question-answer search model to obtain a target answer, wherein the question-answer search model is a model that is obtained by training on a target sample set and meets preset conditions, the target sample set comprises a sample set obtained by fusing a first sample set and a second sample set, the first sample set is a sample set obtained by processing a historical dialogue data set, and the second sample set is a sample set obtained by processing the historical dialogue data set and a marked file; wherein the method further comprises: analyzing the historical dialogue data set to obtain a domain word stock; screening the domain word stock to obtain a high-frequency domain word stock; supplementing the domain word stock to obtain an associated domain word stock; constructing the first sample set according to the high-frequency domain word stock and the associated domain word stock; selecting a reference sample corresponding to a preset sample type from the marked file; obtaining a similarity value between each piece of historical dialogue data in the historical dialogue data set and the reference sample; and screening, from the historical dialogue data set, the historical dialogue data whose similarity value is greater than a similarity threshold to obtain the second sample set.
2. The method of claim 1, wherein the screening the domain word stock to obtain a high-frequency domain word stock comprises: obtaining a vector representation of each domain word in the domain word stock; clustering the vector representations of the domain words to obtain at least two classes of domain word clusters; acquiring the frequency of each domain word cluster; and forming the domain word clusters whose frequency is greater than a frequency threshold into the high-frequency domain word stock.
3. The method of claim 1, wherein the supplementing the domain word stock to obtain the associated domain word stock comprises: searching for a replacement word of each domain word in the domain word stock according to a preset rule corresponding to the domain type of the domain word stock; obtaining similar words of each domain word in the domain word stock; and supplementing the replacement words and the similar words to the domain word stock to obtain the associated domain word stock.
4. The method of claim 2, wherein the constructing the first sample set according to the high-frequency domain word stock and the associated domain word stock comprises: searching the historical dialogue data set for a target historical dialogue data set in which at least one domain word in the domain word stock is located; constructing a first sub-sample containing at least one domain word in the high-frequency domain word stock according to the target historical dialogue data; replacing the domain words in the target historical dialogue data according to at least one domain word in the associated domain word stock to obtain a plurality of second sub-samples; and fusing the first sub-sample and the plurality of second sub-samples to obtain the first sample set.
5. The method of any of claims 1-4, wherein the determining text features of the target question comprises: determining keywords in the target question and the technical field of the target question; and determining the text features of the target question according to the technical field and the keywords.
6. A question-answer search device comprising means for performing a method according to any one of claims 1-5.
7. A computer device comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured for execution by the processor, the programs comprising instructions for performing the steps of the method of any of claims 1-5.
8. A computer-readable storage medium storing a computer program that causes a computer to implement the method of any one of claims 1-5.
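
The sample-set construction recited in claims 1, 3 and 4 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claims do not fix a similarity measure or a fusion rule, so the sketch assumes whitespace tokenisation, bag-of-words cosine similarity, and fusion by de-duplicated concatenation. All function names, the toy dialogues, the word stocks and the threshold are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words vectors (assumed similarity measure)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def substitute(dialogues, associated):
    """Second sub-samples: swap each domain word for its associated words."""
    out = []
    for text in dialogues:
        tokens = text.split()
        for i, tok in enumerate(tokens):
            for alt in associated.get(tok, []):
                out.append(" ".join(tokens[:i] + [alt] + tokens[i + 1:]))
    return out


def build_first_sample_set(history, high_freq, associated):
    """Target dialogues containing a high-frequency domain word, plus
    their substituted variants."""
    targets = [h for h in history if set(h.split()) & high_freq]
    return targets + substitute(targets, associated)


def build_second_sample_set(history, reference, threshold):
    """Keep dialogues whose similarity to the reference sample is strictly
    greater than the threshold."""
    return [h for h in history if cosine_similarity(h, reference) > threshold]


def fuse(first, second):
    """Assumed fusion rule: concatenate and drop exact duplicates, keeping order."""
    return list(dict.fromkeys(first + second))


# Toy data (hypothetical).
history = [
    "how do i repay my loan early",
    "what is the weather today",
    "can i repay the loan early without a fee",
]
high_freq = {"repay", "loan"}                               # high-frequency domain word stock
associated = {"repay": ["settle"], "loan": ["credit"]}      # associated domain word stock
reference = "repay loan early"                              # reference sample from the marked file

first_set = build_first_sample_set(history, high_freq, associated)
second_set = build_second_sample_set(history, reference, threshold=0.5)
target_set = fuse(first_set, second_set)
```

In this toy run the weather dialogue is excluded from both sample sets, while each loan-related dialogue contributes itself and two substituted variants to the first sample set. A production system would more plausibly compare sentence embeddings than raw token counts.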

Description

Question and answer searching method, device and storage medium

Technical Field

The application relates to the technical field of artificial intelligence, and in particular to a question and answer searching method, a question and answer searching device and a storage medium.

Background

Text question-answer matching algorithms are mainly used in business scenarios such as customer service and outbound-call robots. With the rapid development of the Internet, many question-answer search systems that rely on human agents are gradually shifting to a mode that combines automatic and manual handling: automatic question-answer recommendation resolves part of the questions, which reduces manual involvement and allows user needs to be answered quickly.
At present, the training samples used by the question-answer search model in a question-answer search system are obtained by manually marking historical dialogue data. However, some types of samples, such as negative samples and difficult samples, occur rarely in real question-answer scenarios, so few such samples are available for training and it is difficult to improve the accuracy of answer search.

Disclosure of Invention

The embodiments of the application provide a question and answer searching method, a question and answer searching device and a storage medium, which can improve the diversity of samples and the breadth of what the model learns from the training samples, and thereby help improve the accuracy of answer search.
The method comprises: obtaining an answer search request of a target object for a target question; determining text features of the target question; and inputting the text features into a question-answer search model to obtain a target answer, wherein the question-answer search model is a model that is obtained by training on a target sample set and meets preset conditions, the target sample set comprises a sample set obtained by fusing a first sample set and a second sample set, the first sample set is a sample set obtained by processing a historical dialogue data set, and the second sample set is a sample set obtained by processing the historical dialogue data set and a marked file.
In one possible example, the method further comprises: analyzing the historical dialogue data set to obtain a domain word stock; screening the domain word stock to obtain a high-frequency domain word stock; supplementing the domain word stock to obtain an associated domain word stock; and constructing the first sample set according to the high-frequency domain word stock and the associated domain word stock.
In one possible example, screening the domain word stock to obtain the high-frequency domain word stock comprises: obtaining a vector representation of each domain word in the domain word stock; clustering the vector representations of the domain words to obtain at least two classes of domain word clusters; obtaining the frequency of each domain word cluster; and forming the domain word clusters whose frequency is greater than a frequency threshold into the high-frequency domain word stock.
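
The cluster-then-filter screening described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the method as implemented by the applicant: the text does not name a clustering algorithm or define how a cluster's frequency is computed, so the sketch uses a greedy single-pass clustering on cosine similarity and takes a cluster's frequency to be the sum of its members' occurrence counts. The 2-D vectors, counts, thresholds and function names are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0


def cluster_words(vectors, min_sim):
    """Greedy single-pass clustering (a stand-in for any clustering method):
    a word joins the first cluster whose seed vector is at least `min_sim`
    similar, otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of words; the first word is the seed
    for word, vec in vectors.items():
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= min_sim:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


def high_frequency_word_stock(vectors, counts, min_sim, freq_threshold):
    """Return the words of every cluster whose total occurrence count
    (assumed definition of cluster frequency) exceeds the threshold."""
    kept = []
    for cluster in cluster_words(vectors, min_sim):
        if sum(counts[w] for w in cluster) > freq_threshold:
            kept.extend(cluster)
    return kept


# Toy vector representations and occurrence counts (illustrative values only).
vectors = {
    "repayment": (1.0, 0.1),
    "repay": (0.9, 0.2),
    "interest": (0.1, 1.0),
    "rate": (0.2, 0.9),
    "weather": (-1.0, 0.0),
}
counts = {"repayment": 40, "repay": 25, "interest": 30, "rate": 10, "weather": 2}

clusters = cluster_words(vectors, min_sim=0.9)
high_freq = high_frequency_word_stock(vectors, counts, min_sim=0.9, freq_threshold=20)
```

Filtering at the cluster level rather than per word means a rarely seen word such as "rate" is retained when it belongs to a frequently used cluster, whereas an isolated rare word such as "weather" is dropped.
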
In one possible example, supplementing the domain word stock to obtain the associated domain word stock comprises: searching for a replacement word of each domain word in the domain word stock according to a preset rule corresponding to the domain type of the domain word stock; obtaining similar words of each domain word in the domain word stock; and supplementing the replacement words and the similar words to the domain word stock to obtain the associated domain word stock.
In one possible example, constructing the first sample set according to the high-frequency domain word stock and the associated domain word stock comprises: searching the historical dialogue data set for a target historical dialogue data set in which at least one domain word in the domain word stock is located; constructing a first sub-sample containing at least one domain word in the high-frequency domain word stock according to the target historical dialogue data; replacing the domain words in the target historical dialogue data according to at least one domain word in the associated domain word stock to obtain a plurality of second sub-samples; and fusing the first sub-sample and the plurality of second sub-samples to obtain the first sample set.
In one possible example, the method further comprises: selecting a reference sample corresponding to a preset sample type from the marked file; obtaining a similarity value between each piece of historical dialogue data in the historical dialogue data set and the reference sample; and screening, from the historical dialogue data set, the historical dialogue data whose similarity value is greater than a similarity threshold to obtain the second sample set.
In one possible example, d