CN-116932701-B - Text retrieval method, device, equipment, medium and product

Abstract

The application provides a text retrieval method, apparatus, device, medium and product. The method comprises: acquiring a plurality of first similarities between a target question input by a user and the respective documents in a preset knowledge base; selecting at least two first documents from the documents corresponding to the plurality of first similarities for selection by the user; determining a plurality of second similarities between the target question and the respective paragraphs in the first target document selected by the user based on a first similarity algorithm, and determining a plurality of third similarities between the target question and the respective paragraphs in the first target document based on a second similarity algorithm; selecting, in descending order, the top N second similarities as first target similarities and the top N third similarities as second target similarities; and integrating the first paragraphs corresponding to the first target similarities and the second paragraphs corresponding to the second target similarities to obtain a target answer corresponding to the target question. Embodiments of the application can improve the accuracy of text retrieval.

Inventors

Cai Suxian
Yan Shijiang
Zhan Chaoqun
Yu Yang
Xia Chengyang
Ma Kun

Assignees

China Construction Bank Corporation (中国建设银行股份有限公司)
CCB Fintech Co., Ltd. (建信金融科技有限责任公司)

Dates

Publication Date: 20260505
Application Date: 20230829

Claims (9)

1. A text retrieval method, the method comprising: acquiring a plurality of first similarities between a target question input by a user and the respective documents in a preset knowledge base; selecting, from the documents corresponding to the plurality of first similarities, at least two first documents meeting a preset rule for the user to select, wherein the preset knowledge base comprises the first documents; receiving a first input sent by the user, wherein the first input is used for selecting a first target document from the plurality of first documents; in response to the first input, determining a plurality of second similarities between the target question and the respective paragraphs in the first target document based on a first similarity algorithm, and determining a plurality of third similarities between the target question and the respective paragraphs in the first target document based on a second similarity algorithm; sorting the second similarities in descending order and selecting the top N second similarities as first target similarities, and sorting the third similarities in descending order and selecting the top N third similarities as second target similarities, wherein N is a positive integer greater than 1; and integrating the first paragraphs corresponding to the first target similarities and the second paragraphs corresponding to the second target similarities to obtain a target answer corresponding to the target question; wherein the determining of the second similarities and the third similarities in response to the first input comprises: determining a plurality of second vector similarities between the target question and the respective paragraphs in the first target document using a vector similarity algorithm, wherein the first similarity algorithm is the vector similarity algorithm and the second similarities are the second vector similarities; and determining a plurality of similarity scores between the target question and the respective paragraphs in the first target document using a BM25 similarity algorithm, wherein the second similarity algorithm is the BM25 similarity algorithm and the third similarities are the similarity scores.
2. The method of claim 1, wherein the acquiring a plurality of first similarities between the target question input by the user and the respective documents in the preset knowledge base comprises: for each third paragraph in the preset knowledge base, acquiring a first vector similarity between the third paragraph and the target question, wherein the third paragraph is any paragraph in any document in the preset knowledge base; and wherein the selecting at least two first documents meeting a preset rule from the documents corresponding to the plurality of first similarities for the user to select comprises: sorting the first vector similarities in descending order and selecting the top K first vector similarities as third target similarities, wherein K is a positive integer greater than 1; for each third target similarity, acquiring the reference document to which the paragraph corresponding to the third target similarity belongs, wherein each reference document corresponds to at least one third target similarity; for each reference document, taking the maximum similarity as the reference similarity of the reference document, wherein the maximum similarity is the largest of the third target similarities corresponding to the reference document; sorting the reference similarities in descending order and selecting the top M reference similarities as fourth target similarities, wherein M is a positive integer greater than 1 and M is less than or equal to K; and taking the reference documents corresponding to the fourth target similarities as the first documents.
3. The method of claim 1, wherein the integrating the first paragraphs corresponding to the first target similarities and the second paragraphs corresponding to the second target similarities to obtain the target answer corresponding to the target question comprises: for the first paragraph corresponding to each first target similarity, splicing the first paragraph and the paragraphs adjacent to the first paragraph into a first answer; for the second paragraph corresponding to each second target similarity, splicing the second paragraph and the paragraphs adjacent to the second paragraph into a second answer; and, in a case where the first answer and the second answer are related to the target question, integrating the first answer and the second answer through a preset language model to obtain the target answer corresponding to the target question.
4. The method of claim 3, wherein after the integrating the first paragraphs corresponding to the first target similarities and the second paragraphs corresponding to the second target similarities to obtain the target answer corresponding to the target question, the method further comprises: receiving a second input sent by the user, wherein the second input is used for acquiring the first answer and the second answer corresponding to the target answer; and displaying the first answer and the second answer in response to the second input.
5. The method of claim 1, wherein before the acquiring a plurality of first similarities between the target question input by the user and the respective documents in the preset knowledge base, the method further comprises: acquiring sample documents; classifying the sample documents, and storing the sample documents belonging to the same category in the same storage directory; unifying the document format of the sample documents into a preset format to obtain first processed documents; removing redundant information from the first processed documents to obtain second processed documents, wherein the redundant information comprises at least one of a document title, a document table of contents, a contact person, contact information, a reader, a proofreader and a sender; splicing each heading in the second processed documents that is shorter than a preset number of words with the text adjacent to the heading to obtain target processed documents; and vectorizing each paragraph in the target processed documents to obtain the preset knowledge base, wherein the preset knowledge base comprises the categories corresponding to the target processed documents and the vector information corresponding to each paragraph in the target processed documents.
6. A text retrieval apparatus, the apparatus comprising: an acquisition module, configured to acquire a plurality of first similarities between a target question input by a user and the respective documents in a preset knowledge base; a first selection module, configured to select, from the documents corresponding to the plurality of first similarities, at least two first documents meeting a preset rule for the user to select, wherein the preset knowledge base comprises the first documents; a receiving module, configured to receive a first input sent by the user, wherein the first input is used for selecting a first target document from the plurality of first documents; a determining module, configured to determine, in response to the first input, a plurality of second similarities between the target question and the respective paragraphs in the first target document based on a first similarity algorithm, and a plurality of third similarities between the target question and the respective paragraphs in the first target document based on a second similarity algorithm, wherein the determining comprises: determining a plurality of second vector similarities between the target question and the respective paragraphs in the first target document using a vector similarity algorithm, the first similarity algorithm being the vector similarity algorithm and the second similarities being the second vector similarities; and determining a plurality of similarity scores between the target question and the respective paragraphs in the first target document using a BM25 similarity algorithm, the second similarity algorithm being the BM25 similarity algorithm and the third similarities being the similarity scores; a second selection module, configured to sort the second similarities in descending order and select the top N second similarities as first target similarities, and to sort the third similarities in descending order and select the top N third similarities as second target similarities, wherein N is a positive integer greater than 1; and an integration module, configured to integrate the first paragraphs corresponding to the first target similarities and the second paragraphs corresponding to the second target similarities to obtain a target answer corresponding to the target question.
7. An electronic device comprising a processor and a memory storing computer program instructions, wherein the processor, when executing the computer program instructions, implements the text retrieval method of any one of claims 1-5.
8. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon computer program instructions, which when executed by a processor, implement a text retrieval method according to any of claims 1-5.
9. A computer program product, characterized in that instructions in the computer program product, when executed by a processor of an electronic device, cause the electronic device to perform the text retrieval method according to any of claims 1-5.
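
The document-selection steps of claim 2 can be illustrated with a minimal Python sketch. It assumes the paragraph-level vector similarities have already been computed; the function name `select_candidate_documents` and the `(doc_id, paragraph_index, similarity)` tuple layout are hypothetical choices made for illustration and do not come from the patent text.

```python
def select_candidate_documents(paragraph_scores, k, m):
    """Pick the M candidate documents to offer the user.

    paragraph_scores: list of (doc_id, paragraph_index, similarity) tuples,
    one per paragraph in the knowledge base (hypothetical layout).
    """
    if not (1 < m <= k):
        raise ValueError("expected 1 < m <= k")

    # Step 1: keep the K paragraphs with the highest similarity.
    top_k = sorted(paragraph_scores, key=lambda item: item[2], reverse=True)[:k]

    # Step 2: map each kept paragraph to its document; the document's
    # reference similarity is the maximum over its kept paragraphs.
    reference = {}
    for doc_id, _paragraph_index, score in top_k:
        reference[doc_id] = max(score, reference.get(doc_id, float("-inf")))

    # Step 3: rank documents by reference similarity and keep the top M.
    ranked = sorted(reference.items(), key=lambda item: item[1], reverse=True)
    return ranked[:m]


scores = [
    ("a", 0, 0.90), ("a", 1, 0.20),
    ("b", 0, 0.80), ("b", 1, 0.85),
    ("c", 0, 0.70), ("d", 0, 0.10),
]
candidates = select_candidate_documents(scores, k=4, m=2)
```

Because several of the top K paragraphs can belong to the same document, fewer than M documents may be returned; how an implementation handles that case is not specified in the claims.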

Description

Text retrieval method, device, equipment, medium and product

Technical Field

The present application relates to the field of information retrieval technologies, and in particular, to a text retrieval method, apparatus, device, medium, and product.

Background

For policy documents, the same question often retrieves a plurality of documents. For example, policies issued in different years and policies of different categories (operation manuals, notices, management measures, etc.) may all contain similar content, whereas the accurate answer is usually found in only one document. In the prior art, the corresponding answer is usually obtained by constructing question-answer pairs and matching the input against similar questions by text similarity. However, for a large number of risk policy documents, the workload of constructing question-answer pairs is large, and when a question is phrased differently, the relevant answer is not easily retrieved, so the retrieval accuracy is poor.

Disclosure of Invention

The text retrieval method, apparatus, device, medium and product provided by the application can improve the accuracy of text retrieval.
In a first aspect, an embodiment of the present application provides a text retrieval method, including: acquiring a plurality of first similarities between a target question input by a user and the respective documents in a preset knowledge base; selecting, from the documents corresponding to the plurality of first similarities, at least two first documents meeting a preset rule for the user to select, wherein the preset knowledge base comprises the first documents; receiving a first input sent by the user, wherein the first input is used for selecting a first target document from the plurality of first documents; in response to the first input, determining a plurality of second similarities between the target question and the respective paragraphs in the first target document based on a first similarity algorithm, and determining a plurality of third similarities between the target question and the respective paragraphs in the first target document based on a second similarity algorithm; sorting the second similarities in descending order and selecting the top N second similarities as first target similarities, and sorting the third similarities in descending order and selecting the top N third similarities as second target similarities, wherein N is a positive integer greater than 1; and integrating the first paragraphs corresponding to the first target similarities and the second paragraphs corresponding to the second target similarities to obtain a target answer corresponding to the target question.
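As a hedged illustration of the two-channel paragraph ranking described above, the following Python sketch scores the paragraphs of one selected document with cosine similarity over vectors and with a standard Okapi BM25 formula, then takes the top N from each ranking. The claims name a vector similarity algorithm and BM25; the use of cosine similarity, the BM25 parameters `k1=1.5` and `b=0.75`, the smoothed IDF variant, pre-tokenized input, and all function names are assumptions made for this example only.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bm25_scores(query_tokens, paragraphs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized paragraph for the query.

    k1 and b are common default values, assumed here.
    """
    if not paragraphs_tokens:
        return []
    n = len(paragraphs_tokens)
    avg_len = sum(len(p) for p in paragraphs_tokens) / n
    # Number of paragraphs containing each term.
    doc_freq = Counter(t for p in paragraphs_tokens for t in set(p))
    scores = []
    for tokens in paragraphs_tokens:
        tf = Counter(tokens)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative.
            idf = math.log((n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def top_n(scores, n):
    """Indices of the n largest scores, in descending order of score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]


def hybrid_top_paragraphs(query_vec, para_vecs, query_tokens, para_tokens, n=2):
    """Return (vector_hits, bm25_hits): top-n paragraph indices per channel."""
    vector_hits = top_n([cosine(query_vec, v) for v in para_vecs], n)
    bm25_hits = top_n(bm25_scores(query_tokens, para_tokens), n)
    return vector_hits, bm25_hits


# Toy data: three paragraphs with hand-made 2-d vectors and tokens.
para_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
para_tokens = [["loan", "limit"], ["credit", "card", "fee"], ["loan", "approval", "loan"]]
vector_hits, bm25_hits = hybrid_top_paragraphs([1.0, 0.0], para_vecs, ["loan"], para_tokens, n=2)
```

Keeping the two rankings separate, rather than fusing the scores, mirrors the text: each channel contributes its own top N paragraphs, and integration happens afterwards.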
In a second aspect, the present application provides a text retrieval apparatus comprising: an acquisition module, configured to acquire a plurality of first similarities between a target question input by a user and the respective documents in a preset knowledge base; a first selection module, configured to select, from the documents corresponding to the plurality of first similarities, at least two first documents meeting a preset rule for the user to select, wherein the preset knowledge base comprises the first documents; a receiving module, configured to receive a first input sent by the user, wherein the first input is used for selecting a first target document from the plurality of first documents; a determining module, configured to determine, in response to the first input, a plurality of second similarities between the target question and the respective paragraphs in the first target document based on a first similarity algorithm, and a plurality of third similarities between the target question and the respective paragraphs in the first target document based on a second similarity algorithm; a second selection module, configured to sort the second similarities in descending order and select the top N second similarities as first target similarities, and to sort the third similarities in descending order and select the top N third similarities as second target similarities, wherein N is a positive integer greater than 1; and an integration module, configured to integrate the first paragraphs corresponding to the first target similarities and the second paragraphs corresponding to the second target similarities to obtain a target answer corresponding to the target question.
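One way the input to the integration step might be prepared, following the splicing described in claim 3, is sketched below in Python. Each selected paragraph is joined with its adjacent paragraphs to form a candidate answer. The `window` parameter, the newline separator and the function names are hypothetical, and the subsequent relevance check and language-model integration are deliberately left out because the text does not specify them.

```python
def splice_with_neighbors(paragraphs, hit_index, window=1):
    """Join a hit paragraph with up to `window` paragraphs on each side.

    window=1 (an assumption) means the immediately preceding and
    following paragraphs, clipped at the document boundaries.
    """
    start = max(0, hit_index - window)
    end = min(len(paragraphs), hit_index + window + 1)
    return "\n".join(paragraphs[start:end])


def build_candidate_answers(paragraphs, vector_hits, bm25_hits):
    """Build first answers (vector channel) and second answers (BM25 channel)."""
    first_answers = [splice_with_neighbors(paragraphs, i) for i in vector_hits]
    second_answers = [splice_with_neighbors(paragraphs, i) for i in bm25_hits]
    return first_answers, second_answers


paragraphs = ["p0", "p1", "p2", "p3"]
first_answers, second_answers = build_candidate_answers(paragraphs, [0, 2], [3])
```

Splicing in neighbors gives the downstream model some surrounding context, at the cost of overlapping text when two hits are adjacent; deduplication is not addressed in this sketch.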
In a third aspect, an embodiment of the present application provides an electronic device comprising a processor and a memory storing computer program instructions; the processor, when executing the computer program instructions, implements the text retrieval method of any embodiment of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer storage medium having stored thereon computer program instructions which, when executed by a processor, implement the text retrieval method of any embodiment of the first aspect.
In