CN-121996687-A - Information retrieval method, device, electronic equipment and storage medium

Abstract

The embodiments of the application disclose an information retrieval method, an information retrieval device, electronic equipment and a storage medium. The method comprises: obtaining a character string to be processed; if the length of the character string to be processed exceeds a set length value, carrying out non-repeated splitting on the character string to be processed according to the set length value to obtain a plurality of splitting results, wherein the length of the character string to be processed is the number of characters it contains and the set length value is a positive integer greater than 1; carrying out retrieval based on the plurality of splitting results to obtain a plurality of preliminary retrieval results, wherein each preliminary retrieval result comprises the plurality of splitting results; and determining a screening result of each preliminary retrieval result based on the distance between two adjacent splitting results in that preliminary retrieval result. Because the character string to be processed is split without repetition, redundant calculation on repeated characters can be avoided and information retrieval efficiency is improved.
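The query-side steps summarized above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names, the handling of a trailing remainder shorter than the set length value, and the representation of a preliminary result as a list of positions are all assumptions, and the first and second distance values are passed in as parameters because this excerpt does not define them.

```python
from typing import List, Sequence


def split_non_overlapping(query: str, set_length: int) -> List[str]:
    """Split `query` into consecutive chunks that share no characters.

    Assumption: a trailing remainder shorter than `set_length` is kept
    as the last chunk.
    """
    if set_length <= 1:
        raise ValueError("set_length must be an integer greater than 1")
    if len(query) <= set_length:
        # Short queries are searched as-is (compare claim 4).
        return [query]
    return [query[i:i + set_length] for i in range(0, len(query), set_length)]


def passes_screening(positions: Sequence[int],
                     first_distance: int,
                     second_distance: int) -> bool:
    """Screen one preliminary result.

    `positions[i]` is the (hypothetical) position at which the i-th split
    result was found. The result is kept only if the gap between the last
    two split results equals `first_distance` and every other adjacent gap
    equals `second_distance`.
    """
    if len(positions) < 2:
        return True
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return gaps[-1] == first_distance and all(g == second_distance for g in gaps[:-1])
```

For example, under these assumptions `split_non_overlapping("apple", 3)` yields two chunks instead of the three overlapping units a sliding window would produce, and a candidate is kept only when its chunk positions show the expected gaps.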

Inventors

  • Zhuang Xuanquan
  • Cai Yanzhi
  • Rao Quanquan

Assignees

  • Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技(深圳)有限公司)

Dates

Publication Date
2026-05-08
Application Date
2024-11-04

Claims (14)

  1. An information retrieval method, the method comprising: acquiring a character string to be processed; if the length of the character string to be processed exceeds a set length value, carrying out non-repeated splitting on the character string to be processed according to the set length value to obtain a plurality of split results, wherein the length of the character string to be processed is the number of characters contained in the character string to be processed, and the set length value is a positive integer greater than 1; searching based on the plurality of split results to obtain a plurality of preliminary search results, wherein each preliminary search result comprises the plurality of split results; and for each preliminary search result, determining a screening result of the preliminary search result based on the distance between two adjacent split results, among the plurality of split results, in the preliminary search result.
  2. The method of claim 1, wherein the determining the screening result of the preliminary search result based on the distance between two adjacent split results, among the plurality of split results, in the preliminary search result comprises: if the distance between the last two split results in the plurality of split results is a first distance value and the distance between any two adjacent split results other than the last two split results is a second distance value, determining the preliminary search result as a final search result.
  3. The method of claim 1, wherein the determining the screening result of the preliminary search result based on the distance between two adjacent split results, among the plurality of split results, in the preliminary search result comprises: if the distance between the last two split results in the plurality of split results is not a first distance value, or the distance between any two adjacent split results other than the last two split results is not a second distance value, rejecting the preliminary search result.
  4. The method of claim 1, wherein after the acquiring the character string to be processed, the method further comprises: if the length of the character string to be processed does not exceed the set length value, searching based on the character string to be processed to obtain a search result, wherein the search result is a final search result.
  5. The method of claim 1, wherein prior to the acquiring the character string to be processed, the method further comprises: for a target word of a text to be searched, determining an index termination bit based on the word length of the target word, wherein the target word is any word of the text to be searched; determining the first bit of the target word as an index initial bit; judging whether the index initial bit exceeds the index termination bit; if the index initial bit does not exceed the index termination bit, assigning the value of the index initial bit to a character offset bit; reading the character corresponding to the character offset bit, marking the character as a keyword character string, and storing the keyword character string as a search term into a database; determining the bit following the character offset bit as a new character offset bit; judging whether the character offset bit exceeds the index termination bit; if the character offset bit does not exceed the index termination bit, reading the character corresponding to the character offset bit, splicing the read character with the keyword character string to obtain a splicing result, and storing the splicing result as a search term into the database; and assigning the value of the character offset bit to the index initial bit, and jumping to the step of judging whether the index initial bit exceeds the index termination bit.
  6. The method of claim 5, wherein the assigning the value of the character offset bit to the index initial bit and jumping to the step of judging whether the index initial bit exceeds the index termination bit comprises: if the length of the keyword character string reaches the set length value, calculating the difference between the set length value and the value 2, and shifting the character offset bit forward by that difference to obtain a new character offset bit; and assigning the value of the character offset bit to the index initial bit, and jumping to the step of judging whether the index initial bit exceeds the index termination bit, wherein the set length value is a positive integer greater than 1; and wherein, after the step of reading the character corresponding to the character offset bit, splicing the read character with the keyword character string to obtain a splicing result, and storing the splicing result as a search term into the database if the character offset bit does not exceed the index termination bit, and before the step of assigning the value of the character offset bit to the index initial bit and jumping to the step of judging whether the index initial bit exceeds the index termination bit if the length of the keyword character string reaches the set length value, the method further comprises: taking the splicing result as a new keyword character string; and judging whether the length of the keyword character string reaches the set length value.
  7. The method of claim 6, wherein after the judging whether the length of the keyword character string reaches the set length value, the method further comprises: if the length of the keyword character string does not reach the set length value, jumping to the step of determining the bit following the character offset bit as a new character offset bit.
  8. The method of claim 5, wherein after the judging whether the index initial bit exceeds the index termination bit, the method further comprises: if the index initial bit exceeds the index termination bit, ending the process of storing the target word into the database; and after the judging whether the character offset bit exceeds the index termination bit, the method further comprises: if the character offset bit exceeds the index termination bit, ending the process of storing the target word into the database.
  9. The method of claim 5, wherein the method further comprises: allocating positions to all the search terms of the target word based on the chronological order in which the search terms are stored in the database.
  10. The method of claim 9, wherein the allocating positions to all the search terms of the target word based on the chronological order in which the search terms are stored in the database comprises: incrementally allocating positions to all the search terms of the target word in the chronological order, from first to last, in which the search terms are stored in the database.
  11. An information retrieval apparatus, the apparatus comprising: a character string acquisition unit, configured to acquire a character string to be processed; a character string splitting unit, configured to carry out non-repeated splitting on the character string to be processed according to a set length value when the length of the character string to be processed exceeds the set length value, to obtain a plurality of split results, wherein the length of the character string to be processed is the number of characters contained in the character string to be processed; a searching unit, configured to search based on the plurality of split results to obtain a plurality of preliminary search results, wherein each preliminary search result comprises the plurality of split results; and a screening unit, configured to determine, for each preliminary search result, a screening result of the preliminary search result based on the distance between two adjacent split results, among the plurality of split results, in the preliminary search result.
  12. An electronic device comprising a processor and a memory, the memory storing instructions, the processor loading the instructions from the memory to perform the steps of the information retrieval method according to any one of claims 1 to 10.
  13. A computer readable storage medium storing instructions adapted to be loaded by a processor to perform the steps of the information retrieval method of any one of claims 1 to 10.
  14. A computer program product comprising instructions which, when executed by a processor, implement the steps of the information retrieval method of any one of claims 1 to 10.
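The index-building loop of claims 5 to 10 can be read as the sketch below. It is one possible interpretation, not the claimed implementation: the function name and the in-memory list standing in for the database are hypothetical, and the code assumes that shifting the character offset bit "forward" means moving it toward the start of the word, so that each new index initial bit lies one character after the previous one.

```python
from typing import List, Tuple


def build_search_terms(word: str, set_length: int) -> List[Tuple[int, str]]:
    """Return (position, search term) pairs for one target word.

    Positions are allocated incrementally in the order the terms are
    stored (compare claims 9 and 10).
    """
    if set_length <= 1:
        raise ValueError("set_length must be an integer greater than 1")
    terms: List[str] = []
    end = len(word) - 1   # index termination bit
    start = 0             # index initial bit
    while start <= end:
        offset = start            # character offset bit
        keyword = word[offset]    # first character becomes the keyword string
        terms.append(keyword)
        while True:
            offset += 1
            if offset > end:
                # Offset passed the termination bit: storing ends (claim 8).
                return list(enumerate(terms))
            keyword += word[offset]   # splice and store the splicing result
            terms.append(keyword)
            if len(keyword) >= set_length:
                # Assumed direction: move the offset back by (set_length - 2).
                offset -= set_length - 2
                start = offset
                break
    return list(enumerate(terms))
```

Under this reading, the word "apple" with a set length value of 3 stores the terms a, ap, app, p, pp, ppl, p, pl, ple, l, le at positions 0 through 10.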

Description

Information retrieval method, device, electronic equipment and storage medium

Technical Field

The present application relates to the field of computers, and in particular to an information retrieval method, an information retrieval device, an electronic device, and a storage medium.

Background

In the prior art, when a database for information retrieval is constructed, each word in a text to be retrieved is often split with a sliding window of fixed character length, and the resulting units are stored in the database. When a user needs to retrieve a text containing a word, the user enters the word, or a part of the word with the fixed character length, to perform the retrieval. For example, the word "apple" is split into three basic units, "app", "ppl" and "ple", which are stored in the database. When the user needs to search for text containing the word "apple", entering "apple" or any three consecutive characters of "apple" retrieves that text. However, this word segmentation and matching process involves redundant information, resulting in low retrieval efficiency.

Disclosure of Invention

The embodiments of the application provide an information retrieval method, an information retrieval device, electronic equipment and a storage medium, which can solve the problem of low retrieval efficiency in the prior art.
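The prior-art sliding-window splitting described in the Background can be sketched as follows. The function name and the handling of words no longer than the window are assumptions made for illustration.

```python
from typing import List


def sliding_window_units(word: str, window: int) -> List[str]:
    """Split `word` into every run of `window` consecutive characters."""
    if window < 1:
        raise ValueError("window must be a positive integer")
    if len(word) <= window:
        # Assumption: a word no longer than the window is stored whole.
        return [word]
    return [word[i:i + window] for i in range(len(word) - window + 1)]


units = sliding_window_units("apple", 3)
stored_chars = sum(len(u) for u in units)
print(units, stored_chars)  # ['app', 'ppl', 'ple'] 9
```

The five-character word produces nine stored characters because adjacent units overlap in two characters, which is the redundancy the Background refers to.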
An embodiment of the application provides an information retrieval method, comprising: acquiring a character string to be processed; if the length of the character string to be processed exceeds a set length value, carrying out non-repeated splitting on the character string to be processed according to the set length value to obtain a plurality of split results, wherein the length of the character string to be processed is the number of characters contained in the character string to be processed, and the set length value is a positive integer greater than 1; searching based on the plurality of split results to obtain a plurality of preliminary search results, wherein each preliminary search result comprises the plurality of split results; and for each preliminary search result, determining a screening result of the preliminary search result based on the distance between two adjacent split results, among the plurality of split results, in the preliminary search result.
An embodiment of the application provides an information retrieval apparatus, comprising: a character string acquisition unit, configured to acquire a character string to be processed; a character string splitting unit, configured to carry out non-repeated splitting on the character string to be processed according to a set length value when the length of the character string to be processed exceeds the set length value, to obtain a plurality of split results, wherein the length of the character string to be processed is the number of characters contained in the character string to be processed; a searching unit, configured to search based on the plurality of split results to obtain a plurality of preliminary search results, wherein each preliminary search result comprises the plurality of split results; and a screening unit, configured to determine, for each preliminary search result, a screening result of the preliminary search result based on the distance between two adjacent split results, among the plurality of split results, in the preliminary search result.
In one embodiment, the screening unit is specifically configured to determine the preliminary search result as a final search result when the distance between the last two of the plurality of split results is a first distance value and the distance between any two adjacent split results other than the last two split results is a second distance value.
In one embodiment, the screening unit is specifically configured to reject the preliminary search result when the distance between the last two of the plurality of split results is not a first distance value, or the distance between any two adjacent split results other than the last two split results is not a second distance value.
In one embodiment, the apparatus further comprises: a second searching unit, configured to search based on the character string to be processed to obtain a search result when the length of the character string to be processed does not exceed the set length value, wherein the search result is a final search result.
In one embodiment, the apparatus further comprises: a termination bit determining unit, configured to determine, for a target word of a text to be searched, an index termination bit based on the word length of the target word, wherein the target word is any word of the text to be searched; an initial bit determining unit, configured to determine the first bit of the target word as an index initial bit; an initial bit judging unit, configured to judge whether the index initial bit exceeds the index termination bit; an initial bit assignment unit configured to assign a valu