CN-121996080-A - Word retrieval method, apparatus, device, computer readable storage medium and computer program product
Abstract
The application provides a word retrieval method, apparatus, device, computer-readable storage medium, and computer program product. The method comprises: in response to a word retrieval request for an input character string, acquiring a dual dictionary tree, wherein the dual dictionary tree comprises a first dictionary tree and a second dictionary tree associated with the first dictionary tree, the first dictionary tree comprises a sound tree whose nodes are the simple spelling strings of full spelling strings, and the second dictionary tree comprises a word tree whose nodes are Chinese characters; determining a retrieval scene corresponding to the character string; and, based on the retrieval scene, retrieving words corresponding to the character string from the dual dictionary tree by combining the sound tree and the word tree. The application can improve the efficiency of word retrieval.
Inventors
- FEI TENG
Assignees
- 腾讯科技(深圳)有限公司
Dates
- Publication Date: 2026-05-08
- Application Date: 2024-11-05
Claims (15)
- 1. A method of word retrieval, the method comprising: in response to a word retrieval request for an input character string, acquiring a dual dictionary tree, wherein the dual dictionary tree comprises a first dictionary tree and a second dictionary tree associated with the first dictionary tree; the first dictionary tree comprises a sound tree whose nodes are the simple spelling strings of full spelling strings, and the second dictionary tree comprises a word tree whose nodes are Chinese characters; determining a retrieval scene corresponding to the character string; and, based on the retrieval scene, retrieving words corresponding to the character string from the dual dictionary tree by combining the sound tree and the word tree.
- 2. The method of claim 1, wherein the retrieval scene comprises a full spelling retrieval scene, and the first dictionary tree further comprises a final region associated with the sound tree and a first node position area associated with the final region; wherein the final region is used for storing the finals of the full spelling strings, that is, the parts other than the simple spelling strings, and the first node position area is used for storing the node positions corresponding to the Chinese characters in the word tree; the retrieving, based on the retrieval scene, words corresponding to the character string from the dual dictionary tree by combining the sound tree and the word tree comprises: splitting the character string into a first simple spelling string and a first final, and searching the sound tree layer by layer for the first simple spelling string to obtain a first search result; determining, based on the first search result and the first final, a target final in the final region that matches the full spelling string; determining, in the first node position area, the node position pointed to by the target final; and searching the word tree for the word corresponding to the character string based on the node position pointed to by the target final.
- 3. The method of claim 2, wherein there is at least one first simple spelling string, the first search result includes the first simple spelling string retrieved last in the sound tree, and the node corresponding to the last-retrieved first simple spelling string is a first node; the determining, based on the first search result and the first final, a target final in the final region that matches the full spelling string comprises: determining the final pointed to by the first node, and determining the final pointed to by the next sibling node of the first node; determining a final range of the first simple spelling string in the final region based on the final pointed to by the first node and the final pointed to by the next sibling node of the first node; and determining, within the final range of the first simple spelling string in the final region, the final that matches the first final, and taking the final that matches the first final as the target final matching the full spelling string.
- 4. The method of claim 1, wherein the retrieval scene comprises a simple spelling retrieval scene, the character string comprises at least one simple spelling string, and the first dictionary tree further comprises a first node position area for storing node positions corresponding to Chinese characters in the word tree; the retrieving, based on the retrieval scene, words corresponding to the character string from the dual dictionary tree by combining the sound tree and the word tree comprises: searching the sound tree layer by layer for the at least one simple spelling string, and taking the node corresponding to the last-retrieved simple spelling string as a second node; determining, in the first node position area, the node position pointed to by the second node, and determining the node position pointed to by the next sibling node of the second node; and searching the word tree for the word corresponding to the character string based on the node position pointed to by the second node and the node position pointed to by the next sibling node of the second node.
- 5. The method of claim 4, wherein the searching the word tree for the word corresponding to the character string based on the node position pointed to by the second node and the node position pointed to by the next sibling node of the second node comprises: determining a node position range in the first node position area, taking the node position pointed to by the second node as the start position and the node position pointed to by the next sibling node of the second node as the end position; and searching the word tree for the Chinese characters within the node position range, and concatenating the Chinese characters found in the word tree to obtain the words corresponding to the character string.
- 6. The method of claim 1, wherein the retrieval scene comprises a word prediction scene, the first dictionary tree further comprises a first node position area for storing node positions corresponding to Chinese characters in the word tree, and the second dictionary tree further comprises a second node position area for storing node positions of word prediction results in the word tree; the retrieving, based on the retrieval scene, words corresponding to the character string from the dual dictionary tree by combining the sound tree and the word tree comprises: determining the simple spelling string corresponding to the character string, and determining, in the first node position area, the node position pointed to by the simple spelling string; searching the word tree for the corresponding Chinese character based on the node position pointed to by the simple spelling string; determining, in the second node position area, the node position of the word prediction result pointed to by the Chinese character; and searching the word tree for the word corresponding to the character string based on the node position of the word prediction result.
- 7. The method of claim 1, wherein the retrieval scene comprises a word association scene, the character string comprises at least one Chinese character, and the second dictionary tree further comprises a second node position area for storing node positions of word prediction results in the word tree; the retrieving, based on the retrieval scene, words corresponding to the character string from the dual dictionary tree by combining the sound tree and the word tree comprises: searching the word tree layer by layer for the at least one Chinese character, and taking the node corresponding to the last-retrieved Chinese character as a third node; determining, in the second node position area, the node position pointed to by the third node, and determining the node position pointed to by the next sibling node of the third node; determining a node position range in the second node position area, taking the node position pointed to by the third node as the start position and the node position pointed to by the next sibling node of the third node as the end position; and searching the word tree for the Chinese characters within the node position range, and concatenating the Chinese characters found in the word tree to obtain the words corresponding to the character string.
- 8. The method of claim 7, wherein the searching the word tree for the Chinese characters within the node position range, and concatenating the Chinese characters found in the word tree to obtain the words corresponding to the character string, comprises: for each node position in the node position range, looking up the Chinese character of the corresponding node in the word tree, traversing upward layer by layer via the parent node pointed to by each node until the root node of the word tree is reached, and concatenating the Chinese characters obtained at each layer to obtain a word corresponding to the character string.
- 9. The method of claim 1, wherein the retrieval scene comprises a mixed scene, the character string comprises a first pinyin string and a Chinese character, and the first dictionary tree further comprises a final region associated with the sound tree and a first node position area associated with the final region; the retrieving, based on the retrieval scene, words corresponding to the character string from the dual dictionary tree by combining the sound tree and the word tree comprises: searching the word tree layer by layer for the Chinese character in the character string to obtain a second search result; determining, based on the second search result, a second pinyin string corresponding to the Chinese character in the sound tree and the final region; combining the first pinyin string and the second pinyin string to obtain a pinyin string combination; and determining, in the first node position area of the first dictionary tree, the node position corresponding to the pinyin string combination, and searching the word tree for the words corresponding to the pinyin string combination based on the node position.
- 10. The method of claim 1, wherein prior to the acquiring the dual dictionary tree, the method further comprises: constructing the dual dictionary tree; and serializing the dual dictionary tree and storing the serialization result of the dual dictionary tree; the acquiring the dual dictionary tree comprises: obtaining the serialization result of the dual dictionary tree; and compressing the serialization result of the dual dictionary tree, and loading the compressed result.
- 11. The method according to any one of claims 1 to 10, wherein a plurality of words correspond to the character string, and the second dictionary tree further comprises an attribute area for storing word frequencies of words; after the words corresponding to the character string are retrieved from the dual dictionary tree, the method further comprises: determining, from the attribute area of the second dictionary tree, the word frequency of each word corresponding to the character string; and determining, based on the word frequency of each word, a recommended word for the character string from the plurality of words corresponding to the character string.
- 12. A word retrieval apparatus, the apparatus comprising: an acquisition module, configured to acquire a dual dictionary tree in response to a word retrieval request for an input character string, wherein the dual dictionary tree comprises a first dictionary tree and a second dictionary tree associated with the first dictionary tree, the first dictionary tree comprises a sound tree whose nodes are the simple spelling strings of full spelling strings, and the second dictionary tree comprises a word tree whose nodes are Chinese characters; a scene determining module, configured to determine a retrieval scene corresponding to the character string; and a retrieval module, configured to retrieve, based on the retrieval scene, words corresponding to the character string from the dual dictionary tree by combining the sound tree and the word tree.
- 13. An electronic device, the electronic device comprising: a memory for storing computer-executable instructions or computer programs; and a processor for implementing the word retrieval method of any one of claims 1 to 11 when executing the computer-executable instructions or computer programs stored in the memory.
- 14. A computer-readable storage medium storing computer-executable instructions or a computer program, which when executed by a processor implements the word retrieval method of any one of claims 1 to 11.
- 15. A computer program product comprising computer executable instructions or a computer program which, when executed by a processor, implements the word retrieval method of any one of claims 1 to 11.
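The sibling-bounded range lookup recited in claims 4 and 5, together with the parent traversal of claim 8, can be illustrated with a minimal Python sketch. It is not the claimed implementation. The class names, field names, the tiny four-word lexicon, and the in-memory layout are all hypothetical. In particular, the sketch assumes that each sound-tree node stores a start index into the first node position area, that the position area is laid out so a node's entries end where its next sibling's entries begin, and that a node without a next sibling inherits the end bound of its parent; the claims do not state how that last case is handled.

```python
from dataclasses import dataclass, field


@dataclass
class SoundNode:
    key: str    # simple spelling string stored at this node, e.g. "n"
    start: int  # first index this node points to in the position area
    children: list["SoundNode"] = field(default_factory=list)


@dataclass
class WordNode:
    char: str    # one Chinese character ("" for the root)
    parent: int  # index of the parent node in the word tree, -1 for the root


# Hypothetical word tree holding 你好, 你会, 你们 and 南海.
WORD_TREE = [
    WordNode("", -1),   # 0: root
    WordNode("你", 0),  # 1
    WordNode("好", 1),  # 2: last character of 你好
    WordNode("会", 1),  # 3: last character of 你会
    WordNode("们", 1),  # 4: last character of 你们
    WordNode("南", 0),  # 5
    WordNode("海", 5),  # 6: last character of 南海
]

# Hypothetical first node position area: word-tree indices grouped by
# simple spelling. Entries 0..2 belong to "n","h"; entry 3 to "n","m".
POSITION_AREA = [2, 3, 6, 4]

SOUND_ROOT = SoundNode("", 0, [
    SoundNode("n", 0, [SoundNode("h", 0), SoundNode("m", 3)]),
])


def position_range(root, keys, area_len):
    """Search the sound tree layer by layer; return (start, end) or None."""
    node, end = root, area_len
    for key in keys:
        siblings = node.children
        idx = next((i for i, c in enumerate(siblings) if c.key == key), None)
        if idx is None:
            return None
        # End position = where the next sibling starts; with no next
        # sibling, keep the bound inherited from the parent (assumption).
        if idx + 1 < len(siblings):
            end = siblings[idx + 1].start
        node = siblings[idx]
    return node.start, end


def spell_word(word_tree, pos):
    """Walk parent pointers up to the root and join the characters."""
    chars = []
    while pos != 0:
        chars.append(word_tree[pos].char)
        pos = word_tree[pos].parent
    return "".join(reversed(chars))


def retrieve(keys):
    """Return the words whose simple spelling matches the given keys."""
    rng = position_range(SOUND_ROOT, keys, len(POSITION_AREA))
    if rng is None:
        return []
    start, end = rng
    return [spell_word(WORD_TREE, p) for p in POSITION_AREA[start:end]]


print(retrieve(["n", "h"]))  # ['你好', '你会', '南海']
print(retrieve(["n", "m"]))  # ['你们']
```

Because each range is delimited by two neighbouring start indices, a node needs only one integer rather than a list of word references, which is one plausible reading of how the structure avoids storing words in the sound tree.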
Description
Word retrieval method, apparatus, device, computer readable storage medium and computer program product
Technical Field
The present application relates to computer technology, and in particular, to a word retrieval method, apparatus, device, computer readable storage medium, and computer program product.
Background
In the related art, the lexicon of the pinyin dictionary tree and the lexicon of the word dictionary tree are stored independently of each other. The lexicon of the pinyin dictionary tree contains a pinyin tree whose nodes are syllables and also contains a word area, while the lexicon of the word dictionary tree contains a word tree whose nodes are Chinese characters and also contains a pinyin area. The same information is therefore stored redundantly, and a large amount of memory is wasted.
Disclosure of Invention
The embodiments of the application provide a word retrieval method, apparatus, device, computer readable storage medium, and computer program product, which can improve word retrieval efficiency.
The technical scheme of the embodiments of the application is realized as follows:
An embodiment of the application provides a word retrieval method, which comprises the following steps: in response to a word retrieval request for an input character string, acquiring a dual dictionary tree, wherein the dual dictionary tree comprises a first dictionary tree and a second dictionary tree associated with the first dictionary tree; the first dictionary tree comprises a sound tree whose nodes are the simple spelling strings of full spelling strings, and the second dictionary tree comprises a word tree whose nodes are Chinese characters; determining a retrieval scene corresponding to the character string; and, based on the retrieval scene, retrieving words corresponding to the character string from the dual dictionary tree by combining the sound tree and the word tree.
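The step of determining a retrieval scene can be pictured with a short Python sketch. The text does not state how the scene is derived from the character string, so the rule below (classifying by whether the input contains Latin letters, Chinese characters, or both) is purely an assumption, and the names `Scene` and `determine_scene` are hypothetical. The sketch deliberately does not try to separate the full spelling, simple spelling, and word prediction scenes, all of which take pinyin-style input.

```python
from enum import Enum


class Scene(Enum):
    PINYIN = "pinyin"                      # full or simple spelling input
    WORD_ASSOCIATION = "word_association"  # Chinese characters only
    MIXED = "mixed"                        # pinyin and Chinese characters


def is_chinese(ch):
    """True for characters in the CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def determine_scene(text):
    """Coarsely classify an input character string (assumed rule)."""
    has_chinese = any(is_chinese(c) for c in text)
    has_latin = any(c.isascii() and c.isalpha() for c in text)
    if has_chinese and has_latin:
        return Scene.MIXED
    if has_chinese:
        return Scene.WORD_ASSOCIATION
    if has_latin:
        return Scene.PINYIN
    raise ValueError("input contains neither pinyin letters nor Chinese characters")


print(determine_scene("nihao"))  # Scene.PINYIN
print(determine_scene("你h"))    # Scene.MIXED
```

A dispatcher built on such a classifier would then choose which tree to enter first: the sound tree for pinyin input, the word tree for Chinese input, and both for mixed input.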
An embodiment of the application provides a word retrieval apparatus, which comprises: an acquisition module, configured to acquire a dual dictionary tree in response to a word retrieval request for an input character string, wherein the dual dictionary tree comprises a first dictionary tree and a second dictionary tree associated with the first dictionary tree, the first dictionary tree comprises a sound tree whose nodes are the simple spelling strings of full spelling strings, and the second dictionary tree comprises a word tree whose nodes are Chinese characters; a scene determining module, configured to determine a retrieval scene corresponding to the character string; and a retrieval module, configured to retrieve, based on the retrieval scene, words corresponding to the character string from the dual dictionary tree by combining the sound tree and the word tree.
In the above scheme, the retrieval scene comprises a full spelling retrieval scene, and the first dictionary tree further comprises a final region associated with the sound tree and a first node position area associated with the final region, wherein the final region is used for storing the finals of the full spelling strings, that is, the parts other than the simple spelling strings, and the first node position area is used for storing the node positions corresponding to the Chinese characters in the word tree. The retrieval module is further configured to: split the character string into a first simple spelling string and a first final, and search the sound tree layer by layer for the first simple spelling string to obtain a first search result; determine, based on the first search result and the first final, a target final in the final region that matches the full spelling string; determine, in the first node position area, the node position pointed to by the target final; and search the word tree for the word corresponding to the character string based on the node position pointed to by the target final.
In the above scheme, there is at least one first simple spelling string, the first search result includes the first simple spelling string retrieved last in the sound tree, and the node corresponding to the last-retrieved first simple spelling string is a first node. The retrieval module is further configured to: determine the final pointed to by the first node, and determine the final pointed to by the next sibling node of the first node; determine a final range of the first simple spelling string in the final region based on the final pointed to by the first node and the final pointed to by the next sibling node of the first node; and determine, within the final range of the first simple spelling string in the final region, the final that matches the first final, and use the final that matches the first final as the target final matching the full