CN-122021609-A - Address resolution method, address resolution device, electronic device, storage medium, and program product

Abstract

The embodiments of the present application provide an address resolution method, an address resolution device, an electronic device, a storage medium, and a program product. The method comprises the following steps:

- acquiring an original address text and segmenting it into a plurality of address fragments;
- performing address resolution processing on the address fragments to obtain at least one group of first candidate address elements;
- if the confidence score of at least one group of first candidate address elements meets a preset confidence score threshold, determining the target address elements corresponding to the original address text according to that at least one group;
- if none of the confidence scores of the first candidate address elements meets the preset confidence score threshold, performing enhanced parsing processing based on the address fragments and the original address text to obtain the target address elements corresponding to the original address text.

The method is intended to improve the accuracy of address resolution and the robustness to non-standard and noisy addresses. It thereby improves the automation efficiency and reliability of business transaction auditing.
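The control flow summarized in the abstract (parse, score, then either accept a candidate group or fall back to enhanced parsing) can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation:

- the segmenter, parser, confidence scorer, and enhanced parser are passed in as hypothetical callables;
- "meets the threshold" is assumed to mean `score >= threshold`;
- the 0.8 default is an arbitrary placeholder;
- when several groups pass, the sketch simply picks the highest-scoring one.

```python
from typing import Callable, Dict, List

# One group of candidate address elements: address label -> address fragment.
Elements = Dict[str, str]


def resolve_address(
    raw_text: str,
    segment: Callable[[str], List[str]],                  # hypothetical segmenter
    parse: Callable[[List[str]], List[Elements]],         # hypothetical first-stage parser
    score: Callable[[str, List[str], Elements], float],   # hypothetical confidence scorer
    enhance: Callable[[str, List[str]], Elements],        # hypothetical enhanced parser
    threshold: float = 0.8,                               # placeholder threshold
) -> Elements:
    fragments = segment(raw_text)
    candidates = parse(fragments)
    scored = [(score(raw_text, fragments, c), c) for c in candidates]
    # Assumption: "meets the threshold" means score >= threshold.
    passing = [(s, c) for s, c in scored if s >= threshold]
    if passing:
        # At least one group passes: return the best-scoring passing group.
        return max(passing, key=lambda sc: sc[0])[1]
    # No group passes (or there are no candidates): fall back to enhanced parsing.
    return enhance(raw_text, fragments)


# Toy stubs standing in for the real components.
segment = lambda text: text.split()
parse = lambda frags: [{"city": frags[0], "district": frags[1]}]
enhance = lambda text, frags: {"enhanced": text}

print(resolve_address("Hangzhou Xihu", segment, parse, lambda t, f, c: 0.9, enhance))
print(resolve_address("Hangzhou Xihu", segment, parse, lambda t, f, c: 0.1, enhance))
```

With a high stub score, the first call returns the parsed group. With a low stub score, the second call takes the enhanced-parsing branch.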

Inventors

  • SUN JIANYE
  • XU WEIBO
  • LI HANG
  • LIU LIANGCHEN

Assignees

  • China UnionPay Co., Ltd. (中国银联股份有限公司)

Dates

Publication Date
2026-05-12
Application Date
2025-12-29

Claims (20)

  1. An address resolution method, comprising: acquiring an original address text, and segmenting the original address text to obtain a plurality of address fragments; performing address resolution processing on the plurality of address fragments to obtain at least one group of first candidate address elements, wherein each group of first candidate address elements comprises a plurality of address labels and the address fragments corresponding to the address labels; if the confidence score of at least one group of first candidate address elements meets a preset confidence score threshold, determining a target address element corresponding to the original address text according to the at least one group of first candidate address elements; and if none of the confidence scores of the first candidate address elements meets the preset confidence score threshold, performing enhanced parsing processing based on the address fragments and the original address text to obtain the target address element corresponding to the original address text; wherein the target address element is used for conducting business transaction auditing on the original address text.
  2. The method of claim 1, wherein segmenting the original address text to obtain a plurality of address fragments comprises: performing word segmentation on the original address text with a preset word segmentation tool to obtain a text sequence containing a plurality of words; and performing sliding window enumeration on the text sequence based on a preset combination length threshold N to generate the plurality of address fragments, wherein N is an integer greater than 1.
  3. The method of claim 2, wherein performing sliding window enumeration on the text sequence based on the preset combination length threshold N to generate the plurality of address fragments comprises: sliding sequentially from the starting position of the text sequence, using the combination length threshold N as the maximum size of the sliding window; for each slide, merging the M consecutive words covered by the current sliding window into one address fragment, wherein M is a variable integer satisfying 1 < M ≤ N; and traversing the text sequence until the starting position of the sliding window reaches the end of the text sequence, to obtain the plurality of address fragments.
  4. The method of claim 1, wherein performing address resolution processing on the plurality of address fragments to obtain at least one group of first candidate address elements comprises: acquiring a predefined address label set, wherein each address label in the address label set is defined according to an address hierarchy rule; for each address label in the address label set, determining at least one candidate address fragment corresponding to the address label from the plurality of address fragments; constructing a directed graph from each address label in the address label set and its corresponding candidate address fragments, wherein each graph node represents one address label together with one of its candidate address fragments, and a directed edge indicates that the address fragments represented by its two graph nodes are sequentially adjacent in the original address text and that their address labels are allowed to be adjacent under a predefined address label structure rule; and performing a path search on the directed graph to obtain at least one group of first candidate address elements, wherein each group of first candidate address elements corresponds to a path from a graph node of the starting address label to a graph node of the ending address label.
  5. The method of claim 4, wherein determining at least one candidate address fragment corresponding to the address label from the plurality of address fragments comprises: calculating the similarity between the address label and each address fragment; and determining, based on the similarity, the K address fragments with the highest similarity as the candidate address fragments corresponding to the address label.
  6. The method of claim 4, wherein constructing the directed graph from each address label in the address label set and its corresponding candidate address fragments comprises: combining each address label with each of its candidate address fragments to obtain a graph node; if the address fragment of the current graph node and the address fragment of the next graph node are sequentially adjacent in the original address text, and the address labels of the two graph nodes are allowed to be adjacent under the predefined address label structure rule, establishing a directed edge from the current graph node to the next graph node; and constructing the directed graph from all the created graph nodes and directed edges.
  7. The method of claim 6, wherein performing a path search on the directed graph to obtain at least one group of first candidate address elements comprises: searching for at least one path from a graph node corresponding to a predefined starting address label to a graph node corresponding to a predefined ending address label, and calculating a composite score for each path; and selecting at least one path from all the searched paths according to the composite scores to obtain the at least one group of first candidate address elements.
  8. The method of claim 7, wherein calculating the composite score of each path comprises: obtaining the node weights of all graph nodes on the path and summing them to obtain a node weight sum, wherein a node weight represents the degree of match between an address label and its candidate address fragment; obtaining the transfer weights of all directed edges on the path and summing them to obtain a transfer weight sum, wherein a transfer weight quantifies how reasonable and how common the address structure rule is that is represented by an edge connecting two different address labels in the directed graph; and calculating the composite score based on the node weight sum and the transfer weight sum.
  9. The method of claim 8, wherein obtaining the transfer weights of all directed edges on the path comprises: querying a predefined transfer weight mapping table with the edge identifier of each directed edge on the path to obtain the corresponding transfer weight; wherein the transfer weight mapping table is constructed in at least one of the following ways: assigning different weights to the address structure rules that conform to the predefined address labels according to the address hierarchy specification and how common each rule is; and calculating and assigning transfer weights based on the statistical probability with which historical address data conforms to the predefined address label structure rules.
  10. The method of claim 7, wherein selecting at least one path from all the searched paths according to the composite scores to obtain at least one group of first candidate address elements comprises: selecting, based on the composite scores, the P paths with the highest composite scores as candidate paths; and for each candidate path, determining the address labels and candidate address fragments corresponding to the sequence of graph nodes on that path as one group of first candidate address elements.
  11. The method of claim 1, wherein after the at least one group of first candidate address elements is obtained, the method further comprises: acquiring a pre-constructed first prompt template, wherein the first prompt template comprises preset slots for receiving the original address text and the address fragments so as to guide confidence score calculation; filling the original address text and the address fragments into the preset slots of the first prompt template to obtain a confidence score calculation prompt; and invoking a large language model with the confidence score calculation prompt to perform confidence score calculation, obtaining the confidence score of the first candidate address elements.
  12. The method of claim 1, wherein performing enhanced parsing processing based on the address fragments and the original address text to obtain the target address element corresponding to the original address text comprises: acquiring a pre-constructed second prompt template, wherein the second prompt template comprises preset slots for receiving the original address text and the address fragments so as to guide address element generation; filling the original address text and the plurality of address fragments into the preset slots of the second prompt template to obtain an address element generation prompt; and invoking a large language model with the address element generation prompt to perform address resolution processing, obtaining the target address element.
  13. The method of claim 12, wherein invoking the large language model to perform address resolution processing to obtain the target address element comprises: acquiring a second candidate address element output by the large language model and a confidence score corresponding to the second candidate address element; if the confidence score meets the preset confidence score threshold, determining the second candidate address element as the target address element; and if the confidence score does not meet the preset confidence score threshold, determining the target address element according to the first candidate address elements and the second candidate address element.
  14. The method of any one of claims 1-13, wherein before segmenting the original address text, the method further comprises: performing format normalization on the original address text; performing text compliance verification on the format-normalized address text based on a preset text verification rule; if the address text passes the text compliance verification, executing the subsequent segmentation step; and if the address text fails the text compliance verification, marking the address text as an abnormal address and ending the processing flow.
  15. The method of any one of claims 1-13, wherein after the target address element corresponding to the original address text is obtained, the method further comprises: performing service compliance verification on the target address element based on a preset service verification rule; if the target address element passes the service compliance verification, performing business transaction auditing on the original address text according to the target address element; and if the target address element fails the service compliance verification, generating and outputting a verification result containing a risk mark.
  16. The method of claim 15, wherein performing business transaction auditing on the original address text according to the target address element comprises: outputting the target address element and the original address text to a preset upper-layer service system, so as to trigger the upper-layer service system to perform business transaction auditing on the original address text based on the target address element.
  17. An address resolution apparatus, comprising: an address fragment obtaining module, configured to obtain an original address text and segment the original address text to obtain a plurality of address fragments; a first candidate address element obtaining module, configured to perform address resolution processing on the plurality of address fragments to obtain at least one group of first candidate address elements, wherein each group of first candidate address elements comprises a plurality of address labels and the address fragments corresponding to the address labels; a first target address element determining module, configured to determine a target address element corresponding to the original address text according to the at least one group of first candidate address elements if the confidence score of any group of first candidate address elements meets a preset confidence score threshold; and a second target address element determining module, configured to perform enhanced parsing processing based on the address fragments and the original address text to obtain the target address element corresponding to the original address text if none of the confidence scores of the at least one group of first candidate address elements meets the preset confidence score threshold.
  18. An electronic device, comprising a memory and a processor, wherein the memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the method of any one of claims 1-16.
  19. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, carry out the method of any one of claims 1-16.
  20. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-16.
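Claims 2 to 10 describe the first-stage parser in four steps:

- sliding-window fragment enumeration;
- selection of the top-K candidate fragments per address label;
- construction of a directed graph whose nodes are (label, fragment) pairs;
- a path search ranked by the node-weight sum plus the transfer-weight sum.

The Python sketch below illustrates that pipeline under clearly hypothetical choices that do not come from the application:

- a five-label set defined by suffix rules (`LABEL_SUFFIXES`);
- a toy similarity (1 divided by fragment length when the suffix matches);
- the values in the transfer-weight table (`TRANSFER_WEIGHTS`);
- an unweighted sum as the composite score.

Word segmentation is assumed to have already happened, so the words are passed in directly. Claim 3 bounds M from above by N. The sketch exposes the lower bound as the parameter `min_len`, which defaults to 1 so that single words also become fragments; pass 2 to exclude single words.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Fragment:
    text: str
    start: int  # index of the first word in the text sequence
    end: int    # index one past the last word


def enumerate_fragments(words: List[str], n: int, min_len: int = 1) -> List[Fragment]:
    """Sliding-window enumeration: merge every run of M consecutive words, min_len <= M <= n."""
    fragments = []
    for start in range(len(words)):
        for m in range(min_len, n + 1):
            end = start + m
            if end > len(words):
                break
            fragments.append(Fragment("".join(words[start:end]), start, end))
    return fragments


# Hypothetical label set: a fragment matches a label if it ends with one of these suffixes.
LABEL_SUFFIXES = {
    "province": ("省",), "city": ("市",), "district": ("区", "县"),
    "road": ("路", "街"), "number": ("号",),
}
# Hypothetical transfer-weight mapping table: allowed label transitions and their weights.
TRANSFER_WEIGHTS = {
    ("province", "city"): 1.0, ("city", "district"): 1.0, ("district", "road"): 1.0,
    ("road", "number"): 1.0, ("province", "district"): 0.5, ("city", "road"): 0.5,
}

Node = Tuple[str, Fragment]  # (address label, candidate fragment)


def similarity(label: str, frag: Fragment) -> float:
    # Toy node weight: suffix match, favouring fragments made of fewer words.
    if frag.text.endswith(LABEL_SUFFIXES[label]):
        return 1.0 / (frag.end - frag.start)
    return 0.0


def build_graph(fragments: List[Fragment], k: int):
    # Nodes: each label paired with its top-k matching fragments (claims 5 and 6).
    nodes: Dict[Node, float] = {}
    for label in LABEL_SUFFIXES:
        scored = sorted(((similarity(label, f), f) for f in fragments), key=lambda p: p[0], reverse=True)
        for weight, frag in scored[:k]:
            if weight > 0:
                nodes[(label, frag)] = weight
    # Edges: fragments adjacent in the text and the label pair allowed by the table.
    edges: Dict[Node, List[Node]] = {}
    for a in nodes:
        for b in nodes:
            if a[1].end == b[1].start and (a[0], b[0]) in TRANSFER_WEIGHTS:
                edges.setdefault(a, []).append(b)
    return nodes, edges


def search_paths(nodes, edges, start_label: str, end_label: str, top_p: int):
    """Depth-first search for start->end paths (claim 7), ranked by composite score (claims 8, 10)."""
    found = []

    def dfs(node, path, node_sum, transfer_sum):
        if node[0] == end_label:
            # Assumption: composite score = node-weight sum + transfer-weight sum.
            found.append((node_sum + transfer_sum, list(path)))
            return
        for nxt in edges.get(node, []):
            path.append(nxt)
            dfs(nxt, path, node_sum + nodes[nxt], transfer_sum + TRANSFER_WEIGHTS[(node[0], nxt[0])])
            path.pop()

    for node in nodes:
        if node[0] == start_label:
            dfs(node, [node], nodes[node], 0.0)
    found.sort(key=lambda r: r[0], reverse=True)
    # Each kept path becomes one group of first candidate address elements.
    return [(score, {label: frag.text for label, frag in path}) for score, path in found[:top_p]]


words = ["浙江省", "杭州市", "西湖区", "文三路", "90号"]  # assumed output of a word segmenter
graph_nodes, graph_edges = build_graph(enumerate_fragments(words, n=3), k=2)
best_score, best_group = search_paths(graph_nodes, graph_edges, "province", "number", top_p=3)[0]
print(best_score, best_group)
```

Every edge leads from a fragment to one that starts exactly where the previous one ends, so the graph is acyclic and plain depth-first search terminates. With many labels or long addresses, dynamic programming over the graph would avoid enumerating every path.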

Description

Address resolution method, address resolution device, electronic device, storage medium, and program product

Technical Field

The present application relates to the field of artificial intelligence, and in particular to an address resolution method and apparatus, an electronic device, a storage medium, and a program product.

Background

In fields that require standardized address information, such as finance, logistics, and government affairs, the addresses of the merchants involved are parsed, and business is then handled and audited according to the parsing results. These steps are key to ensuring business security and data quality. For example, in a business onboarding (network access) audit scenario, it must be verified whether the parsed address elements conform to the national administrative division specifications and whether they contain sensitive words or redundant information, so as to prevent risks and ensure data usability.

Currently, the related art mainly performs address resolution with automated rule-matching systems. Such traditional systems mostly rely on keyword matching, regular expressions, or comparison against a static address library to achieve basic parsing of address elements and noise filtering. However, the related art struggles to handle complex writing, irregular expressions, new place names, and maliciously filled content in address texts, which easily leads to missed elements during parsing. In addition, the related art cannot accurately identify and discard irrelevant noise at the semantic level. This degrades parsing accuracy and reliability and makes it difficult to adapt to complex and changeable real-world business scenarios.

Disclosure of Invention

The embodiments of the present application provide an address resolution method, an address resolution device, an electronic device, a storage medium, and a program product. They are used to improve the accuracy of address resolution and the robustness to non-standard and noisy addresses, thereby improving the automation efficiency and reliability of business transaction auditing.

In a first aspect, an embodiment of the present application provides an address resolution method, comprising: acquiring an original address text, and segmenting the original address text to obtain a plurality of address fragments; performing address resolution processing on the plurality of address fragments to obtain at least one group of first candidate address elements, wherein each group of first candidate address elements comprises a plurality of address labels and the address fragments corresponding to the address labels; if the confidence score of at least one group of first candidate address elements meets a preset confidence score threshold, determining a target address element corresponding to the original address text according to the at least one group of first candidate address elements; and if none of the confidence scores of the first candidate address elements meets the preset confidence score threshold, performing enhanced parsing processing based on the address fragments and the original address text to obtain the target address element corresponding to the original address text; wherein the target address element is used for conducting business transaction auditing on the original address text.
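Claims 12 and 13 describe how the enhanced parsing step works:

- the original address text and the address fragments are filled into the slots of a prompt template;
- a large language model is called with the filled prompt;
- if the model's confidence meets the threshold, its second candidate becomes the target;
- otherwise, the target is derived from both the first and second candidates.

A minimal Python sketch follows. The following are all hypothetical and not specified by the application:

- the template wording;
- the JSON reply format;
- the `call_llm` callable;
- the 0.8 threshold;
- the merge policy, which keeps only the labels on which the top-ranked first-candidate group and the model agree.

The same slot-filling pattern would apply to the first prompt template used for confidence scoring in claim 11.

```python
import json
from string import Template
from typing import Callable, Dict, List

# Hypothetical second prompt template; $raw_text and $fragments are the two slots.
GENERATION_TEMPLATE = Template(
    "Parse the address below into labelled address elements.\n"
    "Original address: $raw_text\n"
    "Address fragments: $fragments\n"
    'Reply with JSON: {"elements": {"<label>": "<fragment>"}, "confidence": <0-1>}'
)


def build_generation_prompt(raw_text: str, fragments: List[str]) -> str:
    # Fill both slots; substitute() raises KeyError if a slot is left unfilled.
    return GENERATION_TEMPLATE.substitute(
        raw_text=raw_text,
        fragments=json.dumps(fragments, ensure_ascii=False),
    )


def enhanced_parse(
    raw_text: str,
    fragments: List[str],
    first_candidates: List[Dict[str, str]],  # ranked best-first
    call_llm: Callable[[str], str],          # hypothetical: prompt in, JSON text out
    threshold: float = 0.8,                  # placeholder threshold
) -> Dict[str, str]:
    reply = json.loads(call_llm(build_generation_prompt(raw_text, fragments)))
    second, confidence = reply["elements"], float(reply["confidence"])
    if confidence >= threshold:
        # The model is confident: its output becomes the target address element.
        return second
    # Otherwise combine first and second candidates; this agreement-only merge is an assumption.
    base = first_candidates[0] if first_candidates else {}
    return {label: frag for label, frag in second.items() if base.get(label) == frag}


def fake_llm(confidence: float) -> Callable[[str], str]:
    # Stub standing in for a real model call; always returns the same elements.
    def call(prompt: str) -> str:
        return json.dumps({"elements": {"city": "杭州市", "district": "西湖区"}, "confidence": confidence})
    return call


first = [{"city": "杭州市", "district": "西湖"}]
print(enhanced_parse("杭州市西湖区", ["杭州市", "西湖区"], first, fake_llm(0.9)))
print(enhanced_parse("杭州市西湖区", ["杭州市", "西湖区"], first, fake_llm(0.4)))
```

The agreement-only merge is deliberately conservative: below the threshold, a label survives only when both stages produced the same fragment. For an auditing use case, that trades recall for fewer wrongly accepted elements.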
In one possible implementation, segmenting the original address text to obtain a plurality of address fragments comprises: performing word segmentation on the original address text with a preset word segmentation tool to obtain a text sequence containing a plurality of words; and performing sliding window enumeration on the text sequence based on a preset combination length threshold N to generate a plurality of address fragments, wherein N is an integer greater than 1.

In one possible implementation, performing sliding window enumeration on the text sequence based on the preset combination length threshold N to generate a plurality of address fragments comprises: sliding sequentially from the starting position of the text sequence, using the combination length threshold N as the maximum size of the sliding window; for each slide, merging the M consecutive words covered by the current sliding window into one address fragment, wherein M is a variable integer satisfying 1 < M ≤ N; and traversing the text sequence until the starting position of the sliding window reaches the end of the text sequence, to obtain the plurality of address fragments.

In one possible implementation, performing address resolution processing on the plurality of address fragments to obtain at least one group of first candidate address elements comprises: acquiring a predefined address label set, wherein each address label in the address label set is defined according to an address hierarchy rule; for any address label in the address label set, determining at lea