CN-122019574-A - Intelligent place name address data matching and searching method based on large language model


Abstract

The invention relates to the technical field of large language models and discloses an intelligent place name address data matching method based on a large language model. The method comprises: collecting place name address related data and the corresponding spatial graphic data, and performing fusion and association processing on them to obtain processed information; converting the processed information through textualization and vectorization to form a place name address knowledge base adapted to a large language model; obtaining, based on the place name address knowledge base and on large language model retrieval augmentation and prompt fine-tuning techniques, a place name address matching model that evaluates the address level of matching results; receiving a place name address text to be matched, preprocessing it, and retrieving from the place name address knowledge base to obtain a candidate matching result set; and calling the place name address matching model to evaluate the level of the candidate matching result set and outputting the optimal matching result together with its matching level.

Inventors

WANG FANGMIN
WANG JUN
LIU YING
CHEN LIN
DENG YINYIN
WU GUOLIANG
NIE XIAOTONG
LIU KANGNING
DONG WENJIE
ZHOU HONGWEN
LIANG XING
GAO XIANG
CHEN JIAQUAN
YANG MENGHAN
AO XIAOJING

Assignees

重庆市地理信息和遥感应用中心(重庆市测绘产品质量检验测试中心)

Dates

Publication Date: 20260512
Application Date: 20260413

Claims (8)

1. An intelligent place name address data matching method based on a large language model, characterized by comprising the following steps: collecting place name address related data and the corresponding spatial graphic data, and performing fusion and association processing on the place name address related data and the corresponding spatial graphic data to obtain processed information; converting the processed information through textualization and vectorization to form a place name address knowledge base adapted to a large language model; obtaining, based on the place name address knowledge base and on large language model retrieval augmentation and prompt fine-tuning techniques, a place name address matching model that evaluates the address level of matching results; receiving a place name address text to be matched, preprocessing it, and retrieving from the place name address knowledge base to obtain a candidate matching result set; and calling the place name address matching model to evaluate the level of the candidate matching result set, and outputting the optimal matching result and its matching level.
2. The intelligent place name address data matching method based on a large language model according to claim 1, wherein collecting the place name address related data and the corresponding spatial graphic data, and performing fusion and association processing on them to obtain the processed information, comprises: automatically collecting multi-source place name address related data based on Python's Requests module and BeautifulSoup, and storing the collected data in a PostgreSQL database; wherein the place name address related data comprise standard addresses, non-standard addresses, azimuth descriptions, place names, aliases, historical names and place name detail information, and the spatial graphic data comprise geographic coordinate data, regional boundary data and road centerline data; and wherein the fusion and association processing specifically comprises the following steps: step one, constructing a multidimensional entity detail model that defines the data structure of place name entity detail information, the data structure comprising general attributes, special attributes and spatial attributes, wherein the general attributes comprise an entity unique identifier, an entity category and a management unit code, the special attributes comprise POI attributes associated with the place name, and the spatial attributes comprise the geographic coordinate point, regional boundary or polygon data corresponding to the entity; step two, aligning standard addresses under dual semantic and spatial constraints: using the proximity distance between the geographic coordinates of the place name data and the spatial position of each standard address, performing secondary screening on the candidate matching set by a spatial distance constraint to determine the associated standard address ID, thereby linking the place name to the standard address; step three, parsing the matched standard address into 11 layers of structured components, namely province; city; district or county; street, town or township; community or village; road or street; village group or sheet area; house number; building number; unit number; and room number, and establishing mapping rules between the address components of each level and the spatial graphic data, namely associating management unit address components with regional boundary polygon data, associating road and street components with road centerline data, and associating point-type house number components with geographic coordinate point data, so as to generate a fine-grained "text component to spatial entity" index table; and step four, generating and fusing non-standard addresses and azimuth descriptions based on rules: generating valid non-standard addresses from the structured components produced in step three through the atomic operations of level elimination and suffix trimming; generating standardized azimuth descriptions within the spatial neighborhood of a standard address by calculating the azimuth and distance relation between the standard address and road intersections; and merging the generated non-standard addresses and azimuth descriptions into the place name address data system.
3. The intelligent place name address data matching method based on a large language model according to claim 1, wherein converting the processed information through textualization and vectorization to form the place name address knowledge base adapted to a large language model comprises: filling the fused and associated structured data into a preset template to generate instantiated knowledge documents; performing semantic segmentation on the instantiated knowledge documents based on FastGPT to obtain regular text units, encoding the text units into high-dimensional vectors, storing the high-dimensional vectors in a vector database, and building an approximate nearest neighbor index; and evaluating retrieval accuracy using the FastGPT prompt fine-tuning framework and optimizing the knowledge base accordingly, to obtain the place name address knowledge base adapted to the large language model.
4. The intelligent place name address data matching method based on a large language model according to claim 1, wherein obtaining the place name address matching model that evaluates the address level of matching results, based on the place name address knowledge base and on large language model retrieval augmentation and prompt fine-tuning techniques, comprises: defining a plurality of matching levels according to the 11-layer structure of a standard address, namely province; city; district or county; street, town or township; community or village; road or street; village group or sheet area; house number; building number; unit number; and room number; and, based on the large language model and in combination with the retrieval augmentation and prompt fine-tuning techniques, obtaining a place name address matching model capable of outputting a matching level.
5. The intelligent place name address data matching method based on a large language model according to claim 1, wherein receiving the place name address text to be matched, preprocessing it, and retrieving from the knowledge base to obtain the candidate matching result set comprises: preprocessing the place name address text to be matched by cleaning, parsing and management unit completion; invoking a search engine to perform a preliminary search, and outputting the result directly if the preliminary search result meets a direct matching condition; and, if the preliminary search does not meet the preset matching requirement, encoding the preprocessed text to be matched into a query vector, performing an approximate nearest neighbor search by calculating the cosine similarity between the query vector and the high-dimensional vectors in the knowledge base, and selecting the Top-K most relevant results as the candidate matching result set.
6. The intelligent place name address data matching method based on a large language model according to claim 5, wherein preprocessing the place name address text to be matched by cleaning, parsing and management unit completion comprises: cleaning special characters from the text to be matched using regular expressions, and segmenting the cleaned address text with a word segmentation tool based on a custom dictionary to obtain address word segmentation units; constructing a management unit knowledge graph comprising multi-level management unit nodes, node attributes, and the superior and subordinate membership relations among the nodes; performing entity linking of the address word segmentation units of the text to be matched against the management unit knowledge graph, and identifying the corresponding management unit nodes and their levels; and, according to the hierarchical topological relations of the identified nodes, completing the missing management unit level information by constraint reasoning, either top-down completion or bottom-up backtracking, to obtain an address text with complete management units.
7. An intelligent place name address data retrieval method based on a large language model, characterized by comprising the following steps: using a place name address knowledge base constructed by the intelligent place name address data matching method based on a large language model according to any one of claims 1 to 6; receiving a natural language query statement input by a user; performing semantic expansion on the query statement based on a large language model prompt fine-tuning technique; encoding the expanded query statement into a query vector, performing an approximate nearest neighbor search by calculating the cosine similarity between the query vector and the high-dimensional vectors in the knowledge base, and selecting the Top-K most relevant candidate results; and packaging the associated attributes of the candidate results into a structured place name address knowledge card according to a preset knowledge expression template, and returning the knowledge card.
8. The intelligent place name address data retrieval method based on a large language model according to claim 7, further comprising a knowledge base dynamic update mechanism, which comprises: collecting user feedback on the retrieval results; intelligently auditing the feedback information using the FastGPT prompt fine-tuning framework, and screening it to obtain high-quality feedback information; and taking the high-quality feedback information that passes the intelligent audit as newly added aliases or non-standard expressions, and incrementally updating them into the place name address knowledge base, thereby dynamically optimizing the knowledge base.
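The vector retrieval step recited in claims 5 and 7 (cosine similarity between a query vector and the stored vectors, followed by Top-K selection) can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the claimed implementation: it performs an exact brute-force scan rather than an approximate nearest neighbor search over an index, the vectors are tiny hand-written lists rather than embeddings, and the names `cosine_similarity`, `top_k`, and the sample identifiers are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    knowledge_base: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k highest scores.

    Exact linear scan; a production system would query an approximate
    nearest neighbor index in a vector database instead.
    """
    scored = [
        (doc_id, cosine_similarity(query, vector))
        for doc_id, vector in knowledge_base.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical two-dimensional "embeddings" keyed by made-up document ids.
sample_kb = {
    "addr-001": [1.0, 0.0],
    "addr-002": [0.0, 1.0],
    "addr-003": [1.0, 1.0],
}
candidates = top_k([1.0, 0.0], sample_kb, k=2)
for doc_id, score in candidates:
    print(doc_id, round(score, 4))
```

The linear scan keeps the example self-contained; swapping it for an indexed search changes only how `scored` is produced, not the Top-K contract.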

Description

Intelligent place name address data matching and searching method based on large language model

Technical Field

The invention relates to the technical field of large language models, and in particular to an intelligent place name address data matching and retrieval method based on a large language model.

Background

Place name address data is a core spatial base element of the digital city. It provides a unified spatial description framework for the various entities of a city (such as resident households, premises of legal entities, and public facilities) and is a key link for associating and fusing urban governance elements. In practical application, however, it still faces three technical bottlenecks: difficult data fusion, a lagging update mechanism, and a single retrieval mode. Specifically, place name address data from different sources differ in naming standards and description granularity, so accurate association relations are difficult to establish; existing search engines mostly rely on keyword matching and lack semantic understanding and intelligent reasoning capability, so they cannot meet the intelligent retrieval requirements of the AI era; and the traditional mode of manual collection and batch storage can hardly support the dynamic updating of place name addresses.

Although the application of place name address data has advanced in both matching and retrieval, obvious defects remain. In terms of data matching, existing methods mainly rely on techniques such as string similarity calculation, rules and address standardization, and machine learning or deep learning. Although each of these methods has its advantages, they generally share the following problems: compatibility with non-standard information such as historical names and azimuth descriptions is poor, and a unified and effective result quality assessment mechanism is lacking, so the credibility of a matching result is difficult to measure objectively. In terms of retrieval, the prior art mostly adopts prefix matching, ranking based on context and user profiles, knowledge graphs with semantic expansion, or statistical learning with popularity. Although these methods improve response speed, personalized experience or result diversity, at the underlying level they still struggle to truly understand the semantic intent of the user, which limits further improvement of the retrieval effect.

Disclosure of Invention

In view of the defects in the prior art, the invention provides an intelligent place name address data matching and retrieval method based on a large language model, to solve the above technical problems.
In a first aspect, there is provided an intelligent place name address data matching method based on a large language model, comprising: collecting place name address related data and the corresponding spatial graphic data, and performing fusion and association processing on the place name address related data and the corresponding spatial graphic data to obtain processed information; converting the processed information through textualization and vectorization to form a place name address knowledge base adapted to a large language model; obtaining, based on the place name address knowledge base and on large language model retrieval augmentation and prompt fine-tuning techniques, a place name address matching model that evaluates the address level of matching results; receiving a place name address text to be matched, preprocessing it, and retrieving from the place name address knowledge base to obtain a candidate matching result set; and calling the place name address matching model to evaluate the level of the candidate matching result set, and outputting the optimal matching result and its matching level.
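In the method, the level evaluation of the candidate set is performed by the place name address matching model, that is, by the large language model. Purely to illustrate what a matching level over a hierarchical address can mean, the following sketch uses a deterministic stand-in: it counts how many leading levels of a candidate agree with the query and picks the candidate with the deepest agreement. Everything here is an assumption made for illustration: the function names `match_level` and `best_match`, the shortened five-level hierarchy (the claims define 11 layers), and the placeholder component values.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Address = Dict[str, str]

# Hypothetical shortened hierarchy, ordered from coarsest to finest level.
LEVELS = ["province", "city", "district", "street", "house_number"]


def match_level(query: Address, candidate: Address, levels: Sequence[str]) -> int:
    """Return how many consecutive levels, starting from the coarsest, agree."""
    depth = 0
    for name in levels:
        q_value = query.get(name)
        c_value = candidate.get(name)
        # Stop at the first level that is missing on either side or differs.
        if q_value is None or c_value is None or q_value != c_value:
            break
        depth += 1
    return depth


def best_match(
    query: Address, candidates: List[Address], levels: Sequence[str]
) -> Tuple[Optional[Address], int]:
    """Pick the candidate with the deepest agreement; ties keep the earlier candidate.

    Returns (None, -1) when the candidate list is empty.
    """
    best: Optional[Address] = None
    best_level = -1
    for candidate in candidates:
        level = match_level(query, candidate, levels)
        if level > best_level:
            best, best_level = candidate, level
    return best, best_level


# Placeholder values, not real addresses.
query_address = {"province": "P", "city": "C", "district": "D", "street": "S"}
candidate_set = [
    {"province": "P", "city": "C", "district": "X"},
    {"province": "P", "city": "C", "district": "D", "street": "S", "house_number": "8"},
]
result, level = best_match(query_address, candidate_set, LEVELS)
print(result, level)
```

A rule of this kind cannot handle aliases, historical names or azimuth descriptions, which is the gap the language model based evaluation is meant to close.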
Further, collecting the place name address related data and the corresponding spatial graphic data, and performing fusion and association processing on them to obtain the processed information, comprises: automatically collecting multi-source place name address related data based on Python's Requests module and BeautifulSoup, and storing the collected data in a PostgreSQL database; wherein the place name address related data comprise standard addresses, non-standard addresses, azimuth descriptions, place names, aliases, historical names and place name detail information, and the spatial graphic data comprise geographic coordinate data, regional boundary data and road centerline data; and wherein the fusion and association processing of the place name address related data and the corresponding spatial graphic data specifically comprises the following steps: The method comprises the steps of co