CN-115905582-B - Abnormal POI data detection method and device, electronic equipment and storage medium
Abstract
The disclosure provides a method and an apparatus for detecting abnormal POI data, an electronic device and a storage medium. It relates to the field of data processing, and in particular to the field of map data processing. For a POI to be detected in a map, the building information in a database and the building information in the map are each segmented into words to obtain a first word segmentation result and a second word segmentation result. Each first word in the first word segmentation result is matched with the second words in the second word segmentation result to obtain a binarization matching result for each first word, and the building information in the map is determined to be abnormal if any first word has a binarization matching result equal to the second preset value. The existence of such a first word means that the two word segmentation results differ, which indicates that the building information corresponding to the POI in the database differs from the building information corresponding to the POI in the map. The abnormal POI data in the map can therefore be identified, so that abnormal POI data is detected effectively.
Inventors
- LONG PAN
Assignees
- Beijing Baidu Netcom Science and Technology Co., Ltd. (北京百度网讯科技有限公司)
Dates
- Publication Date
- 20260505
- Application Date
- 20221118
Claims (13)
- 1. A method of detecting abnormal POI data, comprising: for a POI to be detected in a map, acquiring reference building information corresponding to the POI in a database and acquiring hanging building information corresponding to the POI in the map, wherein the building information comprises a building name and/or a building address; matching the words in a preset basic word list one by one against the reference building information and the hanging building information to obtain a first candidate word segmentation and a second candidate word segmentation, respectively, wherein the preset basic word list comprises phrases preset for the names or addresses of various buildings, and each candidate word segmentation comprises both the words of the preset basic word list that are successfully matched in the corresponding building information and the words of that building information that are not matched by the preset basic word list; matching the words in the first candidate word segmentation and the second candidate word segmentation against the words in a preset stop word list; removing, from the first candidate word segmentation and the second candidate word segmentation, the words successfully matched with words in the preset stop word list, to obtain a first word segmentation result and a second word segmentation result, wherein the first word segmentation result is whichever of the word segmentation results of the reference building information and the hanging building information contains the larger number of words; matching each first word in the first word segmentation result with each second word in the second word segmentation result to obtain a binarization matching result of each first word, wherein the binarization matching result takes a first preset value or a second preset value, the first preset value indicating that the first word is successfully matched with a second word, and the second preset value indicating that the first word is not successfully matched with any second word; and if an abnormal first word exists in the first word segmentation result, determining that the hanging building information is abnormal POI data, deleting the hanging building information corresponding to the POI in the map, and recording the change in a hanging change record table in the database, wherein the abnormal first word is a first word whose binarization matching result is the second preset value.
- 2. The method of claim 1, wherein the matching each first word in the first word segmentation result with each second word in the second word segmentation result to obtain a binarization matching result of each first word comprises: acquiring synonyms of each second word in the second word segmentation result based on a preset synonym table; and, for each first word in the first word segmentation result, matching the first word with each second word and with the synonyms of each second word to obtain the binarization matching result of the first word, wherein the first preset value indicates that the first word is identical to a second word or to a synonym of that second word, and the second preset value indicates that the first word is identical to neither the second word nor any of its synonyms.
- 3. The method of claim 1, wherein the matching each first word in the first word segmentation result with each second word in the second word segmentation result to obtain a binarization matching result of each first word comprises: acquiring a first word from the first word segmentation result; for the first word, acquiring a second word from the second word segmentation result; matching the second word with the first word, wherein the binarization matching result of the first word and the second word is determined to be the first preset value if they are successfully matched, and the second preset value if they are not; acquiring a new second word from the second word segmentation result and returning to the step of matching the second word with the first word, until the second word is successfully matched with the first word or all second words in the second word segmentation result have been acquired for the first word; and acquiring a new first word from the first word segmentation result and returning to the step of acquiring, for the first word, a second word from the second word segmentation result, until all first words in the first word segmentation result have been acquired.
- 4. The method of claim 3, wherein the acquiring, for the first word, a second word from the second word segmentation result comprises: for the first word, acquiring a second word from the words to be matched of the second word segmentation result, wherein the words to be matched initially consist of all second words in the second word segmentation result; the method further comprises: if the second word is successfully matched with the first word, deleting the second word from the words to be matched; and the acquiring a new second word from the second word segmentation result and returning to the step of matching the second word with the first word, until all second words in the second word segmentation result have been acquired for the first word, comprises: acquiring a new second word from the words to be matched of the second word segmentation result and returning to the step of matching the second word with the first word, until all second words in the words to be matched have been acquired for the first word.
- 5. The method of claim 1, further comprising: if the binarization matching result of every first word in the first word segmentation result is the first preset value, determining that the hanging building information is normal POI data.
- 6. An apparatus for detecting abnormal POI data, comprising: an acquisition module, configured to, for a POI to be detected in a map, acquire reference building information corresponding to the POI in a database and acquire hanging building information corresponding to the POI in the map; a word segmentation module, configured to match the words in a preset basic word list one by one against the reference building information and the hanging building information to obtain a first candidate word segmentation and a second candidate word segmentation, to match the words in the first candidate word segmentation and the second candidate word segmentation against the words in a preset stop word list, and to remove, from the first candidate word segmentation and the second candidate word segmentation, the words successfully matched with words in the preset stop word list, to obtain a first word segmentation result and a second word segmentation result, wherein the first word segmentation result is whichever of the word segmentation results of the reference building information and the hanging building information contains the larger number of words, the second word segmentation result is the one containing the smaller number of words, the preset basic word list comprises phrases preset for the names or addresses of various buildings, and each candidate word segmentation comprises both the words of the preset basic word list that are successfully matched in the corresponding building information and the words of that building information that are not matched by the preset basic word list; a matching module, configured to match each first word in the first word segmentation result with each second word in the second word segmentation result to obtain a binarization matching result of each first word, wherein the binarization matching result takes a first preset value or a second preset value, the first preset value indicating that the first word is successfully matched with a second word, and the second preset value indicating that the first word is not successfully matched with any second word; and a detection module, configured to, if an abnormal first word exists in the first word segmentation result, determine that the hanging building information is abnormal POI data, delete the hanging building information corresponding to the POI in the map, and record the change in a hanging change record table in the database, wherein the abnormal first word is a first word whose binarization matching result is the second preset value.
- 7. The apparatus of claim 6, wherein the matching each first word in the first word segmentation result with each second word in the second word segmentation result to obtain a binarization matching result of each first word comprises: acquiring synonyms of each second word in the second word segmentation result based on a preset synonym table; and, for each first word in the first word segmentation result, matching the first word with each second word and with the synonyms of each second word to obtain the binarization matching result of the first word, wherein the first preset value indicates that the first word is identical to a second word or to a synonym of that second word, and the second preset value indicates that the first word is identical to neither the second word nor any of its synonyms.
- 8. The apparatus of claim 6, wherein the matching each first word in the first word segmentation result with each second word in the second word segmentation result to obtain a binarization matching result of each first word comprises: acquiring a first word from the first word segmentation result; for the first word, acquiring a second word from the second word segmentation result; matching the second word with the first word, wherein the binarization matching result of the first word and the second word is determined to be the first preset value if they are successfully matched, and the second preset value if they are not; acquiring a new second word from the second word segmentation result and returning to the step of matching the second word with the first word, until the second word is successfully matched with the first word or all second words in the second word segmentation result have been acquired for the first word; and acquiring a new first word from the first word segmentation result and returning to the step of acquiring, for the first word, a second word from the second word segmentation result, until all first words in the first word segmentation result have been acquired.
- 9. The apparatus of claim 8, wherein the acquiring, for the first word, a second word from the second word segmentation result comprises: for the first word, acquiring a second word from the words to be matched of the second word segmentation result, wherein the words to be matched initially consist of all second words in the second word segmentation result; the matching module is further configured to delete the second word from the words to be matched if the second word is successfully matched with the first word; and the acquiring a new second word from the second word segmentation result and returning to the step of matching the second word with the first word, until all second words in the second word segmentation result have been acquired for the first word, comprises: acquiring a new second word from the words to be matched of the second word segmentation result and returning to the step of matching the second word with the first word, until all second words in the words to be matched have been acquired for the first word.
- 10. The apparatus of claim 6, wherein the detection module is further configured to determine that the hanging building information is normal POI data if the binarization matching result of every first word in the first word segmentation result is the first preset value.
- 11. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
- 12. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-5.
- 13. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-5.
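The segmentation steps of claim 1 (matching against the basic word list, removing stop words, and taking the longer result as the first word segmentation result) can be sketched in Python as follows. Dictionary-driven segmentation is needed because Chinese building names are written without spaces between words. This is a minimal sketch under stated assumptions, not the claimed implementation: the claims only say the basic word list is matched "one by one" against the building text, so the greedy longest-match strategy, the helper names `dictionary_segment` and `segment_pair`, the sample word lists, and the tie-breaking rule for equal word counts are all hypothetical.

```python
from typing import List, Set, Tuple


def dictionary_segment(text: str, basic_words: Set[str]) -> List[str]:
    """Hypothetical helper: greedy longest-match scan over `basic_words`.

    Dictionary hits become words; each run of characters not covered by any
    dictionary word is kept together as one unmatched word.
    """
    max_len = max((len(w) for w in basic_words), default=0)
    words: List[str] = []
    unmatched = ""
    i = 0
    while i < len(text):
        match = ""
        # Try the longest possible dictionary word starting at position i first.
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in basic_words:
                match = text[i:i + length]
                break
        if match:
            if unmatched:  # flush the preceding run of unmatched characters
                words.append(unmatched)
                unmatched = ""
            words.append(match)
            i += len(match)
        else:
            unmatched += text[i]
            i += 1
    if unmatched:
        words.append(unmatched)
    return words


def segment_pair(reference: str, hanging: str, basic_words: Set[str],
                 stop_words: Set[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return (first result, second result)."""
    ref_words = [w for w in dictionary_segment(reference, basic_words)
                 if w not in stop_words]
    hang_words = [w for w in dictionary_segment(hanging, basic_words)
                  if w not in stop_words]
    # The result with more words is the first word segmentation result;
    # ties go to the reference side (an assumption, not stated in the claims).
    if len(ref_words) >= len(hang_words):
        return ref_words, hang_words
    return hang_words, ref_words


BASIC_WORDS = {"北京", "百度", "大厦", "科技园"}  # hypothetical basic word list
STOP_WORDS = {"北京"}                            # hypothetical stop word list

first, second = segment_pair("北京百度大厦", "百度科技园A座", BASIC_WORDS, STOP_WORDS)
print(first, second)  # ['百度', '科技园', 'A座'] ['百度', '大厦']
```

Here the reference text "北京百度大厦" yields 百度 / 大厦 once the stop word 北京 is dropped, while the hanging text "百度科技园A座" yields 百度 / 科技园 / A座, with "A座" kept as an unmatched run. The hanging side has more words, so it becomes the first word segmentation result.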
Description
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular to the field of map data processing technologies.
Background
Data hooking errors on the map side can cause a POI (Point of Interest) in a map to display incorrect information. For example, the information of multiple hotels may be displayed on the same POI, or the information of the same hotel may be displayed at multiple POIs.
Disclosure of Invention
The disclosure provides a method, an apparatus and a device for detecting abnormal POI data, and a storage medium, which are used for detecting abnormal POI data.
According to an aspect of the present disclosure, there is provided a method for detecting abnormal POI data, including: for a POI to be detected in a map, acquiring reference building information corresponding to the POI in a database and acquiring hanging building information corresponding to the POI in the map; performing word segmentation on the reference building information and the hanging building information respectively to obtain a first word segmentation result and a second word segmentation result, wherein the first word segmentation result is the word segmentation result of one of the reference building information and the hanging building information, and the second word segmentation result is the word segmentation result of the other; matching each first word in the first word segmentation result with each second word in the second word segmentation result to obtain a binarization matching result of each first word, wherein the binarization matching result takes a first preset value or a second preset value, the first preset value indicating that the first word is successfully matched with a second word, and the second preset value indicating that the first word is not successfully matched with any second word; and if an abnormal first word exists in the first word segmentation result, determining that the hanging building information is abnormal POI data, wherein the abnormal first word is a first word whose binarization matching result is the second preset value.
According to another aspect of the present disclosure, there is provided an apparatus for detecting abnormal POI data, including: an acquisition module, configured to, for a POI to be detected in a map, acquire reference building information corresponding to the POI in a database and acquire hanging building information corresponding to the POI in the map; a word segmentation module, configured to perform word segmentation on the reference building information and the hanging building information respectively to obtain a first word segmentation result and a second word segmentation result, wherein the first word segmentation result is the word segmentation result of one of the reference building information and the hanging building information, and the second word segmentation result is the word segmentation result of the other; a matching module, configured to match each first word in the first word segmentation result with each second word in the second word segmentation result to obtain a binarization matching result of each first word, wherein the binarization matching result takes a first preset value or a second preset value, the first preset value indicating that the first word is successfully matched with a second word, and the second preset value indicating that the first word is not successfully matched with any second word; and a detection module, configured to determine that the hanging building information is abnormal POI data if an abnormal first word exists in the first word segmentation result, wherein the abnormal first word is a first word whose binarization matching result is the second preset value.
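The binarization matching and the abnormal/normal decision described above can be sketched as below. The sketch also folds in the preset synonym table of claims 2 and 7 and the shrinking list of words to be matched of claims 3, 4, 8 and 9. It is illustrative only, not the disclosed code: encoding the first and second preset values as 1 and 0, the helper names `binarized_match` and `is_abnormal`, and the dict-of-sets synonym table are assumptions.

```python
from typing import Dict, List, Optional, Set

FIRST_PRESET = 1   # assumed encoding: first word matched a second word
SECOND_PRESET = 0  # assumed encoding: first word matched no second word


def binarized_match(first_words: List[str], second_words: List[str],
                    synonyms: Dict[str, Set[str]]) -> List[int]:
    """Hypothetical helper: one binarization matching result per first word."""
    # Words to be matched: initially all second words.
    to_be_matched = list(second_words)
    results: List[int] = []
    for first in first_words:
        result = SECOND_PRESET
        for second in to_be_matched:
            # A match is identity with the second word or with one of its synonyms.
            if first == second or first in synonyms.get(second, set()):
                result = FIRST_PRESET
                # Consume the matched second word so it cannot be reused;
                # breaking right away keeps the removal safe during iteration.
                to_be_matched.remove(second)
                break
        results.append(result)
    return results


def is_abnormal(first_words: List[str], second_words: List[str],
                synonyms: Optional[Dict[str, Set[str]]] = None) -> bool:
    """Hypothetical helper: abnormal if any first word has the second preset value."""
    results = binarized_match(first_words, second_words, synonyms or {})
    return SECOND_PRESET in results


# Hypothetical synonym table keyed by second word.
SYNONYMS = {"building": {"tower"}}
print(binarized_match(["baidu", "tower"], ["building", "baidu"], SYNONYMS))  # [1, 1]
print(is_abnormal(["hotel", "hotel"], ["hotel"]))  # True
```

Deleting matched second words means a repeated first word cannot reuse the same second word. With first words ["hotel", "hotel"] and second words ["hotel"], the results are [1, 0], so the hanging building information is flagged as abnormal. When every first word has the first preset value, the data is treated as normal.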
According to another aspect of the present disclosure, there is provided an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any one of the methods of detecting abnormal POI data described above. According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform any one of the methods of detecting abnormal POI data described above. According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements any one of the methods of detecting abnormal POI data described above. It should be understood that the description in this section is not intended to identify key or critical features o