CN-114218205-B - Conflict data positioning method, device, equipment and readable storage medium

Abstract

The application discloses a conflict data positioning method, device, equipment and readable storage medium. The method comprises: determining conflict classes of multiple levels in advance according to the data of each of a plurality of historical users on each platform; determining the conflict class to which the current data on each platform belongs according to the current data of a current user on each platform and the centers of the conflict classes of each level; and comparing the level of the conflict class to which the current data on each platform belongs with a preset level, and determining as conflict data the current data on any platform whose conflict class has a level higher than the preset level. In this technical scheme, conflict classes of multiple levels are determined from the data of historical users, the conflict class of the current user's data on each platform is determined, and the current user's conflict data is identified from the conflict class level. The conflict data of the current user across multiple platforms is thereby located automatically, which improves the efficiency and accuracy of conflict data positioning.
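
The flow described above can be sketched as follows. This is a minimal illustration and not the patented implementation. It assumes that each platform's record for a user has already been encoded as a numeric feature vector of equal length. k-means with a farthest-first start stands in for the unspecified unsupervised clustering method, and ranking levels by the norm of each class center is likewise an assumption. All function names are hypothetical.

```python
import math


def distance_group(records, j):
    """Euclidean distances from platform j's record to every other platform's record."""
    return [math.dist(records[j], r) for k, r in enumerate(records) if k != j]


def kmeans(points, k, iters=20):
    """Plain k-means (an assumed choice of unsupervised clustering); returns k centers."""
    # Deterministic farthest-first initialization (assumption, chosen for repeatability).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            buckets[nearest].append(p)
        # Recompute each center as the mean of its bucket; keep the old one if empty.
        centers = [
            [sum(col) / len(b) for col in zip(*b)] if b else centers[i]
            for i, b in enumerate(buckets)
        ]
    return centers


def fit_conflict_classes(history, k):
    """history: list of users, each a list of m per-platform feature vectors."""
    groups = [distance_group(user, j) for user in history for j in range(len(user))]
    centers = kmeans(groups, k)
    # Assumption: a center farther from the origin means a more obvious conflict,
    # so sorting by norm yields level 1 (lowest) through level k (highest).
    return sorted(centers, key=lambda c: math.hypot(*c))


def locate_conflicts(current, centers, preset_level):
    """Return indices of platforms whose conflict-class level exceeds preset_level."""
    flagged = []
    for j in range(len(current)):
        group = distance_group(current, j)
        # The nearest center decides the class; list position + 1 is its level.
        level = 1 + min(range(len(centers)), key=lambda c: math.dist(group, centers[c]))
        if level > preset_level:
            flagged.append(j)
    return flagged
```

Here `fit_conflict_classes(history, k)` returns the class centers ordered from the lowest level to the highest, and `locate_conflicts(current, centers, preset_level)` returns the indices of the platforms whose current data falls in a class above the preset level.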

Inventors

  • CHEN JIAYI
  • HU ZHIGUANG
  • BAI MEI
  • XUE CHUNSHENG

Assignees

  • 国网征信有限公司

Dates

Publication Date
20260505
Application Date
20211217

Claims (8)

  1. A method for locating conflicting data, comprising:
     determining conflict classes of multiple levels in advance according to the data of each of a plurality of historical users on each platform;
     determining, according to the current data of a current user on each platform and the centers of the conflict classes of each level, the conflict class to which the current data on each platform belongs; and
     comparing the level of the conflict class to which the current data on each platform belongs with a preset level, and determining as conflict data the current data on any platform whose conflict class has a level higher than the preset level;
     wherein the data are cross-platform user data, and the higher the level of a conflict class, the more obvious the conflict between the data on the corresponding platform and the data on the other platforms;
     wherein determining the conflict classes of multiple levels in advance according to the data of each of the plurality of historical users on each platform comprises: calculating in advance the distance between the data of each historical user on the jth platform and the data on the remaining platforms, to obtain a distance group corresponding to the data of each historical user on the jth platform, where j = 1, 2, ..., m and m is the number of platforms; and obtaining the conflict classes of multiple levels by a clustering algorithm according to the distance group corresponding to the data of each historical user on each platform;
     wherein determining the conflict class to which the current data on each platform belongs comprises: calculating the distance between the current data of the current user on the jth platform and the current data on the remaining platforms, to obtain a distance group corresponding to the current data of the current user on the jth platform; and calculating the distance between that distance group and the center of the conflict class of each level, and determining the conflict class corresponding to the minimum distance as the conflict class to which the current data of the current user on the jth platform belongs;
     wherein the distance between the data of each historical user on the jth platform and the data on the remaining platforms is a Euclidean distance; and
     wherein obtaining the conflict classes of multiple levels by a clustering algorithm comprises: obtaining the conflict classes of multiple levels and the center of the conflict class of each level by an unsupervised clustering method according to the distance group corresponding to the data of each historical user on each platform.
  2. The method for locating conflicting data according to claim 1, further comprising, after determining as conflict data the current data on any platform whose conflict class has a level higher than the preset level:
     extracting a first data attribute table from each item of conflict data of the current user, and extracting a second data attribute table from any non-conflict data of the current user;
     comparing the attribute fields in each first data attribute table with the corresponding attribute fields in the second data attribute table;
     if the attribute fields in a first data attribute table are inconsistent with the corresponding attribute fields in the second data attribute table, determining that first data attribute table to be a data attribute table with an attribute conflict; and
     determining the time at which the conflict data was generated from the data attribute tables with an attribute conflict.
  3. The method for locating conflicting data according to claim 2, further comprising:
     if the attribute fields in a first data attribute table are consistent with the corresponding attribute fields in the second data attribute table, calculating the similarity between the attribute fields in the first data attribute table and the corresponding attribute fields in the second data attribute table;
     judging whether the similarity is greater than a threshold and, if not, determining that an attribute field conflict exists between the first data attribute table and the second data attribute table, and determining that first data attribute table to be a data attribute table with an attribute conflict; and
     determining the time at which the conflict was generated from the data attribute tables with an attribute conflict.
  4. The method for locating conflicting data according to claim 3, wherein calculating the similarity between an attribute field in the first data attribute table and the corresponding attribute field in the second data attribute table comprises: calculating the similarity between the attribute value of the attribute field in the first data attribute table and the attribute value of the corresponding attribute field in the second data attribute table.
  5. The method for locating conflicting data according to claim 3, wherein calculating the similarity between an attribute field in the first data attribute table and the corresponding attribute field in the second data attribute table comprises: calculating a character-type semantic similarity between the attribute fields in the first data attribute table and the corresponding attribute fields in the second data attribute table according to the sequence of the attribute values corresponding to the attribute fields.
  6. A conflicting data location device, comprising:
     a first determining module, configured to determine conflict classes of multiple levels in advance according to the data of each of a plurality of historical users on each platform;
     a second determining module, configured to determine the conflict class to which the current data on each platform belongs according to the current data of a current user on each platform and the centers of the conflict classes of each level; and
     a comparison module, configured to compare the level of the conflict class to which the current data on each platform belongs with a preset level, and to determine as conflict data the current data on any platform whose conflict class has a level higher than the preset level;
     wherein the data are cross-platform user data, and the higher the level of a conflict class, the more obvious the conflict between the data on the corresponding platform and the data on the other platforms;
     wherein the first determining module comprises: a first calculation unit, configured to calculate in advance the distance between the data of each historical user on the jth platform and the data on the remaining platforms, to obtain a distance group corresponding to the data of each historical user on the jth platform, where j = 1, 2, ..., m and m is the number of platforms; and an obtaining unit, configured to obtain the conflict classes of multiple levels by a clustering algorithm according to the distance group corresponding to the data of each historical user on each platform;
     wherein the second determining module comprises: a second calculation unit, configured to calculate the distance between the current data of the current user on the jth platform and the current data on the remaining platforms, to obtain a distance group corresponding to the current data of the current user on the jth platform; and a third calculation unit, configured to calculate the distance between that distance group and the center of the conflict class of each level, and to determine the conflict class corresponding to the minimum distance as the conflict class to which the current data of the current user on the jth platform belongs;
     wherein the distance between the data of each historical user on the jth platform and the data on the remaining platforms is a Euclidean distance; and
     wherein obtaining the conflict classes of multiple levels by a clustering algorithm comprises: obtaining the conflict classes of multiple levels and the center of the conflict class of each level by an unsupervised clustering method according to the distance group corresponding to the data of each historical user on each platform.
  7. A conflicting data location device, comprising: a memory for storing a computer program; and a processor for implementing, when executing the computer program, the steps of the method for locating conflicting data according to any one of claims 1 to 5.
  8. A readable storage medium, characterized in that the readable storage medium stores a computer program which, when executed by a processor, implements the steps of the method for locating conflicting data according to any one of claims 1 to 5.
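
The attribute-table comparison recited in claims 2 to 4 can be illustrated with a short sketch. It rests on several assumptions that the claims do not state: an attribute table is modeled as a plain dictionary, "inconsistent attribute fields" is read as the two tables having different sets of field names, the similarity measure is `difflib.SequenceMatcher` from the Python standard library, and the generation time is read from a hypothetical `updated_at` field. The function names and the default threshold are illustrative only.

```python
from difflib import SequenceMatcher


def has_attribute_conflict(first, second, threshold=0.8):
    """Check one conflict-data attribute table against a non-conflict one."""
    # Claim 2 (as read here): the attribute fields themselves disagree.
    if set(first) != set(second):
        return True
    # Claims 3 and 4: the fields match, so compare attribute values by similarity.
    for name in first:
        ratio = SequenceMatcher(None, str(first[name]), str(second[name])).ratio()
        if not ratio > threshold:
            return True  # similarity is not greater than the threshold
    return False


def conflict_times(first_tables, second, time_field="updated_at", threshold=0.8):
    """Collect the (hypothetical) timestamp of every table with an attribute conflict."""
    reference = {k: v for k, v in second.items() if k != time_field}
    times = []
    for table in first_tables:
        # The timestamp is metadata here, so it is excluded from the comparison.
        attrs = {k: v for k, v in table.items() if k != time_field}
        if has_attribute_conflict(attrs, reference, threshold):
            times.append(table.get(time_field))
    return times
```

A sequence-based ratio is only a stand-in for the similarity measures the claims leave open. The character-type semantic similarity of claim 5 in particular would need a different measure.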

Description

Conflict data positioning method, device, equipment and readable storage medium

Technical Field

The present application relates to the field of data processing technologies, and in particular to a method, an apparatus, a device, and a readable storage medium for locating conflicting data.

Background

User data from different departments is typically heterogeneous and uncertain, and because each platform records data in its own way, the user data obtained is often inconsistent. Such conflicts in digital information are an unavoidable product of a cross-department service environment and a major obstacle to unifying and standardizing digital information data.

At present, conflict data in cross-platform user data is generally found manually, for example by comparison, so that it can be processed to reduce inconsistencies as far as possible and thereby improve the quality of cross-platform data fusion. However, searching for conflict data manually is relatively inefficient and inaccurate.

In summary, how to improve the accuracy and efficiency of locating conflict data across platforms is a technical problem to be solved by those skilled in the art.

Disclosure of Invention

Accordingly, the present application is directed to a method, apparatus, device and readable storage medium for locating conflicting data, which improve the accuracy and efficiency of locating conflicting data across platforms.

In order to achieve the above object, the present application provides the following technical solutions:

A method for locating conflicting data, comprising: determining conflict classes of multiple levels in advance according to the data of each of a plurality of historical users on each platform; determining, according to the current data of a current user on each platform and the centers of the conflict classes of each level, the conflict class to which the current data on each platform belongs; and comparing the level of the conflict class to which the current data on each platform belongs with a preset level, and determining as conflict data the current data on any platform whose conflict class has a level higher than the preset level.

Preferably, determining the conflict classes of multiple levels in advance according to the data of each of the plurality of historical users on each platform comprises: calculating in advance the distance between the data of each historical user on the jth platform and the data on the remaining platforms, to obtain a distance group corresponding to the data of each historical user on the jth platform, where j = 1, 2, ..., m and m is the number of platforms; and obtaining the conflict classes of multiple levels by a clustering algorithm according to the distance group corresponding to the data of each historical user on each platform. Determining the conflict class to which the current data on each platform belongs, according to the current data of the current user on each platform and the centers of the conflict classes of each level, comprises: calculating the distance between the current data of the current user on the jth platform and the current data on the remaining platforms, to obtain a distance group corresponding to the current data of the current user on the jth platform; and calculating the distance between that distance group and the center of the conflict class of each level, and determining the conflict class corresponding to the minimum distance as the conflict class to which the current data of the current user on the jth platform belongs.

Preferably, after determining as conflict data the current data on any platform whose conflict class has a level higher than the preset level, the method further comprises: extracting a first data attribute table from each item of conflict data of the current user, and extracting a second data attribute table from any non-conflict data of the current user; comparing the attribute fields in each first data attribute table with the corresponding attribute fields in the second data attribute table; if the attribute fields in a first data attribute table are inconsistent with the corresponding attribute fields in the second data attribute table, determining that first data attribute table to be a data attribute table with an attribute conflict; and determining the time at which the conflict data was generated from the data attribute tables with an attribute conflict.

Preferably, the method further comprises: if the attribute fields in the first data attribute table are consistent with the corresponding attribute fields in the second data attribute table, calculating the similarity between the attribute fields