
CN-121979858-A - Field mapping method of heterogeneous system


Abstract

The application relates to the field of data migration, and in particular to a field mapping method for heterogeneous systems. In embodiments of the application, a feature extraction model determines feature vectors for the fields of the old and new systems, from which the mapping relations between old-system and new-system fields are constructed automatically, greatly simplifying the data migration flow. The person responsible for migrating to the new system does not need to be familiar with the complex table structure of the old system, and differences between systems are masked. The migration process is efficient while accuracy is effectively ensured, the dependence of traditional migration on knowledge of the old system's structure is removed, the migration risk caused by differences in individual understanding is reduced, and a solid basis is provided for a stable transition of data between systems.

Inventors

  • SUN JIANXIN
  • ZHANG LIANG

Assignees

  • 深圳前海微众银行股份有限公司 (Shenzhen Qianhai WeBank Co., Ltd.)

Dates

Publication Date
2026-05-05
Application Date
2026-01-04

Claims (11)

  1. A field mapping method for a heterogeneous system, the method comprising: extracting, by using a feature extraction model, a feature vector for each field in a system according to the basic information of each field in the system, the field association relations corresponding to each service, and the sub-data information corresponding to each field within a set duration, wherein the feature vector comprises at least a semantic vector, a statistical vector and a structural vector; for each second field, determining a target first field matching the second field according to the second feature vector of the second field and the first feature vector of each first field; and mapping the first data information of the target first field in the old system to the second data information of the second field corresponding to the target first field in the new system.
  2. The method of claim 1, wherein the feature extraction model comprises a semantic feature extraction sub-model, a structural feature extraction sub-model, a statistical feature extraction sub-model, and a three-flow graph neural network; before the feature vector of each field is extracted by using the feature extraction model, the method further comprises: determining, according to the field association relations corresponding to each service, the shortest path between each field and a set field; and the extracting of the feature vector of each field by using the feature extraction model comprises: determining, by using the semantic feature extraction sub-model, a semantic vector for each field in the system according to the basic information of the field; determining, by using the structural feature extraction sub-model, a structural vector for each field according to the shortest path between the field and the set field; determining, by using the statistical feature extraction sub-model, a statistical vector for each field according to the sub-data information corresponding to the field within the set duration; updating the semantic vector, the structural vector and the statistical vector of each field by using the three-flow graph neural network; and performing weighted fusion of the updated semantic vector, statistical vector and structural vector of each field according to the weights corresponding to the field, to obtain the feature vector of the field.
  3. The method of claim 2, wherein determining, by using the semantic feature extraction sub-model, the semantic vector for each field in the system according to the basic information of the field comprises: for each field, concatenating, by the semantic feature extraction sub-model, the basic information of the field into a text sequence in a preset order; segmenting the entities in the text sequence by using a preconfigured business-domain vocabulary to obtain a segmented intermediate sequence; and performing, by the semantic feature extraction sub-model, semantic recognition on the intermediate sequence to obtain the semantic vector of the field.
  4. The method of claim 2, wherein determining, by using the structural feature extraction sub-model, the structural vector for each field according to the field association relations corresponding to each service comprises: for each field, constructing, by the structural feature extraction sub-model, a shortest-path vector from the shortest path corresponding to the field, each dimension of the shortest-path vector corresponding to one node on the shortest path; determining a position vector according to the position of each node on the shortest path; and concatenating the position vector and the shortest-path vector, and performing structural feature extraction on the concatenated vector to obtain the structural vector of the field.
  5. The method of claim 2, wherein determining, by using the statistical feature extraction sub-model, the statistical vector for each field according to the sub-data information corresponding to the field within the set duration comprises: for each field, determining, by an input layer of the statistical feature extraction sub-model, the quantity of sub-data information corresponding to the field in each preset interval within the set duration, thereby obtaining an initial statistical vector for the field; convolving the initial statistical vector separately with each of a plurality of convolution layers of the statistical feature extraction sub-model; performing, by a pooling layer of the sub-model, global max pooling on the output of each convolution layer to obtain the respective intermediate statistical vectors; and concatenating, by an output layer of the sub-model, the intermediate statistical vectors into a joint vector, which is taken as the statistical vector of the field.
  6. The method according to any one of claims 2-5, wherein the basic information of each field, the field association relations corresponding to each service, and the sub-data information corresponding to each field within the set duration are stored in a heterogeneous graph; and updating the semantic vector, the structural vector and the statistical vector of each field by using the three-flow graph neural network comprises: adding the semantic vector, the structural vector and the statistical vector of each field to the node corresponding to that field in the heterogeneous graph; and inputting the updated heterogeneous graph into the three-flow graph neural network, which updates the semantic vector of each field according to whether the two fields joined by each edge belong to the same business entity, updates the statistical vector of each field according to the difference vector between that statistical vector and preset global distribution primitive vectors, and updates the structural vector of each field according to the hop count corresponding to that structural vector.
  7. The method according to claim 2, further comprising: acquiring a preset constraint vector for each field, the preset constraint vector being determined based on an abnormal-alarm tag and/or a business tag of the field; and concatenating the preset constraint vector of each field with the corresponding feature vector, and replacing the feature vector with the concatenated vector.
  8. The method of claim 1, wherein determining, for each second field, the target first field matching the second field according to the second feature vector of the second field and the first feature vector of each first field comprises: determining, according to the second feature vector of the second field and the first feature vector of each first field, a first field whose similarity to the second field exceeds a similarity threshold as the target first field matching the second field.
  9. The method of claim 8, wherein, if no first field has a similarity exceeding the similarity threshold, the method further comprises: determining the difference between the similarity threshold and the highest of the similarities between the second field and the first fields; and if the difference falls within a preset interval, generating a work order to be audited and triggering a manual audit notification.
  10. The method according to claim 9, further comprising: determining the number of work orders to be audited that have currently been generated; and if the number reaches a preset number threshold, starting a fine-tuning task for the feature extraction model according to the second fields corresponding to the work orders to be audited and the first field having the highest similarity to each of those second fields.
  11. The method of claim 1, wherein the training process of the three-flow graph neural network comprises: obtaining a sample pair, the sample pair comprising the initial sample semantic vectors, initial sample structural vectors and initial sample statistical vectors of two sample fields, a label indicating whether the two sample fields correspond to the same node, and a boundary value (margin) of the sample pair, the boundary value being determined based on the number of sample fields that have a direct association relation with either sample field of the pair; updating, by the three-flow graph neural network to be trained, the initial sample semantic vectors, initial sample structural vectors and initial sample statistical vectors of the two sample fields to obtain their sample semantic vectors, sample structural vectors and sample statistical vectors; determining a semantic distance from the sample semantic vectors of the two fields, and determining a semantic loss value according to the relation between the semantic distance and the boundary value and according to the label; determining a statistical loss value according to the label, the sample statistical vectors of the two fields and the boundary value; determining a structural loss value according to the sample structural vectors of the two sample fields and the boundary value; and determining a total loss value from the semantic loss value, the statistical loss value and the structural loss value, and adjusting the parameters of the three-flow graph neural network according to the total loss value.
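Claims 2 and 4 build the structural input from the shortest path between a field and a set field. The following Python sketch shows one way to do that, assuming the field association relations are available as an undirected edge list, that each path node is encoded by a hypothetical integer id (`node_index`), and that the spliced vector is zero-padded to a fixed length `max_len`. The downstream structural feature extraction network is not shown, and the field names are hypothetical.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Breadth-first search over an undirected field-association graph.

    Returns the list of fields from source to target, or None if unreachable.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(adjacency.get(node, ())):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def structural_input(path, node_index, max_len=6):
    """Splice a shortest-path vector with a position vector (both zero-padded)."""
    path = path[:max_len]
    # One dimension per node on the path: the node's (hypothetical) integer id.
    path_vec = [float(node_index[n]) for n in path]
    # Position of each node along the path, normalised to [0, 1].
    denom = max(len(path) - 1, 1)
    pos_vec = [i / denom for i in range(len(path))]
    pad = [0.0] * (max_len - len(path))
    return path_vec + pad + pos_vec + pad


# Hypothetical association edges between old-system fields.
edges = [
    ("loan.cust_id", "customer.id"),
    ("customer.id", "customer.name"),
    ("loan.amount", "loan.cust_id"),
]
node_index = {name: i for i, name in enumerate(sorted({n for e in edges for n in e}))}
path = shortest_path(edges, "loan.amount", "customer.name")
vector = structural_input(path, node_index)
print(path)  # ['loan.amount', 'loan.cust_id', 'customer.id', 'customer.name']
```

Breadth-first search suffices because the association graph is unweighted; a weighted graph would call for Dijkstra's algorithm instead.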
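Claim 3 concatenates a field's basic information in a preset order and segments entities with a preconfigured business-domain vocabulary before semantic recognition. A minimal sketch of that text-preparation step follows. The key names in `PRESET_ORDER`, the sample vocabulary and the greedy longest-match segmentation are assumptions, and the semantic recognition model that turns the segmented sequence into a vector is out of scope.

```python
import re

# Hypothetical preset order of basic-information items.
PRESET_ORDER = ("table", "name", "comment", "data_type")


def build_text_sequence(basic_info):
    """Concatenate the available basic-information items in the preset order."""
    return " ".join(str(basic_info[key]) for key in PRESET_ORDER if basic_info.get(key))


def segment(text, vocabulary, max_words=3):
    """Greedy longest match: merge consecutive words that form a vocabulary entity."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    tokens, i = [], 0
    while i < len(words):
        # Try the longest candidate first; a single word is always accepted.
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += n
                break
    return tokens


# Hypothetical field metadata and business vocabulary.
info = {
    "table": "t_loan",
    "name": "cust_id_no",
    "comment": "Customer ID number",
    "data_type": "VARCHAR(32)",
}
vocabulary = {"cust id no", "customer id number", "loan"}
tokens = segment(build_text_sequence(info), vocabulary)
print(tokens)  # ['t', 'loan', 'cust id no', 'customer id number', 'varchar', '32']
```

Keeping multi-word business entities such as "customer id number" as single tokens lets the downstream encoder treat them as one concept rather than three unrelated words.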
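Claim 5 turns per-interval record counts into a statistical vector using several convolution layers followed by global max pooling and concatenation. The pure-Python sketch below mirrors that data flow. The hand-picked kernels stand in for learned convolution filters (an assumption made so the example is deterministic), and there is no activation function or training.

```python
def interval_counts(timestamps, start, duration, n_intervals):
    """Input layer: count sub-data records falling into each preset interval."""
    width = duration / n_intervals
    counts = [0.0] * n_intervals
    for t in timestamps:
        if start <= t < start + duration:
            index = min(int((t - start) // width), n_intervals - 1)
            counts[index] += 1.0
    return counts


def conv1d_valid(signal, kernel):
    """1-D convolution (cross-correlation, no padding); kernel must not exceed signal."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) for i in range(len(signal) - k + 1)]


def statistical_vector(counts, layers):
    """Each layer is a list of kernels; global max pooling keeps one value per kernel."""
    joint = []
    for kernels in layers:
        intermediate = [max(conv1d_valid(counts, kernel)) for kernel in kernels]
        joint.extend(intermediate)  # output layer: concatenate the intermediate vectors
    return joint


# Hypothetical record timestamps (e.g. days) in a 10-unit window split into 5 intervals.
counts = interval_counts([0, 1, 1, 5, 6, 6, 6, 9], start=0, duration=10, n_intervals=5)
layers = [[[1.0, 1.0]], [[-1.0, 1.0], [1.0, 1.0, 1.0]]]
print(counts, statistical_vector(counts, layers))  # [3.0, 0.0, 1.0, 3.0, 1.0] [4.0, 2.0, 5.0]
```

Global max pooling makes each output independent of where in the window a burst of records occurs, while kernels of different widths respond to patterns at different time scales.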
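Claims 2 and 8 to 10 describe weighted fusion, threshold matching, manual review near the threshold, and fine-tuning once enough review work orders accumulate. The sketch below strings these steps together under several assumptions: the three updated vectors share one dimension and are fused by a weighted sum, similarity is cosine similarity, the highest-scoring first field is taken when several exceed the threshold, the "preset interval" of claim 9 is read as a gap of at most `review_band`, and the numeric defaults (`threshold`, `review_band`, the fusion weights, `fine_tune_limit`) are illustrative rather than values from the source.

```python
import math


def fuse(semantic, statistical, structural, weights=(0.5, 0.2, 0.3)):
    """Weighted fusion, assumed here to be a weighted sum of equal-length vectors."""
    ws, wt, wr = weights
    return [ws * a + wt * b + wr * c for a, b, c in zip(semantic, statistical, structural)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_field(second_vec, first_vecs, threshold=0.9, review_band=0.15):
    """Return (status, first_field, similarity) for one second (new-system) field."""
    scores = {name: cosine(second_vec, vec) for name, vec in first_vecs.items()}
    best_name = max(scores, key=scores.get)
    best = scores[best_name]
    if best > threshold:
        return "mapped", best_name, best
    # No score above the threshold: a small gap to the threshold triggers manual review.
    if threshold - best <= review_band:
        return "review", best_name, best
    return "unmatched", None, best


class ReviewQueue:
    """Collects review work orders; signals fine-tuning once a count limit is reached."""

    def __init__(self, fine_tune_limit=100):
        self.orders = []
        self.fine_tune_limit = fine_tune_limit

    def add(self, second_field, best_first_field):
        self.orders.append((second_field, best_first_field))
        return len(self.orders) >= self.fine_tune_limit


# Hypothetical fused vectors of old-system (first) fields.
first_vecs = {"cust_no": [1.0, 0.0], "cust_name": [0.0, 1.0]}
print(match_field([3.0, 4.0], first_vecs))  # ('review', 'cust_name', 0.8)
```

The pairs collected by `ReviewQueue` (second field and its closest first field) are exactly the inputs claim 10 uses for the fine-tuning task.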
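Claim 11 trains the three-flow graph neural network with a per-pair boundary value (margin) and three loss terms. A common form for such a loss is the margin-based contrastive loss, which the following sketch applies to all three streams. That choice, the Euclidean distance, the logarithmic rule deriving the margin from the number of directly associated fields, and the equal loss weights are assumptions. The claim does not say whether the label enters the structural loss, so the sketch simply reuses it there. Only the loss value is computed; the parameter update would come from a framework's optimizer.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pair_margin(direct_neighbours, base=1.0, scale=0.5):
    """Hypothetical rule: fields with more direct associations get a wider margin."""
    return base + scale * math.log1p(direct_neighbours)


def contrastive(distance, same_node, margin):
    """Pull matching pairs together; push non-matching pairs at least `margin` apart."""
    if same_node:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


def total_loss(pair, weights=(1.0, 1.0, 1.0)):
    margin = pair_margin(pair["direct_neighbours"])
    losses = []
    for stream in ("semantic", "statistical", "structural"):
        u, v = pair[stream]  # updated sample vectors of the two fields
        losses.append(contrastive(euclidean(u, v), pair["same_node"], margin))
    return sum(w * loss for w, loss in zip(weights, losses)), losses


# Hypothetical non-matching sample pair whose fields have no direct associations.
pair = {
    "semantic": ([0.0, 0.0], [0.5, 0.0]),
    "statistical": ([0.0, 0.0], [2.0, 0.0]),
    "structural": ([1.0, 1.0], [1.0, 1.0]),
    "same_node": False,
    "direct_neighbours": 0,
}
print(total_loss(pair))  # (1.25, [0.25, 0.0, 1.0])
```

A non-matching pair contributes nothing once its vectors are already farther apart than the margin, so training effort concentrates on confusable fields.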

Description

Field mapping method of heterogeneous system

Technical Field

The application relates to the field of data migration, and in particular to a field mapping method for heterogeneous systems.

Background

In the complex engineering work of switching from an old core system to a new one, data migration is a core link and its importance is self-evident. Data migration is not simple data transfer but requires all-round consideration, and mapping the business fields of the old and new systems is a critical part of it. However, because the old and new systems differ significantly in business models, database designs and interface specifications, field semantics and structures are often inconsistent, which makes field mapping relations hard to identify automatically during data migration.

The prior art mainly relies on manual combing or on matching methods based on rules and simple semantic similarity, which generate candidate mappings by comparing field names, comments and data types. However, such methods struggle with complex scenarios such as field-naming drift, changes of measurement unit, and call-link extension caused by service splitting. In particular, when the system keeps iterating and metadata changes dynamically, the rule base is costly to maintain and generalizes poorly, mismatches and missed matches occur easily, mapping precision and the level of automation are limited, and the high-reliability, high-efficiency requirements of data migration are hard to meet.

Disclosure of Invention

The application provides a field mapping method for heterogeneous systems, which adapts to a variety of scenarios and improves the accuracy of field mapping.
An embodiment of the application provides a field mapping method for a heterogeneous system, comprising: extracting, by using a feature extraction model, a feature vector for each field in a system according to the basic information of each field in the system, the field association relations corresponding to each service, and the sub-data information corresponding to each field within a set duration, wherein the feature vector comprises at least a semantic vector, a statistical vector and a structural vector; for each second field, determining a target first field matching the second field according to the second feature vector of the second field and the first feature vector of each first field; and mapping the first data information of the target first field in the old system to the second data information of the second field corresponding to the target first field in the new system.

Further, the feature extraction model comprises a semantic feature extraction sub-model, a structural feature extraction sub-model, a statistical feature extraction sub-model and a three-flow graph neural network. Before the feature vector of each field is extracted by using the feature extraction model, the method further comprises: determining, according to the field association relations corresponding to each service, the shortest path between each field and a set field. The extracting of the feature vector of each field by using the feature extraction model comprises: determining, by using the semantic feature extraction sub-model, a semantic vector for each field in the system according to the basic information of the field; determining, by using the structural feature extraction sub-model, a structural vector for each field according to the shortest path between the field and the set field; determining, by using the statistical feature extraction sub-model, a statistical vector for each field according to the sub-data information corresponding to the field within the set duration; updating the semantic vector, the structural vector and the statistical vector of each field by using the three-flow graph neural network; and performing weighted fusion of the updated semantic vector, statistical vector and structural vector of each field according to the weights corresponding to the field, to obtain the feature vector of the field.

Further, the determining, by using the semantic extraction submo