
CN-121979936-A - Multi-source heterogeneous data standardized processing method and terminal based on meta-model self-adaptive matching

CN121979936A

Abstract

The invention discloses a method and a terminal for standardized processing of multi-source heterogeneous data based on meta-model adaptive matching, relating to the technical field of data processing. The method acquires a multi-source heterogeneous data set containing original data units with different sources and structures, and records the data source attributes and acquisition scene attributes of the data set. Based on these attributes, it performs pre-adaptation adjustment on a preset meta-model library to generate a pre-adapted meta-model library. It then extracts multi-dimensional structural features of the original data units and performs multi-round adaptive matching between these features and the templates in the pre-adapted meta-model library to generate matching results and optimization suggestions. According to the matching results, it calls data conversion rules and, in combination with the optimization suggestions, performs dynamic standardized conversion on the original data units to generate standardized data units and conversion process records. Finally, it collects these into a standardized data set and performs association analysis to generate a processing result report and a meta-model library optimization scheme. The invention improves the accuracy and efficiency of standardized processing of multi-source heterogeneous data.

Inventors

  • MAO FENG
  • CHEN FEI
  • Shan Junquan
  • LU KAI

Assignees

  • 南京莱斯信息技术股份有限公司

Dates

Publication Date
2026-05-05
Application Date
2025-12-30

Claims (10)

  1. A method for standardized processing of multi-source heterogeneous data based on meta-model adaptive matching, characterized by comprising the following steps:
     acquiring a multi-source heterogeneous data set, wherein the multi-source heterogeneous data set comprises original data units that come from different data sources and have different data structures, and recording the data source attribute and the acquisition scene attribute of each original data unit;
     performing, based on the data source attributes and the acquisition scene attributes of the multi-source heterogeneous data set, meta-model pre-adaptation adjustment on a preset meta-model library to generate a pre-adapted meta-model library, wherein the preset meta-model library comprises initial meta-model templates designed for different data structure types, and each initial meta-model template comprises data structure description information, data type constraint information and data conversion rule information;
     extracting data structure features of each original data unit in the multi-source heterogeneous data set to generate multi-dimensional structural features of each original data unit, and performing multi-round adaptive matching between the multi-dimensional structural features and the meta-model templates in the pre-adapted meta-model library to generate a meta-model matching result and matching optimization suggestions for each original data unit;
     calling the data conversion rule information of the corresponding meta-model template according to the meta-model matching result, and performing dynamic standardized conversion on each original data unit in combination with the matching optimization suggestions to generate a standardized data unit and a conversion process record for each original data unit; and
     collecting the standardized data units and conversion process records of all original data units to construct a standardized data set, performing association analysis on the standardized data set and the conversion process records, and generating a multi-source heterogeneous data standardized processing result report and a meta-model library optimization scheme.
  2. The method for standardized processing of multi-source heterogeneous data based on meta-model adaptive matching according to claim 1, wherein performing meta-model pre-adaptation adjustment on the preset meta-model library to generate the pre-adapted meta-model library, based on the data source attributes and the acquisition scene attributes of the multi-source heterogeneous data set, comprises:
     extracting the data source attributes of all original data units in the multi-source heterogeneous data set, classifying and counting the number of original data units and the data structure difference types corresponding to the different data sources, and generating a data source structure distribution table;
     extracting the acquisition scene attributes of all original data units in the multi-source heterogeneous data set, analyzing the field quantity variation pattern and the field association mode characteristics of the original data units under different acquisition scenes, and generating a scene field feature table;
     extracting the field quantity range and the field association mode type from the data structure description information of each initial meta-model template;
     comparing the data structure difference types in the data source structure distribution table with the field association mode type of each initial meta-model template, and marking the initial meta-model templates whose field association modes do not match;
     comparing the field quantity variation pattern in the scene field feature table with the field quantity range of each initial meta-model template, and marking the initial meta-model templates whose field quantity ranges do not match;
     for each initial meta-model template whose field association mode does not match, adjusting the field association mode type in its data structure description information so that the adjusted field association mode type is adapted to the structure difference type of the corresponding data source;
     for each initial meta-model template whose field quantity range does not match, adjusting the field quantity range in its data structure description information so that the adjusted field quantity range is adapted to the field quantity variation pattern of the corresponding acquisition scene;
     after the adjustment is completed, performing suitability verification on each adjusted meta-model template by selecting original data units under the corresponding data source and acquisition scene and testing the degree to which the template covers their data structures;
     fine-tuning, according to the suitability verification result, the field constraint conditions and the conversion rule priorities of the meta-model template until the template's coverage of the data structures under the corresponding data source and acquisition scene falls within a preset deviation range; and
     integrating all meta-model templates that have been adjusted and have passed suitability verification to construct the pre-adapted meta-model library, and recording the adjustment content, the adapted data source and the acquisition scene information of each meta-model template.
  3. The method for standardized processing of multi-source heterogeneous data based on meta-model adaptive matching according to claim 2, wherein fine-tuning the field constraint conditions and the conversion rule priorities of the meta-model template according to the suitability verification result, until the template's coverage of the data structures under the corresponding data source and acquisition scene falls within the preset deviation range, comprises:
     extracting, from the suitability verification process, how each meta-model template covers the original data units under the corresponding data source and acquisition scene, and counting the number of original data units not covered by the template together with the reasons for non-coverage;
     if the reason for non-coverage is that the field constraint conditions of the template are too strict, analyzing the field attributes of the uncovered original data units, adjusting the permitted data type range or the format requirements in the template's field constraint conditions, and widening the range the constraint conditions can accommodate;
     if the reason for non-coverage is that the conversion rule priorities of the template are set improperly, so that some original data units cannot be converted according to the rules, re-evaluating the applicable scenarios of the conversion rules and adjusting the priority order of the different conversion rules so that rules with broader applicability are executed first;
     after the adjustment is completed, selecting the previously uncovered original data units, performing the suitability test again, counting how many of them the adjusted template now covers, and calculating the coverage degree;
     if the coverage degree does not fall within the preset deviation range, repeating the field constraint condition adjustment or the conversion rule priority adjustment until the template's coverage of the original data units under the corresponding data source and acquisition scene falls within the preset deviation range; and
     recording the specific content of each adjustment, the change in coverage degree after the adjustment, and the information on the original data units used for testing, to form a template fine-tuning record.
  4. The method for standardized processing of multi-source heterogeneous data based on meta-model adaptive matching according to claim 1, wherein extracting the data structure features of each original data unit in the multi-source heterogeneous data set to generate the multi-dimensional structural features of each original data unit, and performing multi-round adaptive matching between the multi-dimensional structural features and the meta-model templates in the pre-adapted meta-model library to generate the meta-model matching result and matching optimization suggestions for each original data unit, comprises:
     selecting a single original data unit in the multi-source heterogeneous data set, performing field hierarchy analysis on the original data unit, determining the level to which each field belongs and its hierarchical depth, and generating a field hierarchical structure table;
     analyzing the reference relations and dependency relations among the different fields in the original data unit, counting the number of field associations and the association path lengths, and generating a field association relation table;
     extracting the data type and data format of each field in the original data unit, counting the proportion of each data type among all fields, and generating a data type distribution table;
     constructing the multi-dimensional structural features of the original data unit from the field hierarchical structure table, the field association relation table and the data type distribution table, wherein the multi-dimensional structural features comprise hierarchical features, association features and type features;
     extracting all meta-model templates from the pre-adapted meta-model library, and extracting the hierarchical standard, the association standard and the type standard from the data structure description information of each meta-model template;
     in the first round of matching, comparing the hierarchical features of the multi-dimensional structural features with the hierarchical standard of each meta-model template, generating a hierarchical adaptation degree from the degree of correspondence in the number of field levels and the degree of fit in hierarchical depth, and screening out a candidate template set whose hierarchical adaptation degree meets a first preset threshold;
     in the second round of matching, comparing the association features of the multi-dimensional structural features with the association standard of each template in the candidate template set, generating an association adaptation degree from the degree of match in field association quantity and the degree of fit in association path length, and screening out a secondary candidate template set whose association adaptation degree meets a second preset threshold;
     in the third round of matching, comparing the type features of the multi-dimensional structural features with the type standard of each template in the secondary candidate template set, generating a type adaptation degree from the degree of match in data type proportions and the degree of match in data formats, and selecting the template with the highest type adaptation degree as a preliminary matching template;
     analyzing the adaptation deviation between the preliminary matching template and the multi-dimensional structural features; if the deviation is within a preset deviation range, determining the preliminary matching template as the final matching template and generating a meta-model matching result comprising the final matching template identifier and the adaptation degree of each round; and
     if the deviation exceeds the preset deviation range, generating a matching optimization suggestion according to the content of the deviation, wherein the matching optimization suggestion comprises the template standard item to be adjusted and the adjustment direction, and taking the preliminary matching template together with the matching optimization suggestion as the meta-model matching result.
  5. The method for standardized processing of multi-source heterogeneous data based on meta-model adaptive matching according to claim 4, wherein constructing the multi-dimensional structural features of the original data unit from the field hierarchical structure table, the field association relation table and the data type distribution table comprises:
     extracting the hierarchical depth of each field from the field hierarchical structure table, computing the mean and the maximum of the hierarchical depths of all fields, and taking the mean hierarchical depth, the maximum hierarchical depth and the distribution of field counts across hierarchical depths as the constituent elements of the hierarchical features;
     extracting the field association quantities and association path lengths from the field association relation table, computing the average association quantity per field and the median of all association path lengths, and taking the average field association quantity, the median association path length and the proportion of fields whose association quantity exceeds a preset quantity as the constituent elements of the association features;
     extracting the field proportions of the different data types from the data type distribution table, selecting the data types whose field proportion exceeds a preset proportion threshold to form a main data type list, and taking the main data type list, the proportion of each data type in the main data type list and the number of data type categories as the constituent elements of the type features;
     converting each constituent element of the hierarchical features, the association features and the type features into a unified format so that every element is described in the same data representation; and
     integrating the processed hierarchical features, association features and type features to form the multi-dimensional structural features of the original data unit, and assigning a unique identifier to the feature of each dimension.
  6. The method for standardized processing of multi-source heterogeneous data based on meta-model adaptive matching according to claim 1, wherein calling the data conversion rule information of the corresponding meta-model template according to the meta-model matching result, and performing dynamic standardized conversion on each original data unit in combination with the matching optimization suggestions to generate the standardized data unit and conversion process record for each original data unit, comprises:
     extracting the final matching template from the meta-model matching result, and calling the data conversion rule information contained in the final matching template, wherein the data conversion rule information contains a conversion operation sequence, a field mapping relation and a content adjustment mode;
     if the meta-model matching result contains a matching optimization suggestion, adjusting the order of the conversion operation sequence or the priority of the field mapping relation in the data conversion rule information according to the matching optimization suggestion, and generating optimized data conversion rule information;
     performing field mapping on the original data unit according to the conversion operation sequence in the optimized data conversion rule information, replacing the original field names with standard field names according to the field mapping relation, and recording the field mapping correspondence;
     after the field mapping is completed, converting the data type of each field from the original data type into the standard data type according to the type conversion requirements in the data conversion rule information, correcting the data format according to the content adjustment mode if a data format mismatch occurs during conversion, and recording the details of the data type conversion and format correction;
     after the data type conversion is completed, adjusting the field association relations of the original data unit by adjusting the reference relations and dependency relations among fields according to the association relation standard in the data conversion rule information so that the field associations conform to that standard, and recording the association relation adjustment steps;
     after all conversion operations are completed, reorganizing all processed fields according to the data structure description information of the final matching template to form the standardized data unit, and recording the field composition and data content of the standardized data unit; and
     integrating the field mapping correspondence, the data type conversion and format correction details, the association relation adjustment steps and the composition information of the standardized data unit to generate the conversion process record of the original data unit.
  7. The method for standardized processing of multi-source heterogeneous data based on meta-model adaptive matching according to claim 6, wherein, if the meta-model matching result contains a matching optimization suggestion, adjusting the order of the conversion operation sequence or the priority of the field mapping relation in the data conversion rule information according to the matching optimization suggestion and generating the optimized data conversion rule information comprises:
     parsing the matching optimization suggestion, determining the template standard item to be adjusted, and judging whether the conversion rule component corresponding to that template standard item is the conversion operation sequence or the field mapping relation;
     if the conversion operation sequence needs to be adjusted, analyzing the adjustment direction in the matching optimization suggestion, and, if the adjustment direction is to raise the execution priority of a target class of conversion operation, moving that conversion operation toward the front of the conversion operation sequence;
     if the field mapping relation priority needs to be adjusted, analyzing the adjustment basis in the matching optimization suggestion, and, if the adjustment basis is the degree of semantic association between fields, calling a preset semantic association rule base to score the fields and sorting the field mapping relations in descending order of score to determine the execution order;
     after the adjustment is completed, simulating the optimized conversion operation sequence and field mapping relation by selecting some fields of the original data unit for trial conversion, and observing whether any new adaptation deviation appears during the trial conversion;
     if a new adaptation deviation appears, fine-tuning the conversion operation sequence or the field mapping relation according to the deviation and performing the trial conversion again until no new adaptation deviation appears; and
     recording the differences in the data conversion rule information before and after adjustment, the deviations observed during trial conversion and the final optimization result, to form a conversion rule optimization record that is a component of the conversion process record.
  8. The method for standardized processing of multi-source heterogeneous data based on meta-model adaptive matching according to claim 1, wherein collecting the standardized data units and conversion process records of all original data units to construct the standardized data set, performing association analysis on the standardized data set and the conversion process records, and generating the multi-source heterogeneous data standardized processing result report and the meta-model library optimization scheme comprises:
     traversing the standardized data units and conversion process records of all original data units, collecting the complete data content of each standardized data unit and its conversion process record, and establishing a one-to-one correspondence between standardized data units and conversion process records;
     grouping all standardized data units by the data source attribute of the corresponding original data unit to form standardized data groups divided by data source, and counting the number of standardized data units and the conversion completion rate for each data source;
     performing field consistency analysis on the standardized data units in each standardized data group by comparing the data content formats of the same standard field across different standardized data units under the same data source, and computing the proportion of fields with a uniform format;
     analyzing all conversion process records, extracting the records in which field mapping abnormalities, data type conversion failures or association relation adjustment difficulties occurred during conversion, and counting the number of occurrences of each kind of abnormality together with the corresponding data source attributes and meta-model template types;
     generating the multi-source heterogeneous data standardized processing result report from the counted number of standardized data units, the conversion completion rate, the proportion of uniformly formatted fields and the occurrence of each kind of abnormality, wherein the processing result report comprises the standardized processing results for each data source, the main abnormality types in the conversion process and a standardized data quality evaluation;
     locating, according to the abnormality records in the conversion process and the standardized data quality evaluation result, the meta-model template problem that caused each abnormality, generating a template conversion rule optimization suggestion if the abnormality stems from an imperfect conversion rule of the template, and generating a template structure standard adjustment suggestion if the abnormality stems from a mismatched structure standard of the template; and
     integrating the template conversion rule optimization suggestions and the template structure standard adjustment suggestions to form the meta-model library optimization scheme, wherein the meta-model library optimization scheme comprises the identifiers of the meta-model templates to be optimized, the specific optimization content and the optimization implementation steps.
  9. The method for standardized processing of multi-source heterogeneous data based on meta-model adaptive matching according to claim 8, wherein locating the meta-model template problem that caused each abnormality according to the abnormality records in the conversion process and the standardized data quality evaluation result, generating a template conversion rule optimization suggestion if the abnormality stems from an imperfect conversion rule of the template, and generating a template structure standard adjustment suggestion if the abnormality stems from a mismatched structure standard of the template, comprises:
     sorting the abnormality records of the conversion process into three types, namely field mapping abnormalities, data type conversion abnormalities and association relation adjustment abnormalities, and counting the meta-model template identifiers involved in each type and the corresponding number of abnormalities;
     for the meta-model templates involved in each type of abnormality, retrieving their data conversion rule information and data structure description information, and comparing the abnormal phenomena in the abnormality records with the rules and standards of the meta-model templates;
     if a field mapping abnormality manifests as an original field for which no corresponding standard field can be found, and the comparison finds that the template's field mapping relation contains no mapping item for that original field, judging that the abnormality stems from an imperfect conversion rule of the template, and generating a conversion rule optimization suggestion to add a mapping item for the original field;
     if a data type conversion abnormality manifests as an original data type exceeding the template's type constraint range, and the comparison finds that the template's data type constraint range does not cover the original data type, judging that the abnormality stems from a mismatched structure standard of the template, and generating a structure standard adjustment suggestion to widen the template's data type constraint range;
     if an association relation adjustment abnormality manifests as an original field association mode that cannot be matched with the template's association standard, and the comparison finds that the template's association standard contains no adaptation rule for the target class of association mode, judging that the abnormality stems from an imperfect conversion rule of the template, and generating a conversion rule optimization suggestion to add an adaptation rule for the target class of association mode; and
     labeling each generated optimization suggestion or adjustment suggestion with the corresponding abnormality record numbers, the meta-model template identifiers involved and the specific steps for implementing the suggestion.
  10. A terminal for standardized processing of multi-source heterogeneous data based on meta-model adaptive matching, characterized by comprising:
     one or more processors; and
     a machine-readable storage medium storing one or more programs;
     wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for standardized processing of multi-source heterogeneous data based on meta-model adaptive matching according to any one of claims 1-9.
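
The three-round matching of claim 4, fed by features of the kind listed in claim 5, can be illustrated with a minimal Python sketch. It is not the implementation disclosed in the patent: the names Features, build_features, closeness and match, the similarity formulas, the thresholds (0.7, 0.7, 0.2) and the sample templates are all hypothetical and chosen only for illustration. The claims do not say what happens when no template survives a round, so the sketch simply returns None in that case. Only a subset of the claimed feature elements is modeled.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class Features:
    """Structural features of one data unit, or the standard of one template."""
    depth_mean: float   # hierarchical feature: mean field depth
    depth_max: int      # hierarchical feature: maximum field depth
    assoc_mean: float   # association feature: mean associations per field
    path_median: float  # association feature: median association path length
    type_share: dict    # type feature: data type name -> share of fields


def build_features(depths, assoc_counts, path_lengths, types):
    """Condense per-field observations into the three feature groups."""
    share = {t: types.count(t) / len(types) for t in set(types)}
    return Features(mean(depths), max(depths), mean(assoc_counts),
                    median(path_lengths), share)


def closeness(a, b):
    """Assumed similarity in [0, 1] for non-negative values; 1.0 when equal."""
    return 1.0 - abs(a - b) / max(a, b, 1e-9)


def level_fit(f, t):
    return (closeness(f.depth_mean, t.depth_mean)
            + closeness(f.depth_max, t.depth_max)) / 2


def assoc_fit(f, t):
    return (closeness(f.assoc_mean, t.assoc_mean)
            + closeness(f.path_median, t.path_median)) / 2


def type_fit(f, t):
    """Overlap of the two type-share distributions (assumed measure)."""
    keys = set(f.type_share) | set(t.type_share)
    return sum(min(f.type_share.get(k, 0.0), t.type_share.get(k, 0.0))
               for k in keys)


def match(unit, templates, first=0.7, second=0.7, max_dev=0.2):
    """Three-round cascade: hierarchy filter, association filter, best type fit."""
    round1 = {n: t for n, t in templates.items() if level_fit(unit, t) >= first}
    round2 = {n: t for n, t in round1.items() if assoc_fit(unit, t) >= second}
    if not round2:
        return None  # assumption: the claims do not cover this case
    best = max(round2, key=lambda n: type_fit(unit, round2[n]))
    scores = {"level": level_fit(unit, round2[best]),
              "assoc": assoc_fit(unit, round2[best]),
              "type": type_fit(unit, round2[best])}
    suggestion = None
    if 1.0 - min(scores.values()) > max_dev:
        # Deviation too large: name the weakest standard item to adjust.
        suggestion = {"standard_item": min(scores, key=scores.get),
                      "direction": "move toward observed features"}
    return {"template": best, "scores": scores, "suggestion": suggestion}


# Hypothetical template standards and one observed data unit.
templates = {
    "nested_doc": Features(2.0, 3, 1.0, 2.0, {"str": 0.5, "int": 0.5}),
    "flat_table": Features(1.0, 1, 0.0, 0.0, {"str": 0.3, "int": 0.7}),
}
unit = build_features(depths=[1, 2, 2, 3], assoc_counts=[1, 0, 2, 1],
                      path_lengths=[1, 2, 3],
                      types=["str", "str", "int", "float"])
result = match(unit, templates)
print(result)
```

Filtering in rounds keeps the later, more detailed comparisons restricted to templates that already agree on the coarser structure, which is the narrowing effect the successive thresholds in claim 4 describe.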

Description

Multi-source heterogeneous data standardized processing method and terminal based on meta-model self-adaptive matching

Technical Field

The invention belongs to the technical field of multi-source heterogeneous data standardization processing, and particularly relates to a multi-source heterogeneous data standardized processing method and terminal based on meta-model adaptive matching.

Background

In today's digital age, data has become an important asset for business and social development. With the wide application of information technology, large amounts of multi-source heterogeneous data are generated across different systems and business scenarios. The data sources are broad, covering the information systems of the business departments within an enterprise, the data interfaces of external partners, public data on the Internet and so on. The data structures also differ, and include structured data (such as table data in a relational database), semi-structured data (such as data in XML (Extensible Markup Language) or JSON (JavaScript Object Notation) format) and unstructured data (such as text, images and audio).

Existing multi-source heterogeneous data processing methods mainly have the following problems. On the one hand, traditional data processing methods often apply a fixed meta-model template to all data and lack targeted adaptation to different data sources and acquisition scenes. As a result, the meta-model does not match the actual data structure and cannot accurately describe the characteristics and constraints of the data, which reduces the accuracy and effectiveness of data conversion.

On the other hand, the data conversion process lacks an adaptive matching mechanism. For complex and changeable multi-source heterogeneous data, it is difficult to adjust the matching strategy dynamically according to the actual structural characteristics of the data, so matching errors or incomplete matches occur easily and the quality of the generated standardized data is uneven. In addition, existing methods perform insufficient association analysis between the conversion process records and the standardized data. They cannot find problems and potential risks in the data processing in time, and they make it difficult to optimize the meta-model library according to actual conditions, which hinders continuous improvement of data processing.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention aims to provide a multi-source heterogeneous data standardized processing method and terminal based on meta-model adaptive matching that effectively avoid data processing errors caused by a mismatch between the meta-model and the data.
To achieve the above purpose, the invention adopts the following technical scheme. The invention discloses a multi-source heterogeneous data standardized processing method based on meta-model adaptive matching, which comprises the following steps:
acquiring a multi-source heterogeneous data set, wherein the multi-source heterogeneous data set comprises original data units that come from different data sources and have different data structures, and recording the data source attribute and the acquisition scene attribute of each original data unit;
performing, based on the data source attributes and the acquisition scene attributes of the multi-source heterogeneous data set, meta-model pre-adaptation adjustment on a preset meta-model library to generate a pre-adapted meta-model library, wherein the preset meta-model library comprises initial meta-model templates designed for different data structure types, and each initial meta-model template comprises data structure description information, data type constraint information and data conversion rule information;
extracting data structure features of each original data unit in the multi-source heterogeneous data set to generate multi-dimensional structural features of each original data unit, and performing multi-round adaptive matching between the multi-dimensional structural features and the meta-model templates in the pre-adapted meta-model library to generate a meta-model matching result and matching optimization suggestions for each original data unit;
calling the data conversion rule information of the corresponding meta-model template according to the meta-model matching result, and performing dynamic standardized conversion on each original data unit in combination with the matching optimization suggestions to generate a standardized data unit and a conversion process record for each original data unit;
and collecting the standardized data units and conversion process records of all original data units, constructing a standardized data set, performing association analysis on the stan