CN-121996721-A - Multi-source data standardized processing method and system based on ERP
Abstract
The invention relates to the technical field of enterprise informatization management and multi-source data integration, and discloses an ERP-based multi-source data standardization processing method and system. The method comprises: acquiring an original data set of a purchasing system and an original data set of an inventory system and preprocessing them to generate a normalized data set; determining a field attribute mapping relation between purchase fields and inventory fields; constructing a unified dimension conversion model to calculate conversion factors and obtain a standardized data set; checking the standardized data set to generate a qualified data set; constructing a service time sequence list from the qualified data set to obtain an anomaly detection result; and generating final integrated data according to the anomaly detection result. The method enables semantic alignment and dimension unification of multi-source data, thereby improving data consistency and decision accuracy.
Inventors
- YAO KUN
- LIN SHENGBIN
- CAI BIN
Assignees
- 靖江同丰商务科技有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20260128
Claims (8)
- 1. An ERP-based multi-source data standardization processing method, characterized by comprising the following steps: acquiring an original data set of a purchasing system and an original data set of an inventory system, and preprocessing them to generate a normalized data set; performing field name matching on the normalized data set in combination with a preset field rule base to obtain rule matching candidate pairs, refining the rule matching candidate pairs with a pre-trained semantic feature extraction model, and determining a field attribute mapping relation between purchase fields and inventory fields; identifying, based on the field attribute mapping relation, difference material attribute fields having dimension differences in the normalized data set, calculating a conversion factor for each such field using a pre-constructed unified dimension conversion model, and performing numerical normalization with the conversion factors to obtain a standardized data set; traversing the numerical fields in the standardized data set, determining out-of-range abnormal records that fall outside a preset business rationality threshold interval, and performing a logic check on the out-of-range abnormal records to generate a qualified data set; extracting time fields from the qualified data set and sorting them to obtain a service time sequence list, traversing the service time sequence list to determine time sequence breakpoints at which the difference between adjacent times exceeds a preset service interruption threshold, evaluating the numerical fields against a preset statistical distribution criterion to determine numerical outliers, and summarizing the time sequence breakpoints and numerical outliers to generate an anomaly detection result; and calculating a data applicability score of the qualified data set according to the anomaly detection result, correcting and completing the data records corresponding to the anomaly detection result to obtain corrected data records, and packaging the data applicability score with the corrected data records to generate final integrated data.
- 2. The ERP-based multi-source data standardization processing method of claim 1, wherein acquiring the purchasing system original data set and the inventory system original data set and preprocessing them to generate the normalized data set comprises: reading the purchasing system original data set and the inventory system original data set through a preset data interface, and parsing and extracting the time fields, numerical fields and text description fields in each; calling a preset standard time format template to reconstruct the format of the time fields, so that date and time data from different sources are uniformly converted into standard timestamp data; parsing the numerical fields with regular expressions, separating out any non-numeric text suffix contained in a numerical field, and storing it as independent unit metadata; filtering non-numeric characters out of the numerical fields from which the unit metadata has been separated, and forcibly converting the results to the double-precision floating-point type; and performing a full null-value scan on the purchasing system original data set and the inventory system original data set, filling any null value found in a numerical field with a preset default zero value or the historical mean, and generating the normalized data set in combination with the standard timestamp data and the unit metadata.
- 3. The ERP-based multi-source data standardization processing method of claim 1, wherein performing field name matching on the normalized data set in combination with the preset field rule base to obtain rule matching candidate pairs, and refining the rule matching candidate pairs with the pre-trained semantic feature extraction model to determine the field attribute mapping relation between purchase fields and inventory fields comprises: parsing the metadata of the normalized data set, extracting field name texts and field description annotation texts, and constructing a set of field features to be matched; traversing the set of field features to be matched and performing keyword retrieval and synonym comparison against the preset field rule base to generate rule matching candidate pairs; inputting the field name texts and field description annotation texts in the set into the pre-trained semantic feature extraction model, which converts them into high-dimensional semantic feature vectors; calculating cosine similarity values between the semantic feature vectors of the purchasing system and those of the inventory system to construct a full semantic similarity matrix; and selecting from the full semantic similarity matrix the field pairs whose cosine similarity exceeds a preset semantic decision threshold, merging them with the rule matching candidate pairs as a set union, removing conflicts and duplicates, and determining the final field attribute mapping relation.
- 4. The ERP-based multi-source data standardization processing method of claim 2, wherein calculating the conversion factor for the difference material attribute fields using the pre-constructed unified dimension conversion model comprises: constructing the unified dimension conversion model, which integrates a preset standard physical unit conversion table and is configured with an association interface to preset material specification master data; parsing the unit metadata corresponding to a difference material attribute field and extracting a source unit and a target unit; and inputting the source unit and the target unit into the unified dimension conversion model, wherein, if the two units differ as standard physical units, the preset standard physical unit conversion table is called directly to obtain a numerical multiplier, which is determined as the conversion factor, and if the two units differ as business packaging units, the preset material specification master data is queried through the association interface to calculate a proportionality coefficient between the source unit and the target unit, which is determined as the conversion factor.
- 5. The ERP-based multi-source data standardization processing method of claim 2, wherein traversing the numerical fields in the standardized data set, determining out-of-range abnormal records outside the preset business rationality threshold interval, and performing a logic check on the out-of-range abnormal records to generate the qualified data set comprises: traversing the numerical fields in the standardized data set, comparing them with the preset business rationality threshold interval, and marking the records that fall outside the interval as out-of-range abnormal records; extracting the associated field combination in each out-of-range abnormal record, performing a dependency check based on a preset algebraic equation, and marking the record as a logic error record if the check fails; extracting the business primary key field, original voucher number field and transaction type field from the logic error records, combining them into a unique feature key, and identifying groups of fully duplicated records sharing the same unique feature key; and retaining, within each fully duplicated record group, the record whose standard timestamp data indicates the latest time, removing the remaining redundant copies, and generating the qualified data set in combination with the data remaining after the out-of-range abnormal records and logic error records are removed.
- 6. The ERP-based multi-source data standardization processing method of claim 1, wherein extracting the time fields from the qualified data set and sorting them to obtain the service time sequence list, traversing the service time sequence list to determine time sequence breakpoints at which the adjacent time difference exceeds the preset service interruption threshold, evaluating the numerical fields against the preset statistical distribution criterion to determine numerical outliers, and summarizing the time sequence breakpoints and numerical outliers to generate the anomaly detection result comprises: extracting the purchase warehousing time field and the inventory change time field from the qualified data set, merging them and sorting them in chronological order to construct the service time sequence list; traversing the service time sequence list, calculating the time difference between each pair of adjacent time nodes, and marking the corresponding time period as a time sequence breakpoint if the difference exceeds the preset service interruption threshold; calculating the statistical mean and standard deviation of specified numerical fields in the qualified data set, constructing a distribution boundary based on the preset statistical distribution criterion, and marking each value of the numerical field that lies outside the distribution boundary as a numerical outlier; and summarizing the record indexes and feature information of the time sequence breakpoints and numerical outliers to generate the anomaly detection result.
- 7. The ERP-based multi-source data standardization processing method of claim 1, wherein calculating the data applicability score of the qualified data set according to the anomaly detection result, correcting and completing the data records corresponding to the anomaly detection result to obtain corrected data records, and packaging the data applicability score with the corrected data records to generate the final integrated data comprises: counting the proportion and deviation amplitude of the time sequence breakpoints and numerical outliers in the anomaly detection result, and calculating the data applicability score of the qualified data set in combination with preset quality weight factors; for records marked as numerical outliers, performing numerical smoothing correction with a preset moving average algorithm or linear interpolation algorithm to generate corrected data records; for time periods marked as time sequence breakpoints, performing zero-value completion or null-value placeholder filling to repair the loss of time sequence continuity; and packaging the data applicability score, as a quality metadata tag, with the processed data records to generate final integrated data for subsequent inventory early warning and purchasing decision support.
- 8. An ERP-based multi-source data standardization processing system, comprising: a preprocessing module, configured to acquire an original data set of a purchasing system and an original data set of an inventory system and preprocess them to generate a normalized data set; a mapping module, configured to perform field name matching on the normalized data set in combination with a preset field rule base to obtain rule matching candidate pairs, refine the rule matching candidate pairs with a pre-trained semantic feature extraction model, and determine a field attribute mapping relation between purchase fields and inventory fields; a conversion module, configured to identify, based on the field attribute mapping relation, difference material attribute fields having dimension differences in the normalized data set, calculate conversion factors for them using a pre-constructed unified dimension conversion model, and perform numerical normalization with the conversion factors to obtain a standardized data set; a verification module, configured to traverse the numerical fields in the standardized data set, determine out-of-range abnormal records outside a preset business rationality threshold interval, and perform a logic check on them to generate a qualified data set; a detection module, configured to extract time fields from the qualified data set and sort them to obtain a service time sequence list, traverse the list to determine time sequence breakpoints at which the adjacent time difference exceeds a preset service interruption threshold, evaluate the numerical fields against a preset statistical distribution criterion to determine numerical outliers, and summarize the time sequence breakpoints and numerical outliers to generate an anomaly detection result; and a correction module, configured to calculate a data applicability score of the qualified data set according to the anomaly detection result, correct and complete the data records corresponding to the anomaly detection result to obtain corrected data records, and package the data applicability score with the corrected data records to generate final integrated data.
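The preprocessing of claim 2 can be illustrated with a minimal Python sketch. The accepted time formats, the regular expression that splits a number from its unit suffix, the use of UTC timestamps, and the names `to_timestamp`, `split_value_unit` and `normalize` are hypothetical choices for illustration; the claim does not specify them. Only time and numerical fields are handled; text description fields are left out.

```python
import re
from datetime import datetime, timezone

# Hypothetical "standard time format template": source formats tried in order.
TIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M", "%Y%m%d", "%d.%m.%Y")

# A leading number (optional sign, thousands separators, decimals) followed by
# an optional non-numeric suffix, which is treated as the unit.
VALUE_UNIT = re.compile(r"^\s*([-+]?\d[\d,]*(?:\.\d+)?)\s*(.*?)\s*$")


def to_timestamp(raw: str) -> float:
    """Reformat a date/time string into a UTC POSIX timestamp."""
    for fmt in TIME_FORMATS:
        try:
            parsed = datetime.strptime(raw.strip(), fmt)
            return parsed.replace(tzinfo=timezone.utc).timestamp()
        except ValueError:
            continue
    raise ValueError(f"unrecognized time format: {raw!r}")


def split_value_unit(raw: str) -> tuple[float, str]:
    """Separate the non-numeric suffix (unit metadata) from the number."""
    match = VALUE_UNIT.match(raw)
    if match is None:
        raise ValueError(f"no numeric value in {raw!r}")
    digits = re.sub(r"[^0-9.+-]", "", match.group(1))  # drop separators such as ","
    return float(digits), match.group(2)  # float is a double-precision value


def normalize(records, time_field, numeric_fields, history_means=None):
    """Return normalized copies of raw records (dicts of strings)."""
    history_means = history_means or {}
    rows = []
    for rec in records:
        row = {"timestamp": to_timestamp(rec[time_field]), "units": {}}
        for field in numeric_fields:
            raw = rec.get(field)
            if raw is None or str(raw).strip() == "":
                # Null value: historical mean if configured, else the default zero.
                row[field] = float(history_means.get(field, 0.0))
                continue
            value, unit = split_value_unit(str(raw))
            row[field] = value
            if unit:
                row["units"][field] = unit  # stored as independent unit metadata
        rows.append(row)
    return rows
```

For example, `split_value_unit("12,000 pcs")` yields `(12000.0, "pcs")`, and the unit string is kept beside the number so that a later conversion step can read it.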
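The field mapping of claim 3 combines rule-based candidates with a cosine similarity matrix. The sketch below assumes a hypothetical `RULE_BASE`, and it substitutes plain character-trigram count vectors for the pre-trained semantic feature extraction model, which the patent does not identify; a real deployment would plug in an actual embedding model at `embed`. Giving rule pairs priority over purely semantic pairs when resolving conflicts, and the 0.6 threshold, are likewise assumptions.

```python
import math
from collections import Counter

# Hypothetical preset field rule base: concept keyword -> known synonyms.
RULE_BASE = {
    "quantity": {"qty", "quantity", "stock_qty", "receipt_qty"},
    "material": {"sku", "item_no", "material_code"},
}


def concept_of(field):
    """Keyword retrieval and synonym comparison against the rule base."""
    name = field.lower()
    tokens = set(name.split("_"))
    for concept, synonyms in RULE_BASE.items():
        if name in synonyms or concept in tokens or tokens & synonyms:
            return concept
    return None


def embed(text):
    """Stand-in for the semantic model: character-trigram counts."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_fields(purchase, inventory, threshold=0.6):
    """purchase / inventory: {field name: description annotation}."""
    rule_pairs = {(p, q) for p in purchase for q in inventory
                  if concept_of(p) is not None and concept_of(p) == concept_of(q)}
    vec_p = {p: embed(f"{p} {d}") for p, d in purchase.items()}
    vec_q = {q: embed(f"{q} {d}") for q, d in inventory.items()}
    # Full semantic similarity matrix between the two systems' fields.
    sim = {(p, q): cosine(vec_p[p], vec_q[q]) for p in purchase for q in inventory}
    semantic_pairs = {pair for pair, s in sim.items() if s > threshold}
    # Set union, rule pairs first (assumed priority), then by similarity;
    # skipping fields that are already mapped removes conflicts and duplicates.
    ranked = (sorted(rule_pairs, key=sim.get, reverse=True)
              + sorted(semantic_pairs - rule_pairs, key=sim.get, reverse=True))
    mapping, used_q = {}, set()
    for p, q in ranked:
        if p not in mapping and q not in used_q:
            mapping[p] = q
            used_q.add(q)
    return mapping
```

The one-to-one constraint in the loop is one plausible reading of "removing conflict"; a many-to-one mapping policy would need a different resolution step.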
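Claim 4 distinguishes two kinds of unit difference. A minimal sketch, assuming a hypothetical `PHYSICAL_UNITS` table (unit to dimension and base-unit factor) and a hypothetical `MATERIAL_MASTER` dictionary standing in for the material specification master data interface, could compute the conversion factor as follows.

```python
# Hypothetical standard physical unit conversion table:
# unit -> (physical dimension, factor to that dimension's base unit).
PHYSICAL_UNITS = {
    "g": ("mass", 0.001), "kg": ("mass", 1.0), "t": ("mass", 1000.0),
    "mm": ("length", 0.001), "m": ("length", 1.0),
}

# Hypothetical material specification master data:
# (material code, packaging unit) -> number of base pieces per unit.
MATERIAL_MASTER = {
    ("M-001", "pcs"): 1, ("M-001", "box"): 24, ("M-001", "carton"): 240,
}


def conversion_factor(source, target, material=None):
    """Multiplier that converts a value in `source` units into `target` units."""
    if source == target:
        return 1.0
    if source in PHYSICAL_UNITS and target in PHYSICAL_UNITS:
        dim_s, base_s = PHYSICAL_UNITS[source]
        dim_t, base_t = PHYSICAL_UNITS[target]
        if dim_s != dim_t:
            raise ValueError(f"cannot convert {source} ({dim_s}) to {target} ({dim_t})")
        return base_s / base_t  # standard physical unit difference
    # Business packaging unit difference: proportionality from master data.
    try:
        return MATERIAL_MASTER[(material, source)] / MATERIAL_MASTER[(material, target)]
    except KeyError:
        raise ValueError(f"no master data for {material!r}: {source} -> {target}") from None


def standardize(value, source, target, material=None):
    """Numerical normalization of one value with its conversion factor."""
    return value * conversion_factor(source, target, material)
```

Packaging ratios depend on the material, which is why that branch needs the material code while the physical branch does not.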
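The verification of claim 5 can be sketched in three passes: range flagging, an algebraic dependency check, and deduplication by a composite key that keeps the latest timestamp. The threshold intervals, the equation `quantity * unit_price == amount`, and the field names `business_key`, `voucher_no`, `txn_type` and `timestamp` are hypothetical. The claim leaves open exactly which records are deduplicated; this sketch assumes deduplication runs over the records that remain after the flagged ones are removed, and it returns the logic error records separately for reporting.

```python
import math

# Hypothetical preset business rationality threshold intervals.
THRESHOLDS = {"quantity": (0.0, 10_000.0), "unit_price": (0.0, 5_000.0)}


def out_of_range(rec):
    """True if any thresholded numerical field lies outside its interval."""
    return any(not lo <= rec[f] <= hi for f, (lo, hi) in THRESHOLDS.items() if f in rec)


def passes_equation(rec, rel_tol=1e-6):
    # Hypothetical preset algebraic equation: quantity * unit_price == amount.
    return math.isclose(rec["quantity"] * rec["unit_price"], rec["amount"], rel_tol=rel_tol)


def qualify(records):
    """Return (qualified, out_of_range_records, logic_error_records)."""
    flagged = [r for r in records if out_of_range(r)]
    logic_errors = [r for r in flagged if not passes_equation(r)]
    flagged_ids = {id(r) for r in flagged}
    latest = {}
    for r in records:
        if id(r) in flagged_ids:
            continue
        key = (r["business_key"], r["voucher_no"], r["txn_type"])  # unique feature key
        if key not in latest or r["timestamp"] > latest[key]["timestamp"]:
            latest[key] = r  # keep the copy with the latest standard timestamp
    return list(latest.values()), flagged, logic_errors
```

Using `math.isclose` rather than `==` avoids rejecting records whose amounts differ from the product only by floating-point rounding.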
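For the anomaly detection of claim 6, the sketch below assumes that the "preset statistical distribution criterion" is a k-sigma rule (mean plus or minus k population standard deviations, with k = 3 by default). The patent does not name the criterion, so both the rule and the function names are illustrative assumptions.

```python
from statistics import mean, pstdev


def time_breakpoints(purchase_times, inventory_times, max_gap):
    """Merge and sort both time fields; report adjacent gaps above max_gap."""
    timeline = sorted(purchase_times + inventory_times)
    return [(a, b) for a, b in zip(timeline, timeline[1:]) if b - a > max_gap]


def numeric_outliers(values, k=3.0):
    """Indexes of values outside mean +/- k * standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    low, high = mu - k * sigma, mu + k * sigma
    return [i for i, v in enumerate(values) if v < low or v > high]


def detect(purchase_times, inventory_times, values, max_gap, k=3.0):
    """Summarize breakpoints and outliers into one anomaly detection result."""
    mu = mean(values)
    return {
        "breakpoints": time_breakpoints(purchase_times, inventory_times, max_gap),
        "outliers": [{"index": i, "value": values[i], "deviation": values[i] - mu}
                     for i in numeric_outliers(values, k)],
    }
```

With a small sample, a single extreme value inflates the standard deviation it is measured against, so a strict 3-sigma rule can miss it; a smaller k or a robust statistic may suit short series better.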
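Claim 7 leaves the scoring formula and weights open. The sketch below assumes a hypothetical linear penalty (weighted breakpoint ratio, outlier ratio and a pre-normalized deviation amplitude subtracted from 1), picks linear interpolation from the two correction options the claim names, and treats timestamps as integers when filling breakpoints. All names are illustrative.

```python
def applicability_score(n_records, n_breaks, n_outliers, deviation, weights=(0.4, 0.4, 0.2)):
    """Hypothetical weighted score in [0, 1]; `deviation` is assumed pre-normalized."""
    w_break, w_outlier, w_dev = weights
    penalty = (w_break * n_breaks / n_records
               + w_outlier * n_outliers / n_records
               + w_dev * min(deviation, 1.0))
    return max(0.0, 1.0 - penalty)


def interpolate_outliers(values, outlier_idx):
    """Replace each outlier by linear interpolation between its nearest clean neighbors."""
    bad = set(outlier_idx)
    good = [i for i in range(len(values)) if i not in bad]
    fixed = list(values)
    for i in sorted(bad):
        left = max((j for j in good if j < i), default=None)
        right = min((j for j in good if j > i), default=None)
        if left is not None and right is not None:
            fixed[i] = values[left] + (values[right] - values[left]) * (i - left) / (right - left)
        elif left is not None or right is not None:
            fixed[i] = values[left if left is not None else right]  # edge: copy the neighbor
    return fixed


def fill_gap(start, end, step, mode="zero"):
    """Placeholder points strictly inside a breakpoint (integer timestamps)."""
    filler = 0.0 if mode == "zero" else None
    return [(t, filler) for t in range(start + step, end, step)]


def package(score, records):
    """Attach the score as a quality metadata tag to the processed records."""
    return {"quality": {"data_applicability_score": round(score, 4)}, "records": records}
```

Zero completion suits flow quantities such as daily receipts, where a silent period plausibly means nothing moved; null placeholders are safer for stock levels, where zero would be a false reading.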
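The system of claim 8 chains six modules, each consuming the previous module's output. A minimal composition sketch follows; the `ProcessingSystem` class and the convention that every module is a single-argument callable are assumptions, not the patent's architecture.

```python
from dataclasses import dataclass
from typing import Any, Callable

Stage = Callable[[Any], Any]


@dataclass
class ProcessingSystem:
    """Hypothetical composition of the six modules of claim 8, run in claim order."""
    preprocessing: Stage
    mapping: Stage
    conversion: Stage
    verification: Stage
    detection: Stage
    correction: Stage

    def run(self, raw: Any) -> Any:
        data = raw
        for stage in (self.preprocessing, self.mapping, self.conversion,
                      self.verification, self.detection, self.correction):
            data = stage(data)  # each module consumes the previous module's output
        return data
```

Because the correction module needs both the qualified data set and the anomaly detection result, a detection stage in this style would return a tuple or a dict carrying both.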
Description
Multi-source data standardized processing method and system based on ERP
Technical Field
The invention relates to the technical field of enterprise informatization management and multi-source data integration, and in particular to an ERP-based multi-source data standardization processing method and system.
Background
At present, modern enterprises generally use an ERP (enterprise resource planning) system as their core while running external systems such as supply chain management platforms, e-commerce transaction platforms and Internet of Things devices in parallel. To break down information silos and enable business collaboration, efficient big-data processing and standardized integration of massive, heterogeneous cross-platform and cross-system data have become a key link in enterprise informatization.
In one prior-art approach, conventional ETL (extract, transform, load) tools or rule-based middleware are used for multi-source data integration. These methods rely mainly on predefined static rule bases or simple string matching algorithms to perform preliminary cleansing, format conversion (e.g., unifying date formats) and basic null filtering of the raw data. Such systems generally assume that fields in tables from different sources share the same business meaning if their names are similar or their data types are identical, and they establish a direct mapping by field name for merged storage or report display.
However, this prior art has significant limitations in high-dimensional, non-uniform and complex business scenarios. Different systems describe the same business concept with deep semantic differences (e.g., "sales" may refer to a pre-modus amount in the ERP system but to the total amount of shipping on an e-commerce platform), and physical dimensions are expressed in non-standardized ways (e.g., "bin" and "bin", respectively); such differences are difficult to identify through simple rule matching. The prior art lacks a mechanism capable of deeply analyzing semantic associations between fields and dynamically eliminating dimension gaps, and therefore often introduces deviations when establishing mapping relations. As a result, the prior art suffers from semantic misalignment and numerical distortion during cross-system data integration, which seriously affect data consistency and decision accuracy.
Disclosure of Invention
The invention provides an ERP (enterprise resource planning)-based multi-source data standardization processing method and system, intended to solve the technical problems in the prior art that semantic misalignment and numerical distortion arise during cross-system data integration and seriously affect data consistency and decision accuracy.
To solve the above technical problems, the present invention provides an ERP-based multi-source data standardization processing method, comprising: acquiring an original data set of a purchasing system and an original data set of an inventory system, and preprocessing them to generate a normalized data set; performing field name matching on the normalized data set in combination with a preset field rule base to obtain rule matching candidate pairs, refining the rule matching candidate pairs with a pre-trained semantic feature extraction model, and determining a field attribute mapping relation between purchase fields and inventory fields; identifying, based on the field attribute mapping relation, difference material attribute fields having dimension differences in the normalized data set, calculating a conversion factor for each such field using a pre-constructed unified dimension conversion model, and performing numerical normalization with the conversion factors to obtain a standardized data set; traversing the numerical fields in the standardized data set, determining out-of-range abnormal records that fall outside a preset business rationality threshold interval, and performing a logic check on the out-of-range abnormal records to generate a qualified data set; extracting time fields from the qualified data set and sorting them to obtain a service time sequence list, traversing the service time sequence list to determine time sequence breakpoints at which the difference between adjacent times exceeds a preset service interruption threshold, evaluating the numerical fields against a preset statistical distribution criterion to determine numerical outliers, and summarizing the time sequence breakpoints and numerical outliers to generate an anomaly detection result; and calculating a data applicability score of the qualified data set according to the anomaly detection result, correcting and completing the data records corresponding to the anomaly detection result to obtain corrected data records, and packaging the data applicability score with the corrected data records to generate