CN-122019750-A - Supply chain data real-time integration processing method and system based on multi-source heterogeneous data


Abstract

The invention relates to the technical field of industrial data management, and in particular to a method and system for real-time integration processing of supply chain data based on multi-source heterogeneous data. The method acquires external compliance data documents and calls a document layout analysis model based on a multimodal transformer to extract text semantic vectors. These vectors are fused with two-dimensional spatial coordinate features to generate composite index keys, and cluster analysis on the keys identifies text blocks and produces structured metadata. The business transaction data stream is discretized to construct an associated database in which production environment data and logistics data serve as attributes. Cross-modal entity alignment is then performed using the structured metadata, establishing bidirectional pointer links between the unstructured document index and the associated database. When a query vector is received, a recursive traversal algorithm runs along the bidirectional pointer links, and a cascade influence probability is calculated as the relevance ranking score before the retrieval results are output. Through multimodal feature fusion and cross-modal entity alignment, the invention achieves deep integration and risk-linked retrieval of heterogeneous supply chain data.

Inventors

SHU WENBING
SHU WENJIE

Assignees

苏州金智源技术有限公司 (Suzhou Jinzhiyuan Technology Co., Ltd.)

Dates

Publication Date: 2026-05-12
Application Date: 2026-01-21

Claims (8)

1. A method and system for real-time integration processing of supply chain data based on multi-source heterogeneous data, characterized by comprising the following steps: acquiring a business transaction data stream, production environment data, logistics track data, and external compliance data documents; extracting semantic vector features of the text and two-dimensional spatial coordinate features from the documents, fusing them to generate composite index keys, performing cluster analysis based on the composite index keys to identify aggregated text blocks in the documents, mapping the information in the aggregated text blocks to standard attribute fields, and generating structured metadata; discretizing the business transaction data stream, extracting a unique identifier as the index primary key, and constructing an associated database with the production environment data and logistics track data as associated attributes; receiving a query vector, executing a recursive traversal algorithm along data-dependency paths based on the bidirectional pointer links, and performing node matching and threshold verification along the retrieval path; and calculating a cascade influence probability as the relevance ranking score and outputting the retrieval result.
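Claim 1 does not specify a storage layout. As one possible reading of the database construction step, the sketch below collapses a stream of transaction events into one record per order, uses the order ID as the primary key, and stores production environment readings and logistics track points as child tables that reference that key. The schema, the column names (`order_id`, `temperature`, `lat`, `lon`) and the "later event overwrites earlier fields" merge rule are illustrative assumptions, not taken from the patent; only Python's built-in `sqlite3` module is used.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple

# Hypothetical schema: orders keyed by the unique identifier extracted from the
# transaction stream, with production and logistics data as associated attributes.
SCHEMA = """
CREATE TABLE orders (
    order_id TEXT PRIMARY KEY,
    customer TEXT,
    amount   REAL
);
CREATE TABLE production_env (
    order_id    TEXT NOT NULL REFERENCES orders(order_id),
    ts          REAL NOT NULL,
    temperature REAL
);
CREATE TABLE logistics_track (
    order_id TEXT NOT NULL REFERENCES orders(order_id),
    ts       REAL NOT NULL,
    lat      REAL,
    lon      REAL
);
"""


def discretize(stream: Iterable[Dict]) -> Dict[str, Dict]:
    """Collapse a stream of transaction events into one record per order_id.
    Later events overwrite earlier fields (an assumed merge rule)."""
    records: Dict[str, Dict] = {}
    for event in stream:
        records.setdefault(event["order_id"], {}).update(event)
    return records


def build_associated_db(
    stream: Iterable[Dict],
    production: List[Tuple[str, float, float]],
    logistics: List[Tuple[str, float, float, float]],
) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # reject readings for unknown orders
    conn.executescript(SCHEMA)
    for order_id, rec in discretize(stream).items():
        conn.execute(
            "INSERT INTO orders (order_id, customer, amount) VALUES (?, ?, ?)",
            (order_id, rec.get("customer"), rec.get("amount")),
        )
    conn.executemany("INSERT INTO production_env VALUES (?, ?, ?)", production)
    conn.executemany("INSERT INTO logistics_track VALUES (?, ?, ?, ?)", logistics)
    conn.commit()
    return conn
```

Enabling foreign keys makes the order ID act as a real anchor: a sensor reading or track point that cites an order absent from the transaction stream is rejected with `sqlite3.IntegrityError` instead of being stored as an orphan.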
2. The method and system for real-time integration processing of supply chain data based on multi-source heterogeneous data according to claim 1, wherein invoking the document layout analysis model based on the multimodal transformer comprises: adopting a dual-stream neural network architecture comprising a visual feature extraction branch and a text semantic extraction branch; performing convolution operations on the document image with the visual feature extraction branch to capture visual texture features such as table frame lines, paragraph spacing, and seal positions; performing word embedding on the character sequence in the document with the text semantic extraction branch to capture contextual linguistic logic features; inputting the visual texture features and the linguistic logic features into a cross-attention module for feature alignment, and calculating a correlation weight matrix between visual regions and text content to obtain a document structure graph; and segmenting and locating the unstructured information.
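The cross-attention step can be illustrated with plain scaled dot-product attention in which text tokens act as queries over visual region features. This is a simplified sketch: the learned query/key/value projections, multiple heads, and the convolutional and word-embedding branches that would produce the inputs are all omitted, and the toy feature vectors are assumed. The returned `weights` matrix plays the role of the claimed correlation weight matrix between text content and visual regions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: rows of a times columns of b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(text_feats: Matrix, visual_feats: Matrix) -> Tuple[Matrix, Matrix]:
    """Text tokens (queries) attend to visual regions (keys and values).

    Returns (weights, aligned): weights[i][j] is the normalized correlation of
    text token i with visual region j; aligned[i] is the visual context vector
    gathered for token i. Identity projections are assumed for Q, K and V.
    """
    d = len(text_feats[0])
    scores = matmul(text_feats, transpose(visual_feats))
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = softmax_rows(scaled)
    aligned = matmul(weights, visual_feats)
    return weights, aligned
```

With two text tokens `[[1.0, 0.0], [0.0, 1.0]]` and three regions `[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]`, each row of `weights` sums to 1 and each token puts the most weight on the region whose feature it matches, which is the alignment signal a layout model would use to attach text to table cells or seal areas.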
3. The method and system for real-time integration processing of supply chain data based on multi-source heterogeneous data according to claim 1, wherein generating the composite index key by fusion specifically comprises: dynamically allocating the fusion coefficients of the semantic features and the spatial features according to preset field importance rules, assigning a higher semantic weight to key numerical fields such as contract amounts and dates and a higher spatial weight to position-sensitive fields such as signatures, headers, and footers; combining the semantic features and the spatial features by weighted concatenation; and performing dimensionality reduction through a fully connected layer to obtain the composite index key.
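A minimal sketch of the weighted concatenation and fully connected reduction follows. The field-type names in `FUSION_RULES` and their coefficients are hypothetical; the claim only says that numeric fields favour semantics and position-sensitive fields favour layout. The fully connected layer is shown as a bare affine map `y = Wx + b` with no activation, which is also an assumption, and `W` and `b` would be learned in practice.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical field-importance rules: (semantic_weight, spatial_weight).
FUSION_RULES: Dict[str, Tuple[float, float]] = {
    "contract_amount": (0.8, 0.2),
    "date": (0.8, 0.2),
    "signature": (0.3, 0.7),
    "header_footer": (0.3, 0.7),
}
DEFAULT_RULE = (0.5, 0.5)


def composite_key(
    field_type: str,
    semantic: Sequence[float],
    spatial: Sequence[float],
    weights: List[List[float]],
    bias: List[float],
) -> List[float]:
    """Weighted concatenation of semantic and spatial features, then y = W x + b."""
    alpha, beta = FUSION_RULES.get(field_type, DEFAULT_RULE)
    fused = [alpha * s for s in semantic] + [beta * p for p in spatial]
    if any(len(row) != len(fused) for row in weights) or len(weights) != len(bias):
        raise ValueError("weight matrix shape does not match fused feature size")
    # Fully connected layer (affine only) reduces len(fused) dims to len(bias) dims.
    return [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weights, bias)]
```

For example, with semantic features `[1.0, 0.0]`, spatial features `[0.5, 0.5]`, and a 2×4 weight matrix that keeps the first semantic and first spatial component, a `"date"` field yields a key close to `[0.8, 0.1]`, whereas a `"signature"` field yields one close to `[0.3, 0.35]`. The same text therefore lands in different regions of key space depending on whether its meaning or its position on the page matters more.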
4. The method and system for real-time integration processing of supply chain data based on multi-source heterogeneous data according to claim 1, wherein performing cluster analysis based on the composite index key specifically comprises: adopting a density-based adaptive spatial clustering algorithm that takes the composite index keys as input, calculates distance metric values between text blocks in the document, and merges text blocks whose distance is smaller than a preset threshold into the same semantic cluster, thereby identifying header regions, body text regions, and table data regions; for each identified table data region, applying grid segmentation based on row-column projection to separate the contents of individual cells; and calculating, with a pre-trained semantic matching model, a similarity score between the content of each semantic cluster and the definition description of each standard attribute field, selecting the attribute with the highest score as the mapping target, and generating standardized key-value pair data.
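The three sub-steps of this claim can be sketched as follows, under stated substitutions: classic DBSCAN with a fixed `eps` stands in for the "self-adaptive" density clustering (whose adaptation rule is not disclosed), blank-row detection on a binary ink mask illustrates row-column projection, and cosine similarity between assumed embedding vectors stands in for the pre-trained semantic matching model. All function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan(points: List[Vector], eps: float, min_pts: int) -> List[int]:
    """Standard DBSCAN over composite index keys; label -1 marks noise."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = -1

    def region(i: int) -> List[int]:
        # Neighbourhood includes the point itself, as in the usual definition.
        return [j for j in range(len(points)) if euclidean(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_region = region(j)
            if len(j_region) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_region)
    return [label if label is not None else -1 for label in labels]


def projection_cuts(mask: List[List[int]]) -> List[int]:
    """Indices of blank rows in a binary ink mask (1 = ink). Blank rows separate
    table rows; running this on the transposed mask gives column separators."""
    return [r for r, row in enumerate(mask) if not any(row)]


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 0.0 if na == 0 or nb == 0 else sum(x * y for x, y in zip(a, b)) / (na * nb)


def map_to_attribute(cluster_vec: Vector, attribute_defs: Dict[str, Vector]) -> str:
    """Pick the standard attribute whose definition embedding scores highest."""
    return max(attribute_defs, key=lambda name: cosine(cluster_vec, attribute_defs[name]))
```

On six toy keys `[[0, 0], [0, 0.1], [0.1, 0], [5, 5], [5, 5.1], [9, 0]]` with `eps=0.5` and `min_pts=2`, the first three form one cluster, the next two form another, and the isolated last key is labelled noise. A density method is a natural fit here because the number of text blocks on a page is not known in advance.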
5. The method and system for real-time integration processing of supply chain data based on multi-source heterogeneous data according to claim 1, wherein discretizing the business transaction data stream, extracting a unique identifier as the index primary key, and constructing the associated database with production environment data and logistics track data as associated attributes comprises: constructing a data alignment mechanism based on a spatio-temporal sliding window; taking the order number in the business transaction data stream as the reference anchor and setting a dynamic time window that covers the production cycle and the logistics cycle; resampling and interpolating the production environment data within the dynamic time window to fill time gaps caused by sensor faults; performing map matching and denoising on the logistics track data to correct drifting positioning coordinates; attaching the processed continuous production state curve and the processed discrete logistics track points to the order number in timestamp order; and constructing a hierarchical data storage structure to obtain the associated database.
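The window alignment can be sketched in a few functions. Assumptions: the window bounds (`start`, `end`, optional `margin`) are supplied by the caller, for example from the order's production start and delivery time; missing sensor values are filled by linear interpolation; and a sliding median over latitude and longitude serves as a crude stand-in for map matching, which would normally snap points to a road network. Names are hypothetical.

```python
from statistics import median
from typing import Dict, List, Optional, Tuple

Reading = Tuple[float, Optional[float]]  # (timestamp, sensor value or None on dropout)
TrackPoint = Tuple[float, float, float]  # (timestamp, latitude, longitude)


def fill_gaps(readings: List[Reading]) -> List[Tuple[float, float]]:
    """Linearly interpolate missing values between the nearest valid neighbours;
    leading or trailing gaps hold the nearest valid value. Input must be time-sorted."""
    known = [(t, v) for t, v in readings if v is not None]
    if not known:
        raise ValueError("no valid readings inside the window")
    filled = []
    for t, v in readings:
        if v is not None:
            filled.append((t, v))
            continue
        before = [kv for kv in known if kv[0] < t]
        after = [kv for kv in known if kv[0] > t]
        if before and after:
            (t0, v0), (t1, v1) = before[-1], after[0]
            filled.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
        else:
            filled.append((t, (before[-1] if before else after[0])[1]))
    return filled


def smooth_track(points: List[TrackPoint], k: int = 3) -> List[TrackPoint]:
    """Sliding median over latitude/longitude: suppresses isolated GPS jumps."""
    half = k // 2
    smoothed = []
    for i, (t, _, _) in enumerate(points):
        window = points[max(0, i - half): i + half + 1]
        smoothed.append((t, median(p[1] for p in window), median(p[2] for p in window)))
    return smoothed


def build_order_record(order_id: str, start: float, end: float,
                       readings: List[Reading], track: List[TrackPoint],
                       margin: float = 0.0) -> Dict[str, Dict]:
    """Attach the cleaned production curve and logistics track to the order number."""
    lo, hi = start - margin, end + margin
    win_readings = sorted((r for r in readings if lo <= r[0] <= hi), key=lambda r: r[0])
    win_track = sorted((p for p in track if lo <= p[0] <= hi), key=lambda p: p[0])
    return {order_id: {"production_curve": fill_gaps(win_readings),
                       "logistics_track": smooth_track(win_track)}}
```

For an order spanning timestamps 0 to 2, a dropout at t = 1 between readings 10.0 and 14.0 is filled with 12.0, a reading at t = 9 falls outside the window and is ignored, and a single drifted latitude of 35.0 between two points at 31.0 is pulled back to 31.0. The nested dictionary keyed by order number mirrors the claimed hierarchical storage structure.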
6. The method and system for real-time integration processing of supply chain data based on multi-source heterogeneous data according to claim 1, wherein establishing the bidirectional pointer links between the unstructured document index and the associated database specifically comprises: the unstructured document index being a document feature index constructed from the composite index keys; based on a graph neural network, taking entity objects in the associated database and entities extracted into the structured metadata as nodes of a graph network, and taking attribute similarity, co-occurrence relations, and business logic dependencies among the nodes as edges; propagating node features through multi-layer graph convolution operations, calculating matching probability scores between heterogeneous nodes, and generating a bidirectional pointer link between two nodes when their score exceeds a confidence threshold.
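The linking step can be sketched with one symmetric-normalized graph convolution layer followed by thresholded matching. Assumptions: edge types are collapsed into a single undirected edge set, the layer weight is given rather than trained, and cosine similarity of the propagated embeddings stands in for the claimed matching probability score (a trained model would output a calibrated probability). Stacking `gcn_layer` calls gives the multi-layer version.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Matrix = List[List[float]]


def gcn_layer(features: Matrix, edges: Sequence[Tuple[int, int]], weight: Matrix) -> Matrix:
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(features)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0  # every relation treated as undirected
    deg = [sum(row) for row in adj]
    out: Matrix = []
    for i in range(n):
        agg = [0.0] * len(features[0])
        for j in range(n):
            if adj[i][j]:
                c = adj[i][j] / math.sqrt(deg[i] * deg[j])
                agg = [a + c * f for a, f in zip(agg, features[j])]
        # Linear transform by weight (d_in x d_out), then ReLU.
        out.append([max(0.0, sum(w * a for w, a in zip(col, agg))) for col in zip(*weight)])
    return out


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 0.0 if na == 0 or nb == 0 else sum(x * y for x, y in zip(a, b)) / (na * nb)


def link_entities(emb: Matrix, db_nodes: Sequence[int], meta_nodes: Sequence[int],
                  threshold: float) -> Dict[int, Set[int]]:
    """Create pointers in both directions for every pair scoring above threshold."""
    links: Dict[int, Set[int]] = {}
    for i in db_nodes:
        for j in meta_nodes:
            if cosine(emb[i], emb[j]) > threshold:
                links.setdefault(i, set()).add(j)  # database entity -> document entity
                links.setdefault(j, set()).add(i)  # document entity -> database entity
    return links
```

In a four-node toy graph where database nodes 0 and 1 co-occur with document entities 2 and 3 respectively, one layer makes each linked pair's embeddings nearly identical, so `link_entities(h, [0, 1], [2, 3], 0.9)` returns `{0: {2}, 2: {0}, 1: {3}, 3: {1}}`. Storing the pointer in both directions is what lets a later query start from either a document or a database record.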
7. The method and system for real-time integration processing of supply chain data based on multi-source heterogeneous data according to claim 1, wherein receiving the query vector and outputting the retrieval result specifically comprises: adopting a hybrid traversal strategy that combines depth-first search and breadth-first search with pruning; taking the initial node hit by the received query vector as the starting point and expanding to upstream and downstream nodes along the bidirectional pointer links; at each hop, calculating in real time a relevance scoring coefficient between the current path node and the query intent, and immediately terminating the search of the current branch (pruning) once the coefficient falls below a preset cutoff threshold; synchronously checking whether each path node satisfies the user permission check rules and masking nodes the user is not authorized to access, thereby obtaining a retrieval subgraph containing directly associated data and indirect potential-impact data; and establishing a Bayesian network risk propagation model based on the retrieval subgraph, mapping each node on the path to a variable node in the network, constructing conditional probability relations among the nodes from historical disruption data, performing inference using the trigger probability of the queried event, calculating the posterior probability of causing downstream delivery anomalies as the cascade influence probability, fusing this probability with the semantic similarity by weighting to generate a composite score, and outputting retrieval results ranked by business risk priority according to that score.
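A simplified sketch of the retrieval stage follows, with substitutions named plainly. Traversal is a hop-bounded breadth-first expansion with relevance pruning and permission masking; the claimed hybrid DFS/BFS scheduling is not reproduced. Risk propagation uses noisy-OR forward propagation in topological order instead of full Bayesian network inference with conditional probability tables learned from historical disruption data. This propagation is exact only when the subgraph has no shared ancestors (a polytree) and is an approximation otherwise. Node names, relevance values, edge probabilities, and the fusion weight `w` are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Directed dependency graph: node -> [(downstream node, edge propagation probability)].
Graph = Dict[str, List[Tuple[str, float]]]


def neighbours(graph: Graph) -> Dict[str, Set[str]]:
    """Undirected view, so traversal can move both upstream and downstream."""
    nbrs: Dict[str, Set[str]] = {}
    for u, outs in graph.items():
        for v, _ in outs:
            nbrs.setdefault(u, set()).add(v)
            nbrs.setdefault(v, set()).add(u)
    return nbrs


def retrieve_subgraph(graph: Graph, start: str, relevance: Dict[str, float],
                      cutoff: float, allowed: Set[str], max_hops: int = 3) -> Dict[str, int]:
    """Breadth-first expansion with pruning and permission masking; returns {node: hop}."""
    if start not in allowed or relevance.get(start, 0.0) < cutoff:
        return {}
    nbrs = neighbours(graph)
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] >= max_hops:
            continue
        for nxt in sorted(nbrs.get(node, ())):
            if nxt in hops or nxt not in allowed:
                continue  # already visited, or masked by the permission rule
            if relevance.get(nxt, 0.0) < cutoff:
                continue  # prune: this branch is not expanded further
            hops[nxt] = hops[node] + 1
            queue.append(nxt)
    return hops


def cascade_probability(graph: Graph, nodes, trigger: str, p_trigger: float) -> Dict[str, float]:
    """Noisy-OR disruption probabilities over the subgraph, in topological order."""
    indeg = {n: 0 for n in nodes}
    for u in indeg:
        for v, _ in graph.get(u, []):
            if v in indeg:
                indeg[v] += 1
    queue = deque(sorted(n for n in indeg if indeg[n] == 0))
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in graph.get(u, []):
            if v in indeg:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
    if len(order) != len(indeg):
        raise ValueError("dependency subgraph contains a cycle")
    not_disrupted = {n: 1.0 for n in indeg}
    prob: Dict[str, float] = {}
    for u in order:
        # The queried event is fixed at its trigger probability; others accumulate.
        prob[u] = p_trigger if u == trigger else 1.0 - not_disrupted[u]
        for v, p_edge in graph.get(u, []):
            if v in not_disrupted:
                not_disrupted[v] *= 1.0 - prob[u] * p_edge
    return prob


def rank(prob: Dict[str, float], similarity: Dict[str, float], w: float = 0.6) -> List[Tuple[str, float]]:
    """Weighted fusion of cascade probability and semantic similarity, highest first."""
    scores = {n: w * p + (1.0 - w) * similarity.get(n, 0.0) for n, p in prob.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

On a toy chain supplier → part → assembly → delivery with edge probabilities 0.9, 0.8 and 0.7, a low-relevance "audit" node attached to the part is pruned. A supplier trigger probability of 0.5 then propagates to a delivery-disruption probability of 0.252. Removing "assembly" from the permitted set cuts the subgraph off at the part, so masked nodes also block the paths that run through them.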
8. A system for real-time integration processing of supply chain data based on multi-source heterogeneous data, the system comprising: a data acquisition module configured to acquire a business transaction data stream, production environment data, logistics track data, and external compliance data documents; a fusion clustering module configured to invoke, for the external compliance data documents, a document layout analysis model based on a multimodal transformer, extract semantic vector features of the text and two-dimensional spatial coordinate features from the documents, and fuse them to generate composite index keys; a database construction module configured to discretize the business transaction data stream, extract a unique identifier as the index primary key, and construct an associated database with production environment data and logistics track data as associated attributes; and a query module configured to receive a query vector, execute a recursive traversal algorithm along data-dependency paths based on the bidirectional pointer links, perform node matching and threshold verification along the retrieval path, calculate the cascade influence probability as the relevance ranking score, and output the retrieval result.

Description

Supply chain data real-time integration processing method and system based on multi-source heterogeneous data

Technical Field

The invention relates to the technical field of industrial data management, and in particular to a method and system for real-time integration processing of supply chain data based on multi-source heterogeneous data.

Background

In the prior art, supply chain data management struggles to integrate multi-source heterogeneous data efficiently. A supply chain involves large numbers of business transaction records, real-time production and logistics sensor data, and unstructured contract and compliance documents, yet traditional data retrieval and management systems usually store structured data in isolation from unstructured documents and lack a unified mechanism for semantic understanding and association. Processing of unstructured documents often ignores visual layout features and spatial position information, so key information is extracted inaccurately. Moreover, cross-modal retrieval in the prior art relies mainly on shallow keyword matching. It lacks entity alignment and bidirectional linking grounded in business logic, cannot traverse deeply along data-dependency paths, and has even greater difficulty quantifying the cascade impact of an anomaly at a single node on the downstream supply chain. As a result, retrieval results lack associative depth and risk early-warning value. To address the difficulty of deeply fusing unstructured documents with structured business flow data in supply chain scenarios, a method and system for real-time integration processing of supply chain data based on multi-source heterogeneous data are provided.
Disclosure of Invention

The invention aims to provide a method and system for real-time integration processing of supply chain data based on multi-source heterogeneous data that overcomes the difficulty of deeply fusing unstructured documents with structured business flow data in supply chain scenarios. It achieves efficient integration and risk-associated retrieval of multi-source heterogeneous data by establishing cross-modal bidirectional pointer links and calculating cascade influence probabilities. The method and system comprise the following steps: acquiring a business transaction data stream, production environment data, logistics track data, and external compliance data documents; extracting semantic vector features of the text and two-dimensional spatial coordinate features from the documents, fusing them to generate composite index keys, performing cluster analysis based on the composite index keys to identify aggregated text blocks in the documents, mapping the information in the aggregated text blocks to standard attribute fields, and generating structured metadata; discretizing the business transaction data stream, extracting a unique identifier as the index primary key, and constructing an associated database with the production environment data and logistics track data as associated attributes; receiving a query vector, executing a recursive traversal algorithm along data-dependency paths based on the bidirectional pointer links, and performing node matching and threshold verification along the retrieval path; and calculating a cascade influence probability as the relevance ranking score and outputting the retrieval result.
Preferably, invoking the document layout analysis model based on the multimodal transformer comprises: adopting a dual-stream neural network architecture comprising a visual feature extraction branch and a text semantic extraction branch; performing convolution operations on the document image with the visual feature extraction branch to capture visual texture features such as table frame lines, paragraph spacing, and seal positions; performing word embedding on the character sequence in the document with the text semantic extraction branch to capture contextual linguistic logic features; inputting the visual texture features and the linguistic logic features into a cross-attention module for feature alignment, and calculating a correlation weight matrix between visual regions and text content to obtain a document structure graph; and segmenting and locating the unstructured information.

Preferably, generating the composite index key by fusion specifically comprises: dynamically allocating the fusion coefficients of the semantic features and the spatial features according to preset field importance rules, assigning a higher semantic weight to key numerical fields such as contract amounts and dates and a higher spatial weight to position-sensitive fields such as signatures, headers, and footers, combining the semantic features and the spatial f