CN-121980549-A - Data flow path tracking method and system based on dynamic watermark

Abstract

The invention relates to the technical field of dynamic watermarking and discloses a data flow path tracking method and system based on dynamic watermarking. A perturbable dynamic watermark signal is injected at each link of the data flow path, so that the watermark persists through data transmission, joining, and aggregation. Watermark responses are extracted through correlation detection and statistical modeling to build a multi-dimensional path relation matrix; the path structure formed by join processing is identified through tensor processing and spectral analysis; and the actual flow path of the data is output by mapping the result to edge weights and running an optimal path solving algorithm. The method preserves the integrity and verifiability of tracking information in complex data fusion environments, locates data sources and paths with high precision, and improves the level of automation in enterprise data security management and credit risk monitoring.
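
The injection and correlation-detection idea in the abstract can be illustrated with a minimal NumPy sketch. It follows the additive, Gaussian spread-spectrum form given in the claims; the names `watermarks`, `inject`, and `detect`, the vector dimension, the number of edges, and the amplitude are all assumptions made for illustration and are not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, num_edges = 4096, 3  # hypothetical feature dimension and edge count

# One zero-mean, unit-variance Gaussian spreading vector per flow edge.
watermarks = rng.standard_normal((num_edges, dim))

def inject(features, edge, amplitude):
    """Additive injection: features + amplitude * watermark of the given edge."""
    return features + amplitude * watermarks[edge]

def detect(features):
    """Raw detection vector: inner product of the features with every edge watermark."""
    return watermarks @ features

original = rng.standard_normal(dim)         # stand-in for a feature vector
marked = inject(original, edge=1, amplitude=0.5)
response = detect(marked)                   # one correlation score per edge
print(int(np.argmax(response)))             # index of the strongest response
```

Because the spreading vectors are independent, the inner product with the injected edge's watermark grows with the vector dimension, while the responses for the other edges stay near zero on average. This sketch covers a single modality and a single edge only; it does not model joins.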

Inventors

  • YE WUBIN
  • WU YUETING
  • CHEN XIAOQIAN
  • XIE FEI
  • HUANG JINHU

Assignees

  • 福建中信网安信息科技有限公司

Dates

Publication Date
20260505
Application Date
20260109

Claims (10)

  1. A data flow path tracking method based on dynamic watermarking, characterized by comprising the following steps: for each flow edge in a data lineage network, injecting mutually independent additive spread-spectrum watermarks into the multimodal data; performing watermark detection and covariance whitening on the samples formed by joining the multimodal data, to obtain whitened detection vectors from which second-order statistics have been removed; performing element-wise paired multiplication on the whitened detection vectors of the different modalities of each sample to construct a path-edge space vector reflecting multiplicative co-participation, and generating a third-order central moment tensor from the path-edge space vectors; computing the tensor principal eigenvalue of the third-order central moment tensor and its corresponding principal eigenvector to characterize the tensor peak induced by the join operation; and extracting the component magnitudes of the principal eigenvector as strength weights of the flow edges, and planning a maximum-product path in the data lineage network to output the data flow path.
  2. The data flow path tracking method based on dynamic watermarking according to claim 1, wherein injecting mutually independent additive spread-spectrum watermarks into the multimodal data for each flow edge in the data lineage network comprises: for any flow edge in the data lineage network, constructing, for the structured feature modality, a structured-modality spread-spectrum watermark vector that follows a zero-mean, unit-variance multivariate Gaussian distribution, and constructing, for the text embedding modality, a text-embedding-modality spread-spectrum watermark vector that follows a zero-mean, unit-variance multivariate Gaussian distribution, the two generated watermark vectors being kept statistically independent; obtaining the structured-modality injection amplitude for the current flow edge, computing the product of that amplitude and the structured-modality spread-spectrum watermark vector, and adding the product to the original structured feature vector to obtain a watermarked structured feature vector; and obtaining the text-embedding-modality injection amplitude for the current flow edge, computing the product of that amplitude and the text-embedding-modality spread-spectrum watermark vector, and adding the product to the original text embedding vector to obtain a watermarked text embedding vector.
  3. The data flow path tracking method based on dynamic watermarking according to claim 2, wherein performing watermark detection and covariance whitening on the samples formed by joining the multimodal data, to obtain whitened detection vectors from which second-order statistics have been removed, comprises: for any sample, computing the inner product of the sample's watermarked structured feature vector with the structured-modality spread-spectrum watermark vector of each flow edge in the data lineage network, and arranging the results in flow-edge order to form the structured-modality raw detection vector; and computing the inner product of the sample's watermarked text embedding vector with the text-embedding-modality spread-spectrum watermark vector of each flow edge in the data lineage network, and arranging the results in flow-edge order to form the text-embedding-modality raw detection vector.
  4. The data flow path tracking method based on dynamic watermarking according to claim 3, wherein performing watermark detection and covariance whitening on the samples formed by joining the multimodal data, to obtain whitened detection vectors from which second-order statistics have been removed, further comprises: collecting the structured-modality raw detection vectors of all samples and computing the structured-modality mean vector and the structured-modality detection covariance matrix; computing the inverse square root of the structured-modality detection covariance matrix to obtain the structured-modality whitening matrix; subtracting the structured-modality mean vector from a sample's structured-modality raw detection vector and multiplying the result by the structured-modality whitening matrix to obtain the structured-modality whitened detection vector; and subtracting the text-embedding-modality mean vector from the sample's text-embedding-modality raw detection vector and multiplying the result by the text-embedding-modality whitening matrix to obtain the text-embedding-modality whitened detection vector.
  5. The data flow path tracking method based on dynamic watermarking according to claim 4, wherein performing element-wise paired multiplication on the whitened detection vectors of the different modalities of each sample to construct a path-edge space vector reflecting multiplicative co-participation comprises: for any sample, retrieving the structured-modality whitened detection vector and the text-embedding-modality whitened detection vector, the two vectors having the same dimension, equal to the number of flow edges in the data lineage network; performing element-wise multiplication of the structured-modality whitened detection vector and the text-embedding-modality whitened detection vector, that is, multiplying the components of the two vectors at the same flow-edge position; and taking the resulting product vector as the path-edge space vector of the sample.
  6. The data flow path tracking method based on dynamic watermarking according to claim 5, wherein generating the third-order central moment tensor comprises: summing the path-edge space vectors of all samples and dividing the sum by the total number of samples to obtain the path-edge space mean vector; for each sample, subtracting the path-edge space mean vector from the sample's path-edge space vector to obtain a centered path-edge space vector; computing the triple outer product of the centered path-edge space vector with itself to obtain the per-sample third-order tensor; and summing the per-sample third-order tensors of all samples and dividing the resulting tensor by the total number of samples to obtain the third-order central moment tensor.
  7. The data flow path tracking method based on dynamic watermarking according to claim 6, wherein computing the tensor principal eigenvalue of the third-order central moment tensor and its corresponding principal eigenvector comprises: constructing an optimization objective over a unit-length candidate vector, the objective being the inner product of the third-order central moment tensor with the triple outer product of the candidate vector; solving, under the constraint that the 2-norm of the candidate vector equals one, for the extremal solution that maximizes the optimization objective; and taking the maximum value of the optimization objective as the tensor principal eigenvalue and the candidate vector attaining it as the principal eigenvector.
  8. The data flow path tracking method based on dynamic watermarking according to claim 7, wherein extracting the component magnitudes of the principal eigenvector as strength weights of the flow edges comprises: for each flow edge in the data lineage network, extracting from the principal eigenvector the component corresponding to that flow edge; taking the absolute value of the extracted component and using the result as the strength weight of that flow edge; and assembling the strength weights of all flow edges, in the order of the flow edges in the data lineage network, into a flow-edge strength weight vector.
  9. The data flow path tracking method based on dynamic watermarking according to claim 8, wherein planning a maximum-product path in the data lineage network to output the data flow path comprises: establishing a path search objective in the data lineage network, the objective being to select the path for which the running product of the strength weights of its flow edges is maximal; traversing the nodes in the topological order of the data lineage network and, for the current node, obtaining the accumulated path strengths of all its predecessor nodes, computing for each predecessor the product of its accumulated path strength and the strength weight of the flow edge connecting it to the current node, and selecting the maximum of these products as the accumulated path strength of the current node; and, starting from the destination node of the data lineage network, backtracking according to the accumulated path strength of each node to determine the flow-edge sequence that forms the maximum running product, and outputting that flow-edge sequence as the data flow path.
  10. A data flow path tracking system based on dynamic watermarking, applied to the data flow path tracking method based on dynamic watermarking according to any one of claims 1 to 9, comprising: a data injection module, configured to inject, for each flow edge in the data lineage network, mutually independent additive spread-spectrum watermarks into the multimodal data; a data whitening module, configured to perform watermark detection and covariance whitening on the samples formed by joining the multimodal data, to obtain whitened detection vectors from which second-order statistics have been removed; a central moment tensor construction module, configured to perform element-wise paired multiplication on the whitened detection vectors of the different modalities of each sample, construct a path-edge space vector reflecting multiplicative co-participation, and generate a third-order central moment tensor from the path-edge space vectors; a principal feature extraction module, configured to compute the tensor principal eigenvalue of the third-order central moment tensor and its corresponding principal eigenvector to characterize the tensor peak induced by the join operation; and a data flow path acquisition module, configured to extract the component magnitudes of the principal eigenvector as strength weights of the flow edges and plan a maximum-product path in the data lineage network to output the data flow path.
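
The processing chain in claims 4 to 9 can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: all function names are hypothetical, the eigenvalue floor in `whiten` is an added safeguard, and the claims specify only the constrained maximization in claim 7, not a solver, so the tensor power iteration with random restarts used here is a swapped-in heuristic that is not guaranteed to find the global maximum. The path search assumes positive edge weights and a target reachable from the source.

```python
import numpy as np

def whiten(raw):
    """Center raw detection vectors (rows = samples) and apply cov^(-1/2)."""
    centered = raw - raw.mean(axis=0)
    cov = centered.T @ centered / raw.shape[0]
    vals, vecs = np.linalg.eigh(cov)
    # Assumption: floor tiny eigenvalues so the inverse square root stays finite.
    inv_sqrt = vecs @ np.diag(np.maximum(vals, 1e-12) ** -0.5) @ vecs.T
    return centered @ inv_sqrt

def third_order_central_moment(z_struct, z_text):
    """Pair the modalities element-wise, center, and average the triple outer products."""
    path_edge = z_struct * z_text                      # path-edge space vectors
    centered = path_edge - path_edge.mean(axis=0)
    return np.einsum("ni,nj,nk->ijk", centered, centered, centered) / len(centered)

def principal_eigenpair(tensor, iters=100, restarts=8, seed=0):
    """Heuristically maximize T(v, v, v) over unit vectors by power iteration."""
    rng = np.random.default_rng(seed)
    best_val, best_vec = -np.inf, None
    for _ in range(restarts):
        v = rng.standard_normal(tensor.shape[0])
        v /= np.linalg.norm(v)
        for _ in range(iters):
            w = np.einsum("ijk,j,k->i", tensor, v, v)
            norm = np.linalg.norm(w)
            if norm < 1e-15:
                break
            v = w / norm
        val = float(np.einsum("ijk,i,j,k->", tensor, v, v, v))
        if val > best_val:
            best_val, best_vec = val, v
    return best_val, best_vec

def max_product_path(topo_order, edges, weights, source, target):
    """Dynamic programming over a DAG; returns (edge indices on the path, product)."""
    incoming = {}
    for idx, (u, v) in enumerate(edges):
        incoming.setdefault(v, []).append((u, idx))
    best = {node: 0.0 for node in topo_order}
    best[source] = 1.0
    back = {}
    for node in topo_order:
        for u, idx in incoming.get(node, []):
            candidate = best[u] * weights[idx]
            if candidate > best[node]:
                best[node], back[node] = candidate, (u, idx)
    path, node = [], target
    while node != source:                              # backtrack from the destination
        node, idx = back[node]
        path.append(idx)
    return path[::-1], best[target]

# Toy graph: the weights stand in for |components| of the principal eigenvector.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
weights = [0.9, 0.5, 0.2, 0.8]
path, strength = max_product_path(["a", "b", "c", "d"], edges, weights, "a", "d")
print(path, strength)
```

In a full run, the weights passed to `max_product_path` would be `np.abs(vec)` for the `vec` returned by `principal_eigenpair`, ordered the same way as `edges`. Working with the product directly keeps the sketch close to claim 9; summing logarithms of the weights would be a numerically safer equivalent on long paths.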

Description

Data flow path tracking method and system based on dynamic watermark

Technical Field

The invention relates to the technical field of dynamic watermarking, and in particular to a data flow path tracking method and system based on dynamic watermarking.

Background

As data governance systems mature and cross-domain data sharing becomes widespread, data is no longer confined to a single source or a single system; it is passed between different business stages in a multi-layer structured form. For example, enterprise credit risk analysis typically requires processing financial statements, transaction records, and external news and public opinion data at the same time. After being brought together, this data is repeatedly cleaned, correlated, aggregated, summarized, or modeled, forming a complex multi-path data processing graph. In this process a single static watermark cannot remain effective: each time the data is recomputed or joined, the embedded mark is deformed, superimposed, or even partially erased, so the mark of the original path is masked. Traditional tracing techniques that rely on hash signatures, unique identifiers, or static fingerprints cannot correctly reflect the actual flow trajectory of data under such dynamic, multi-stage transformations. In particular, data flow involves many multi-source join operations. Each join introduces signal components from multiple sources into the result, so downstream data simultaneously carries characteristic patterns from different upstream nodes. At the computational level, a join is equivalent to repeated multiplication or cross-mapping of the source signals, which produces complex nonlinear relationships in the result.
This processing mechanism breaks the separability that traditional watermarks have in a purely additive signal superposition setting, so a downstream system detecting the watermark obtains only a blurred superposition and cannot distinguish a specific path. The ambiguity is further exacerbated by joins across data of different modalities (for example, the combined analysis of numerical tables and text representations): because the statistical patterns of the modalities differ significantly when combined, the original marks respond unstably after fusion. As a result, the traceability of the data flow path is progressively weakened over multiple joins, creating a technical bottleneck in which sources are unknown and paths are difficult to identify.

Disclosure of Invention

The invention provides a data flow path tracking method and system based on dynamic watermarking, which solve the technical problems described in the Background.

In a first aspect, a data flow path tracking method based on dynamic watermarking includes: for each flow edge in a data lineage network, injecting mutually independent additive spread-spectrum watermarks into the multimodal data; performing watermark detection and covariance whitening on the samples formed by joining the multimodal data, to obtain whitened detection vectors from which second-order statistics have been removed; performing element-wise paired multiplication on the whitened detection vectors of the different modalities of each sample to construct a path-edge space vector reflecting multiplicative co-participation, and generating a third-order central moment tensor from the path-edge space vectors; computing the tensor principal eigenvalue of the third-order central moment tensor and its corresponding principal eigenvector to characterize the tensor peak induced by the join operation; and extracting the component magnitudes of the principal eigenvector as strength weights of the flow edges, and planning a maximum-product path in the data lineage network to output the data flow path.

In a second aspect, a data flow path tracking system based on dynamic watermarking is applied to any one of the above data flow path tracking methods based on dynamic watermarking, and includes: a data injection module, configured to inject, for each flow edge in the data lineage network, mutually independent additive spread-spectrum watermarks into the multimodal data; a data whitening module, configured to perform watermark detection and covariance whitening on the samples formed by joining the multimodal data, to obtain whitened detection vectors from which second-order statistics have been removed; a central moment tensor construction module, configured to perform element-wise paired multiplication on the whitened detection vectors of the different modalities of each sample, construct a path-edge space vector reflecting multiplicative co-participation, and generate a third-order central moment tensor from the path-edge space vectors; the principal feature extraction mod