CN-122020568-A - Manifold reconstruction-based multi-source gradient data processing system

Abstract

The invention relates to the technical field of low-level computer data processing and distributed feature computation, and in particular to a multi-source gradient data processing system based on manifold reconstruction. The system comprises a data and knowledge acquisition module, an ideal manifold reconstruction and simulation module, a dual-track differential feature extraction module, and a situation coupling judgment module. The data and knowledge acquisition module acquires the local model gradient update vectors uploaded by multiple data sources, together with a knowledge base that contains typical pattern features of gradient deviation. The ideal manifold reconstruction and simulation module generates a gradient drift template and a gradient anomaly template. The dual-track differential feature extraction module calculates the theoretical drift deviation between the gradient drift template and a quantized feature matrix, and the theoretical anomaly deviation between the gradient anomaly template and the quantized feature matrix. The situation coupling judgment module issues an instruction to write the multi-source gradient data as trusted, to refuse passage of abnormal data, or to store the data in isolation.

Inventors

  • HE YIFAN
  • WANG ZONGYUE

Assignees

  • Jimei University (集美大学)

Dates

Publication Date
2026-05-12
Application Date
2026-04-10

Claims (8)

  1. A manifold reconstruction-based multi-source gradient data processing system, comprising: a data and knowledge acquisition module, configured to receive high-dimensional local model gradient update vectors input from multiple sources, and to retrieve the global model parameters from the previous aggregation round and a preset model anomaly pattern knowledge base, wherein the knowledge base comprises typical pattern features of gradient deviation; an ideal manifold reconstruction and simulation module, configured to perform forward inference and loss calculation on a preset trusted validation set based on the global model parameters, to generate a floating-point theoretical optimal gradient direction vector for the current training stage, and to apply adaptive quantization compression to that vector by a processor so as to extract a low-bit-width quantized feature matrix; a dual-track differential feature extraction module, configured to perform low-level tensor difference operations, to calculate the actual deviation feature between the local model gradient update vector and the quantized feature matrix, and to calculate the theoretical drift deviation between a gradient drift template and the quantized feature matrix and the theoretical anomaly deviation between a gradient anomaly template and the quantized feature matrix; and a situation coupling judgment module, configured to calculate, in a high-dimensional vector space, a first structural similarity between the actual deviation feature and the theoretical drift deviation and a second structural similarity between the actual deviation feature and the theoretical anomaly deviation, and, according to the similarity matching result, to execute an instruction for trusted writing of the multi-source gradient data, for refusing passage of abnormal data, or for isolated storage, thereby completing the filtering of the low-level feature data and the update of the system memory.
  2. The manifold reconstruction-based multi-source gradient data processing system of claim 1, wherein the ideal manifold reconstruction and simulation module generating the theoretical optimal gradient direction comprises: inputting the trusted validation set into a neural network model constructed from the global model parameters; calculating a loss function between the model output and the true labels; and obtaining the gradient of the loss function with respect to each model parameter through the backpropagation algorithm, to form the theoretical optimal gradient direction.
  3. The manifold reconstruction-based multi-source gradient data processing system of claim 1, wherein the ideal manifold reconstruction and simulation module generating a gradient drift template and a gradient anomaly template comprises: invoking the non-IID (non-independent and identically distributed) data shift features in the model anomaly pattern knowledge base to generate a drift factor vector conforming to a normally distributed offset; performing vector synthesis of the drift factor vector and the theoretical optimal gradient direction to generate the gradient drift template; invoking the parameter backdoor perturbation features or the gradient magnitude anomaly features in the model anomaly pattern knowledge base to generate an anomaly factor vector exhibiting direction reversal or nonlinear amplification; and performing vector synthesis of the anomaly factor vector and the theoretical optimal gradient direction to generate the gradient anomaly template.
  4. The manifold reconstruction-based multi-source gradient data processing system of claim 1, wherein the dual-track differential feature extraction module calculating the actual deviation feature comprises: performing an element-wise tensor difference operation on the local model gradient update vector and the quantized feature matrix; and taking the resulting difference vector as the actual deviation feature, which reflects how far the local computation at the multi-source data end deviates from the expected optimization path.
  5. The manifold reconstruction-based multi-source gradient data processing system of claim 1, wherein the dual-track differential feature extraction module calculating the theoretical drift deviation and the theoretical anomaly deviation comprises: calculating the tensor difference vector between the gradient drift template and the quantized feature matrix, and defining it as the theoretical drift deviation; and calculating the tensor difference vector between the gradient anomaly template and the quantized feature matrix, and defining it as the theoretical anomaly deviation.
  6. The manifold reconstruction-based multi-source gradient data processing system of claim 1, wherein the situation coupling judgment module responding according to the similarity matching result comprises: if the first structural similarity is higher than a preset safety threshold and the second structural similarity is lower than the safety threshold, determining that the model update originates from a trusted data shift; and incorporating the corresponding local model gradient update vector into the global model aggregation update process.
  7. The manifold reconstruction-based multi-source gradient data processing system of claim 6, wherein the situation coupling judgment module responding according to the similarity matching result further comprises: if the second structural similarity is higher than the safety threshold, determining that the model update source contains a parameter-level malicious perturbation; and excluding the corresponding local model gradient update vector from global model aggregation and marking its source participant as a low-reliability node.
  8. The manifold reconstruction-based multi-source gradient data processing system of claim 7, wherein the situation coupling judgment module responding according to the similarity matching result further comprises: if the first structural similarity and the second structural similarity are both lower than the safety threshold, determining that the model update source exhibits an undefined anomaly pattern; and transferring the corresponding local model gradient update vector to an isolated analysis environment for further pattern recognition and attribution analysis.
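
The flow described in claims 3 to 8 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation. Cosine similarity stands in for the unspecified "structural similarity"; symmetric uniform quantization (returned in dequantized form) stands in for the adaptive quantization; "vector synthesis" is taken to be element-wise addition; the anomaly factor is modelled as a direction flip with amplification; the threshold of 0.8 is arbitrary; and the case where a similarity exactly equals the threshold, which the claims do not cover, falls through to the next branch. All function names are hypothetical, and the theoretical optimal gradient of claim 2 is assumed to be already computed.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def quantize(v: Sequence[float], bits: int = 8) -> Vector:
    """Assumed scheme: symmetric uniform quantization, returned dequantized."""
    levels = 2 ** (bits - 1) - 1
    scale = (max(abs(x) for x in v) / levels) or 1.0  # all-zero input: use 1.0
    return [round(x / scale) * scale for x in v]


def diff(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Element-wise difference a - b (claims 4 and 5)."""
    return [x - y for x, y in zip(a, b)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, used here as a stand-in for structural similarity."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def sample_drift_factor(dim: int, sigma: float, rng: random.Random) -> Vector:
    """Drift factor drawn from a zero-mean normal distribution (claim 3)."""
    return [rng.gauss(0.0, sigma) for _ in range(dim)]


def drift_template(g_opt: Sequence[float], drift_factor: Sequence[float]) -> Vector:
    """Optimal gradient plus drift factor (synthesis assumed to be addition)."""
    return [g + f for g, f in zip(g_opt, drift_factor)]


def anomaly_template(g_opt: Sequence[float], gain: float = 2.0) -> Vector:
    """Optimal gradient plus a flipping, amplifying factor; equals -gain * g_opt."""
    factor = [-(1.0 + gain) * g for g in g_opt]
    return [g + f for g, f in zip(g_opt, factor)]


def decide(local_grad: Sequence[float], q: Sequence[float],
           drift_tpl: Sequence[float], anomaly_tpl: Sequence[float],
           threshold: float = 0.8) -> str:
    actual = diff(local_grad, q)                    # actual deviation feature
    s1 = cosine(actual, diff(drift_tpl, q))         # first structural similarity
    s2 = cosine(actual, diff(anomaly_tpl, q))       # second structural similarity
    if s2 > threshold:
        return "reject"      # claim 7: malicious perturbation
    if s1 > threshold:
        return "accept"      # claim 6: trusted data shift
    return "quarantine"      # claim 8: undefined anomaly pattern


g_opt = [1.0, -2.0, 0.5, 3.0]                        # assumed given (claim 2)
q = quantize(g_opt)
drift_tpl = drift_template(g_opt, [0.4, 0.3, -0.2, 0.1])  # fixed factor for a repeatable demo
anomaly_tpl = anomaly_template(g_opt)

print(decide([1.5, -1.65, 0.25, 3.1], q, drift_tpl, anomaly_tpl))   # accept
print(decide([-2.0, 4.0, -1.0, -6.0], q, drift_tpl, anomaly_tpl))   # reject
print(decide([1.0, -2.0, 5.0, 3.0], q, drift_tpl, anomaly_tpl))     # quarantine
```

The anomaly check runs first so that an update resembling both templates is rejected rather than accepted, which matches the ordering implied by claims 6 and 7. In practice the drift factor would come from `sample_drift_factor` or from the knowledge base rather than from a hard-coded list.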

Description

Manifold reconstruction-based multi-source gradient data processing system

Technical Field

The invention relates to the technical field of low-level computer data processing and distributed feature computation, and in particular to a manifold reconstruction-based multi-source gradient data processing system.

Background

In current federated learning security defense scenarios, the server cannot directly audit the participants' raw training data, so the global model is highly susceptible during aggregation to interference from malicious attacks as well as from benign differences in data distribution. Existing defense schemes are generally based on a statistical outlier detection assumption: they compute the Euclidean distance or cosine similarity between the participants' model updates and treat any gradient that deviates markedly from the group mean as an attack. However, in complex environments such as wireless network security situation awareness, the participants' data are often severely non-independent and identically distributed (non-IID), and the gradient deviation caused by normal business logic closely resembles maliciously injected attack perturbations in its statistical characteristics. Lacking a deep understanding of the model evolution mechanism and knowledge of attack patterns, traditional threshold-based methods struggle to distinguish benign drift from malicious attack in a black-box setting, which leads to a high false alarm rate and poor robustness in security detection.

Therefore, how to accurately identify the nature of model updates through reconstruction of the mathematical mechanism and multi-dimensional pattern comparison, under the constraint that the data are not visible, and thereby improve the system's active defense capability in complex network environments, has become a technical problem to be solved.
Disclosure of Invention

In order to solve the above technical problems, the invention provides a manifold reconstruction-based multi-source gradient data processing system, which specifically comprises the following technical scheme:

A data and knowledge acquisition module, configured to receive high-dimensional local model gradient update vectors input from multiple sources, and to retrieve the global model parameters from the previous aggregation round and a preset model anomaly pattern knowledge base, wherein the knowledge base comprises typical pattern features of gradient deviation;

An ideal manifold reconstruction and simulation module, configured to perform forward inference and loss calculation on a preset trusted validation set based on the global model parameters, to generate a floating-point theoretical optimal gradient direction vector for the current training stage, and to apply adaptive quantization compression to that vector by a processor so as to extract a low-bit-width quantized feature matrix;

A dual-track differential feature extraction module, configured to perform low-level tensor difference operations, to calculate the actual deviation feature between the local model gradient update vector and the quantized feature matrix, and to calculate the theoretical drift deviation between the gradient drift template and the quantized feature matrix and the theoretical anomaly deviation between the gradient anomaly template and the quantized feature matrix;

And a situation coupling judgment module, configured to calculate, in a high-dimensional vector space, a first structural similarity between the actual deviation feature and the theoretical drift deviation and a second structural similarity between the actual deviation feature and the theoretical anomaly deviation, and, according to the similarity matching result, to execute an instruction for trusted writing of the multi-source gradient data, for refusing passage of abnormal data, or for isolated storage, thereby completing the filtering of the low-level feature data and the update of the system memory.

Preferably, the ideal manifold reconstruction and simulation module generating the theoretical optimal gradient direction includes: inputting the trusted validation set into a neural network model constructed from the global model parameters; calculating a loss function between the model output and the true labels; and obtaining the gradient of the loss function with respect to each model parameter through the backpropagation algorithm, to form the theoretical optimal gradient direction.

Preferably, the ideal manifold reconstruction and simulation module generating a gradient drift template and a gradient anomaly template includes: invoking the non-IID data shift features in the model anomaly pattern knowledge base to generate a drift factor vector conforming to a normally distributed offset; performing vector synthesis of the drift factor vector and the theoretical optimal gradient direction, and a gradient drift template is g