CN-121980586-A - Data acquisition method and system based on self-adaptive camouflage and authority simulation

Abstract

The invention relates to the technical field of data acquisition and discloses a data acquisition method and system based on adaptive camouflage and authority simulation. The method comprises: performing adversarial probing on a target data source to locate a critical vulnerability region; constructing a joint optimization target with the target data as task guidance; optimizing a reference request through an iterative adversarial perturbation algorithm based on the joint optimization target to generate an adversarial data request; constructing, based on the critical vulnerability region, a permission payload that simulates the region's permission verification logic; fusing the permission payload with the adversarial data request to form a composite request; executing the composite request to acquire raw data; and reconstructing and purifying the raw data based on clean-noisy example data pairs acquired from the target data source to obtain structured target data. The invention can improve the efficiency of data acquisition based on adaptive camouflage and authority simulation.

Inventors

YIN GUANGQIANG
CHEN QING
WEN PENG
LI YE

Assignees

Kashgar Prefecture Electronic Information Industry Technology Research Institute (喀什地区电子信息产业技术研究院)

Dates

Publication Date: 20260505
Application Date: 20260120

Claims (10)

1. A data acquisition method based on adaptive camouflage and authority simulation, the method comprising: S1, performing adversarial probing on a target data source, and analyzing the dependency of the permission verification logic on request elements to locate a critical vulnerability region; S2, taking the target data source as task guidance, constructing a joint optimization target that simultaneously optimizes the degree of task success and the consistency of request behavior with the normal pattern; S3, optimizing a reference request through an iterative adversarial perturbation algorithm based on the joint optimization target, and generating an adaptively camouflaged adversarial data request; S4, constructing, based on the critical vulnerability region, a permission payload that simulates the region's permission verification logic, and fusing the permission payload with the adversarial data request to form a composite request; S5, executing the composite request to acquire raw data, and reconstructing and purifying the raw data through in-context learning based on clean-noisy example data pairs acquired from the target data source to obtain structured target data.
2. The data acquisition method based on adaptive camouflage and authority simulation according to claim 1, wherein performing adversarial probing on the target data source and analyzing the dependency of the permission verification logic on request elements to locate the critical vulnerability region comprises: constructing a reference request with compliant format and semantics based on the normal interaction pattern of the target data source; defining, in the reference request, the perturbation regions of the request elements to be probed, based on the request components that the permission verification logic is likely to examine; applying a simulated adversarial-noise perturbation to each request-element perturbation region to generate perturbed requests; and evaluating, based on a joint evaluation index and the perturbed requests, the degree to which the permission verification logic depends on each request-element perturbation region, to obtain a dependency evaluation value for each perturbation region, wherein the core calculation formula of the joint evaluation index is as follows: [formula missing]; where F is the dependency evaluation value of a perturbation region, L_p(δ) is the perturbation loss of the permission verification result after the perturbation δ is applied, L_c(δ) is the difference loss in logical consistency between the perturbed request and the reference request, the two weight coefficients (symbols missing) weight the perturbation loss and the difference loss respectively, V_perturbed is the output vector produced by the same permission verification logic after the perturbation δ is applied to a designated region of the reference request, and V_base is the output vector produced by the permission verification logic for the reference request; and the critical vulnerability region is determined from the dependency evaluation values sorted in descending order.
3. The data acquisition method based on adaptive camouflage and authority simulation according to claim 2, wherein defining the perturbation regions of the request elements to be probed in the reference request comprises defining the perturbation regions under the attack modes corresponding to four scenarios, as follows: (1) a request parameter attack mode, in which the service parameter or query condition region of the reference request is defined as the perturbation region and recorded as the service parameter region; (2) a permission declaration attack mode, in which the identity token, permission declaration, or access control label region of the reference request is defined as the perturbation region and recorded as the permission declaration region; (3) a composite declaration attack mode, in which the service parameter region and the permission declaration region are both defined as perturbation regions and recorded as the composite declaration region; and (4) a request metadata attack mode, in which the protocol header, resource path, or session identifier region of the reference request is defined as the perturbation region and recorded as the request metadata region.
4. The data acquisition method based on adaptive camouflage and authority simulation according to claim 1, wherein taking the target data source as task guidance and constructing a joint optimization target that simultaneously optimizes the degree of task success and the consistency of request behavior with the normal pattern comprises: defining a joint loss function based on the reference request to be optimized and the target data source, wherein the mathematical expression of the joint loss function is as follows: [formula missing]; where the formula involves the joint optimization target loss value, the task loss, and the attention loss (symbols missing); X' is the data request after the adversarial perturbation E is applied; y is the request result state of the target data; E is the adversarial perturbation to be optimized; the metric difference loss function and the balancing weight coefficient appear with symbols that are missing; f(X') is the data acquisition prediction function; L is the total number of layers and l is the layer index; H is the total number of attention heads per layer and h is the attention head index; i is the index of a request element to which a perturbation can be applied; D_l is the set of elements in layer l that can receive a perturbation; Q_i is the set of critical vulnerability region elements associated with perturbed element i; j is the element index within the critical vulnerability region; w is a weight vector; and the remaining terms (symbols missing) are the attention weight, a vectorization function, and a bias scalar.
5. The data acquisition method based on adaptive camouflage and authority simulation according to claim 4, wherein the mathematical expression of the metric difference loss function is as follows: [formula missing]; where the terms of the formula (symbols missing) are the metric difference loss, the request result state of the target data, the Sigmoid function, and the data acquisition prediction function.
6. The data acquisition method based on adaptive camouflage and authority simulation according to claim 1, wherein optimizing the reference request through an iterative adversarial perturbation algorithm based on the joint optimization target to generate the adaptively camouflaged adversarial data request comprises: starting from the reference request, initializing a zero-valued adversarial perturbation with the same dimensions as the modifiable portion of the request; iteratively optimizing the zero-valued adversarial perturbation with the iterative fast gradient sign method, based on the joint optimization target, to obtain a final perturbation; and applying the optimized final perturbation to the reference request to generate the adaptively camouflaged adversarial data request.
7. The data acquisition method based on adaptive camouflage and authority simulation according to claim 1, wherein constructing, based on the critical vulnerability region, a permission payload that simulates the region's permission verification logic comprises: analyzing the core feature pattern on which the region's permission verification logic relies, based on the type of the critical vulnerability region and its historical verification data; generating, based on the core feature pattern, a permission data segment that matches samples passing legitimate verification in format, semantics, and logical relationships; and encapsulating the permission data segment in a standardized, serialized form to produce a structured permission payload.
8. The data acquisition method based on adaptive camouflage and authority simulation according to claim 1, wherein fusing the permission payload with the adversarial data request to form a composite request comprises: locating the position in the adversarial data request that corresponds to the critical vulnerability region and embedding the permission payload there; and formatting and outputting the embedded request data to generate the composite request.
9. The data acquisition method based on adaptive camouflage and authority simulation according to claim 1, wherein reconstructing and purifying the raw data through in-context learning based on clean-noisy example data pairs acquired from the target data source to obtain structured target data comprises: constructing, based on historical interaction data, example data pairs each consisting of a clean data sample and the corresponding noisy data sample; inputting the example data pairs and the raw data to be purified as context into a pre-trained in-context learning model, which reconstructs and purifies the raw data according to the mapping learned from the example data pairs and generates purified data; and performing structured extraction and integration on the purified data to obtain the structured target data.
10. A data acquisition system based on adaptive camouflage and authority simulation, the system comprising: a vulnerability analysis module for performing adversarial probing on a target data source and analyzing the dependency of the permission verification logic on request elements to locate a critical vulnerability region; an optimization target construction module for taking the target data source as task guidance and constructing a joint optimization target that simultaneously optimizes the degree of task success and the consistency of request behavior with the normal pattern; a camouflaged request generation module for optimizing a reference request through an iterative adversarial perturbation algorithm based on the joint optimization target to generate an adaptively camouflaged adversarial data request; a payload simulation module for constructing, based on the critical vulnerability region, a permission payload that simulates the region's permission verification logic, and fusing the permission payload with the adversarial data request to form a composite request; and a data purification module for executing the composite request to acquire raw data, and reconstructing and purifying the raw data through in-context learning based on clean-noisy example data pairs acquired from the target data source to obtain structured target data.
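
The update rule named in claim 6 (zero initialization followed by the iterative fast gradient sign method) can be illustrated on a purely numeric toy problem. The sketch below is not the patent's implementation: the function name `ifgsm`, the parameters `eps`, `alpha`, and `steps`, the bound on the perturbation, and the quadratic toy loss are all assumptions introduced for illustration, and the joint loss of claim 4 is not reproduced because its formula is not available in the text.

```python
from typing import Callable, List

Vector = List[float]


def sign(value: float) -> int:
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def ifgsm(x: Vector,
          grad_fn: Callable[[Vector], Vector],
          eps: float,
          alpha: float,
          steps: int) -> Vector:
    """Iterative fast gradient sign method that minimizes a loss.

    x       -- reference vector (left unmodified)
    grad_fn -- hypothetical callable returning the loss gradient at a point
    eps     -- assumed per-coordinate bound on the perturbation
    alpha   -- step size of each iteration
    steps   -- number of iterations
    """
    # Zero-valued perturbation with the same dimension as x.
    delta = [0.0] * len(x)
    for _ in range(steps):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        grad = grad_fn(x_adv)
        # Step against the gradient sign, then clip to [-eps, eps].
        delta = [max(-eps, min(eps, di - alpha * sign(gi)))
                 for di, gi in zip(delta, grad)]
    return delta


# Toy loss (assumption): squared distance to a fixed target vector.
target = [1.0, 1.0]


def toy_grad(v: Vector) -> Vector:
    return [2.0 * (vi - ti) for vi, ti in zip(v, target)]


reference = [0.0, 1.0]
final_delta = ifgsm(reference, toy_grad, eps=0.3, alpha=0.1, steps=5)
perturbed = [xi + di for xi, di in zip(reference, final_delta)]
```

Clipping after every step keeps the accumulated perturbation within the assumed bound, which is the usual way the iterative variant differs from a single-step sign update.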

Description

Data acquisition method and system based on self-adaptive camouflage and authority simulation

Technical Field

The invention relates to the technical field of data acquisition, and in particular to a data acquisition method and system based on adaptive camouflage and authority simulation.

Background

In the field of data acquisition, existing approaches to generating data requests generally lack a targeted vulnerability probing mechanism and struggle to adapt accurately to the permission verification logic of different target data sources. Traditional methods use fixed-format request templates and do not consider how strongly permission verification depends on each request element, so requests are easily intercepted or identified and the data acquisition success rate is low. Existing camouflaged-request generation techniques also lack a dynamic optimization mechanism: they cannot keep request behavior consistent with the normal pattern while ensuring the task success rate, so they easily trigger the data source's security protection policies, which further limits the efficiency and stability of data acquisition.

In addition, existing data acquisition technology shows an obvious disconnect between permission adaptation and data processing. On the one hand, it lacks accurate permission simulation for critical vulnerability regions and has difficulty constructing a permission payload that conforms to the verification logic, so requests cannot pass core permission verification. On the other hand, the acquired raw data often contains noise or format anomalies; traditional data purification relies on fixed rules, lacks adaptive reconstruction based on in-context learning, and cannot effectively handle the complex data contamination caused by adversarial requests. As a result, the structured target data is insufficiently complete and accurate and fails to meet the actual needs of downstream services. How to improve the efficiency and stability of data acquisition is therefore a problem in urgent need of a solution.

Disclosure of Invention

The invention provides a data acquisition method and system based on adaptive camouflage and authority simulation to solve the problems described in the Background.
To achieve the above object, the invention provides a data acquisition method based on adaptive camouflage and authority simulation, comprising: S1, performing adversarial probing on a target data source, and analyzing the dependency of the permission verification logic on request elements to locate a critical vulnerability region; S2, taking the target data source as task guidance, constructing a joint optimization target that simultaneously optimizes the degree of task success and the consistency of request behavior with the normal pattern; S3, optimizing a reference request through an iterative adversarial perturbation algorithm based on the joint optimization target, and generating an adaptively camouflaged adversarial data request; S4, constructing, based on the critical vulnerability region, a permission payload that simulates the region's permission verification logic, and fusing the permission payload with the adversarial data request to form a composite request; S5, executing the composite request to acquire raw data, and reconstructing and purifying the raw data through in-context learning based on clean-noisy example data pairs acquired from the target data source to obtain structured target data.
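
The purification part of step S5 supplies clean-noisy example pairs as context for a model. As a minimal sketch of how such a context could be assembled, the code below builds a few-shot prompt from example pairs. The text does not specify a model, a prompt layout, or a data format, so the function name `build_context_prompt`, the `Noisy:` and `Clean:` labels, and the sample records are all assumptions made for illustration.

```python
from typing import List, Tuple


def build_context_prompt(pairs: List[Tuple[str, str]], raw: str) -> str:
    """Assemble a few-shot prompt from (noisy, clean) example pairs.

    Each pair demonstrates the noisy-to-clean mapping; the raw record is
    appended last with an empty "Clean:" slot for the model to complete.
    The label format is a hypothetical choice, not taken from the source.
    """
    blocks = [f"Noisy: {noisy}\nClean: {clean}" for noisy, clean in pairs]
    blocks.append(f"Noisy: {raw}\nClean:")
    return "\n\n".join(blocks)


# Hypothetical example records, invented for illustration only.
examples = [
    ("nam3=Alice;;age= 30", "name=Alice;age=30"),
    ("nam3=Bob;;age= 41", "name=Bob;age=41"),
]
prompt = build_context_prompt(examples, "nam3=Carol;;age= 27")
```

The resulting string would be passed to whichever pre-trained model is used; the structured extraction that follows purification is not shown here.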
In a preferred embodiment, performing adversarial probing on the target data source and analyzing the dependency of the permission verification logic on request elements to locate the critical vulnerability region includes: constructing a reference request with compliant format and semantics based on the normal interaction pattern of the target data source; defining, in the reference request, the perturbation regions of the request elements to be probed, based on the request components that the permission verification logic is likely to examine; applying a simulated adversarial-noise perturbation to each request-element perturbation region to generate perturbed requests; and evaluating, based on a joint evaluation index and the perturbed requests, the degree to which the permission verification logic depends on each request-element perturbation region, to obtain a dependency evaluation value for each perturbation region, wherein the core calculation formula of the joint evaluation index is as follows: [formula missing]; where F is the dependency evaluation value of a perturbation region, L_p(δ) is the perturbation loss of the permission verification result after the perturbation δ is applied, L_c(δ) is the difference loss in logical consistency between the perturbed request and the reference request, Loss of weight coefficients for distur