CN-122019982-A - Multivariate time series data augmentation method based on contrastive learning
Abstract
The invention discloses a multivariate time series data augmentation method based on contrastive learning. The method comprises the following steps: collecting raw time series, service topology and covariates; segmenting events with peak-valley detection and cumulative-sum change-point detection; constructing graph data; verifying the compliance of augmentation operators; generating augmentation parameters with a Mamba-based three-domain transformer under a graph mask; applying compliance projection and rejection sampling to obtain augmented views; training on the views with multi-granularity InfoNCE combined with graph-consistency, look-ahead-prediction-consistency and cycle-consistency losses; performing multi-step quantile prediction and uncertainty estimation with an improved Chronos model; selecting augmentation types and intensities with an upper confidence bound (UCB) strategy; generating augmentation proof sheets; deploying via knowledge distillation; and monitoring drift online, with threshold-triggered degradation, rollback and retraining. The method improves the generalization and robustness of the learned representations while keeping the augmentation process compliant and auditable, and is suitable for scenarios such as the industrial Internet of Things and financial risk control.
Inventors
- LUO GUIFU
- YU XU
- LI JIAN
- LU HAO
- SONG WEIYE
- YANG FANG
Assignees
- Qingdao Huazheng Information Technology Co., Ltd. (青岛华正信息技术股份有限公司)
Dates
- Publication Date
- 20260512
- Application Date
- 20260205
Claims (7)
- 1. A multivariate time series data augmentation method based on contrastive learning, characterized by comprising the following steps: acquiring raw time series, service topology and covariates, completing event segmentation through peak-valley detection and cumulative-sum (CUSUM) change-point detection, and constructing graph data by combining differentiable causal discovery, time-lagged Granger testing and the short-time Fourier transform; performing compliance verification on preset augmentation operators according to the graph data, covering preservation of the Markov equivalence class of the causal graph, satisfaction of physical invariance constraints, and invariance of the main frequency and harmonic ratios, and outputting compliance space data; invoking a Mamba-based three-domain collaborative transformer within the operator and intensity ranges defined by the compliance space data, jointly generating time-domain, frequency-domain and event-domain augmentation parameters under graph-mask gating, and performing compliance projection and rejection sampling to produce augmented view data; performing multi-granularity InfoNCE contrastive learning on the augmented view data with an added joint loss comprising graph consistency, look-ahead prediction consistency and cycle consistency, so as to obtain representation data; taking the augmented view data as the evaluation object, invoking an improved Chronos model to perform multi-step quantile prediction and uncertainty estimation, and determining the augmentation type and intensity with an upper confidence bound (UCB) strategy to generate augmentation policy data; and generating an augmentation proof sheet based on the representation data and the augmentation policy data, completing deployment via knowledge distillation, and triggering degradation, rollback and retraining through online monitoring of drift and compliance thresholds.
- 2. The multivariate time series data augmentation method based on contrastive learning according to claim 1, wherein acquiring the raw time series, the service topology and the covariates, completing event segmentation through peak-valley detection and cumulative-sum change-point detection, and constructing graph data by combining differentiable causal discovery, time-lagged Granger testing and the short-time Fourier transform comprises the following steps: the raw time series is a multivariate time series; a unified time axis is established at the finest time granularity, gaps in raw time series with different sampling rates are filled by linear interpolation between adjacent sampling points, and a missing flag is written at each filled position; the service topology is a data structure describing the objects corresponding to the raw time series and their connections, comprising entity identifiers, entity types, edges between entities and edge attributes, wherein the entity identifiers are mapped against and checked with the channels and sensing sources in the raw time series to form an entity-channel correspondence table; the covariates are exogenous variables that come from an external system or a public data source and influence the raw time series; the covariates are synchronized with the raw time series on the time axis, and their gaps are filled by linear interpolation between adjacent sampling points and marked with missing flags; event segmentation is performed on the raw time series by combining peak-valley detection with cumulative-sum change-point detection, wherein the peak-valley detection searches for local maxima and minima within a sliding window; a trigger threshold is set according to the historical noise level computed segment by segment with reference to the covariates; the cumulative-sum change-point detection progressively accumulates the difference between the series and a baseline and compares the accumulated difference with the trigger threshold, and points at which the accumulated difference exceeds the trigger threshold are marked as change points; consecutive change points are paired with adjacent peaks and valleys to determine the start and end of each event, producing an event segmentation result; based on the event segmentation result, differentiable causal discovery is performed to obtain an initial estimate of the directed dependencies among variables, the initial estimate is checked with the time-lagged Granger test, the dominant lag orders are added, and the covariates are added as exogenous nodes, so as to output a causal graph containing directed edges and lag information; a topology graph containing nodes, edges and edge attributes is constructed based on the entity-channel correspondence table; the raw time series is segmented using the event segmentation result as boundaries, each segment is windowed and transformed with the short-time Fourier transform, and the results are aggregated into a spectrogram; and the causal graph, the topology graph and the spectrogram are combined with the event segmentation result into graph data.
- 3. The multivariate time series data augmentation method based on contrastive learning according to claim 1, wherein performing compliance verification on the preset augmentation operators according to the graph data, covering preservation of the Markov equivalence class of the causal graph, satisfaction of physical invariance constraints and invariance of the main frequency and harmonic ratios, and outputting compliance space data specifically comprises: determining the preset augmentation operators based on the graph data, and setting a candidate parameter range and step granularity for each type of augmentation operator; verifying Markov equivalence preservation of the causal graph for each preset augmentation operator, wherein, according to the directed edges and conditional independence relations of the causal graph, an augmentation is restricted to channel sets that are adjacent in the causal graph or lie within the same Markov blanket, paths that could alter the conditional independence relations are excluded, cross-channel mixing is allowed only between channel pairs directly connected in the causal graph and must follow a consistency rule between parent and child channels, and any operator that fails these rules is judged non-compliant; verifying satisfaction of physical invariance constraints for each preset augmentation operator, wherein, based on the entity connections and static constraints in the topology graph, it is checked whether the entity relation corresponding to each channel is preserved after augmentation and whether the dimensional and conservation relations given by the business invariance list are violated, parameters that touch a constraint boundary are clipped to the boundary value and the clipping reason is recorded, and candidates that violate the constraints are eliminated; verifying invariance of the main frequency and harmonic ratios for each preset augmentation operator, wherein, based on the main frequency position, main peak bandwidth and harmonic ratios of each order in the spectrogram at the event-segment level, frequency-domain energy adjustment is restricted to the allowed frequency range, introducing a new main peak or shifting a recorded main peak beyond the tolerance is prohibited, band-pass resampling is restricted to the set of frequency bands marked as allowed in the spectrogram, and candidates outside the allowed frequency range are rejected; and finally combining the filtered augmentation operator list, the upper and lower limits of the corresponding parameters and the allowed lists into compliance space data.
- 4. The multivariate time series data augmentation method based on contrastive learning according to claim 1, wherein invoking the Mamba-based three-domain collaborative transformer within the operator and intensity ranges defined by the compliance space data, jointly generating time-domain, frequency-domain and event-domain augmentation parameters under graph-mask gating, and performing compliance projection and rejection sampling to produce augmented view data comprises: establishing a Mamba-based three-domain collaborative transformer, which is a parameterized data transformation module implemented with the selective state space network Mamba; generating candidate parameter sequences for time warping, amplitude affine transformation, phase perturbation, band-pass resampling, controlled cross-channel mixing and missing-value replay within the upper and lower parameter limits and step granularity given by the compliance space data, and generating a graph-mask gate that marks each channel pair as permitted or prohibited according to the adjacency relations of the causal graph; applying time warping and then amplitude affine transformation in a fixed order and performing compliance projection on out-of-limit values, wherein the compliance projection replaces any value exceeding the upper or lower limit with the corresponding boundary value and records the replaced positions; then applying phase perturbation and band-pass resampling in sequence within the allowed frequency bands and directly discarding candidates outside the allowed bands; for operations involving cross-channel influence, masking prohibited pairs according to the graph-mask gate, applying controlled cross-channel mixing only within the allowed cross-channel mappings and event segments, and never applying it to channel pairs that are non-adjacent in the causal graph; and performing rejection sampling, namely checking the consistency of each augmented result and rejecting and removing samples whose event boundaries are out of range, whose cross-channel relations conflict with the graph mask, or whose main peak in the allowed frequency band has been removed; synthesizing the augmentation results in the order time domain, frequency domain, event domain to generate augmented view data, and recording the operator identifier, final parameter values, affected channels, event segments and frequency band numbers.
- 5. The multivariate time series data augmentation method based on contrastive learning according to claim 1, wherein performing multi-granularity InfoNCE contrastive learning on the augmented view data with an added joint loss of graph consistency, look-ahead prediction consistency and cycle consistency to obtain representation data specifically comprises: loading the augmented view data and the corresponding raw time series, forming training batches according to sample identifiers and event segment numbers, inputting each sequence into a shared time series encoder to obtain an intermediate representation, and mapping the intermediate representation to a normalized vector through a projection head; taking the raw time series and the augmented view data of the same sample over the same time span as positive pairs, and different samples or different event segments in the same batch as negative pairs, and performing multi-granularity InfoNCE contrastive learning on the augmented view data, wherein the granularities comprise a sample level matched over the whole sequence, an event level matched by event segment number and a segment level matched by sliding window; for each granularity, cosine similarity is computed and temperature-scaled, and the sample-level, event-level and segment-level contrastive losses are summed with preset weights; alongside the contrastive learning, adding the graph-consistency joint loss, wherein an induced relation is constructed from the channel correlation statistics output by the encoder, compared entry by entry with the adjacency relations of the causal graph, and a penalty term is computed from the deviating entries; adding the look-ahead prediction consistency joint loss, wherein a look-ahead head, which is a lightweight prediction module, outputs the statistics and interval values of the next time period for both the raw time series and the augmented view data, and the difference under the same statistical definition is taken as a penalty term; and adding the cycle-consistency joint loss, wherein an inverse mapping head, which is a lightweight restoration module, takes the augmented view data as input and outputs the corresponding sequence in its original form, and the difference between this sequence and the raw time series at the corresponding time points is taken as a penalty term; the contrastive loss and the joint losses are accumulated into a total loss in one forward pass, and the parameters of the time series encoder, the projection head, the look-ahead head and the inverse mapping head are updated by backpropagation until convergence, after which the representation data is output.
- 6. The multivariate time series data augmentation method based on contrastive learning according to claim 1, wherein taking the augmented view data as the evaluation object, invoking the improved Chronos model to perform multi-step quantile prediction and uncertainty estimation, and determining the augmentation type and intensity with the upper confidence bound strategy to generate augmentation policy data specifically comprises: the improved Chronos model comprises a discretizing encoder, a language-modeling decoder and a dequantizing restorer; taking the compliance space data as the candidate pool, generating combinations to be evaluated according to augmentation type, intensity and allowed event segments and frequency bands; for each combination, extracting a raw time series segment and an augmented view segment from the corresponding history window and inputting the aligned covariates; the discretizing encoder performs event-adaptive discretization, in which it computes per-segment statistics of each channel's distribution according to the event segmentation result, builds adaptive bins within each segment and maps continuous values to symbols, generates condition prompt vectors for the covariates and injects them aligned with the symbol time axis, writes boundary symbols for values touching compliance boundaries, writes missing symbols at missing measurement positions and records the lengths of consecutive missing runs, and outputs raw time series symbols, augmented view symbols and covariate prompts; concatenating the raw time series symbols, augmented view symbols and covariate prompts in time order and feeding them into the language-modeling decoder, which predicts the future symbol sequence autoregressively; during training, the next symbol is supervised with a cross-entropy loss, non-augmented control samples and augmented samples are placed in the same batch, and their losses are computed separately and used to update the parameters with equal weights; in the decoding stage, temperature sampling and nucleus sampling are performed within the upper and lower parameter limits of the compliance space data to generate multiple future symbol trajectories, non-compliant trajectories are pruned according to the allowed event segments and frequency bands, and symbol trajectory data is output; inputting the symbol trajectory data into the dequantizing restorer, which restores it step by step to real-valued trajectories by inverting the discretization mapping; sorting the real-valued trajectories at each time step to obtain the values at a preset set of quantile levels, and computing the quantile bandwidth and trajectory variance as uncertainty indicators; aligning the quantile predictions with the actual observations in the corresponding time window and accumulating the quantile loss over the quantile levels as the prediction cost; taking the reduction in quantile loss relative to the non-augmented control sample as the prediction improvement, and recording it together with the uncertainty indicators as the evaluation result; and making a decision on the evaluation result with the upper confidence bound strategy, wherein the prediction improvement is taken as the benefit term, the quantile bandwidth and trajectory variance are combined as risk penalties to obtain a single score, and the augmentation type and intensity with the highest score within the range defined by the compliance space data are selected and organized into augmentation policy data.
- 7. The multivariate time series data augmentation method based on contrastive learning according to claim 1, wherein generating the augmentation proof sheet based on the representation data and the augmentation policy data, completing deployment via knowledge distillation, and triggering degradation, rollback and retraining through online monitoring of drift and compliance thresholds comprises: constructing augmentation proof sheet data based on the representation data and the augmentation policy data; archiving the augmentation scheme recorded in the augmentation proof sheet data during deployment; during operation of the deployed system, continuously collecting real-time data and comparing it with the representation content in the augmentation proof sheet, and, if the representation of the real-time data deviates from the corresponding representation content by more than a set drift threshold, regarding the current augmentation policy as invalid; and, when the drift threshold is triggered, starting a compliance response mechanism according to the state of the augmentation policy, the mechanism comprising degradation, namely suspending the current augmentation policy and switching to a standard augmentation template verified as stable over multiple rounds of training; rollback, namely switching the current data processing flow to the non-augmented path or to the last valid augmentation scheme; and reconstruction, namely taking the current drifted data as input and regenerating new graph data, compliance space data and augmentation policy data.
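The alignment step of claim 2 (a unified time axis at the finest granularity, linear interpolation between adjacent samples, a missing flag on every filled position) can be sketched as follows. This is a minimal illustration, assuming each channel arrives as sorted timestamps with matching values and that the caller supplies the unified grid; the function name `align_to_grid` and the endpoint handling are hypothetical, not taken from the patent.

```python
from bisect import bisect_left


def align_to_grid(times, values, grid):
    """Resample one channel onto the unified time axis `grid`.

    Returns (aligned_values, missing_flags). A grid point that coincides with
    an original sample keeps its value (flag False); any other point is filled
    by linear interpolation between the two neighbouring samples (flag True).
    Grid points outside the observed range copy the nearest endpoint and are
    also flagged; this extrapolation rule is an assumption of the sketch.
    """
    aligned, missing = [], []
    for t in grid:
        i = bisect_left(times, t)
        if i < len(times) and times[i] == t:
            aligned.append(values[i])
            missing.append(False)
        elif i == 0 or i == len(times):
            aligned.append(values[0] if i == 0 else values[-1])
            missing.append(True)
        else:
            t0, t1 = times[i - 1], times[i]
            w = (t - t0) / (t1 - t0)  # linear proportion between the neighbours
            aligned.append(values[i - 1] + w * (values[i] - values[i - 1]))
            missing.append(True)
    return aligned, missing
```

The same function would apply unchanged to the covariates, which claim 2 aligns and flags in the same way.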
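The peak-valley detection of claim 2 searches for local maxima and minima inside a sliding window. A minimal sketch, assuming a point counts as a peak (valley) only when it is strictly larger (smaller) than every other point in a symmetric window of `half_window` samples on each side, and that points without a full window are skipped; the window rule and the name `peaks_and_valleys` are assumptions:

```python
def peaks_and_valleys(x, half_window):
    """Return (peak_indices, valley_indices) of the list `x`.

    Index i is a peak if x[i] exceeds every other value in
    x[i - half_window : i + half_window + 1], and a valley if it is below all
    of them. Indices closer than `half_window` to either end are not tested.
    """
    if half_window < 1:
        raise ValueError("half_window must be >= 1")
    peaks, valleys = [], []
    for i in range(half_window, len(x) - half_window):
        window = x[i - half_window:i] + x[i + 1:i + half_window + 1]
        if x[i] > max(window):
            peaks.append(i)
        elif x[i] < min(window):
            valleys.append(i)
    return peaks, valleys
```

For example, `peaks_and_valleys([0, 3, 1, 0, -2, 0, 4, 1], 1)` yields peaks at indices 1 and 6 and a valley at index 4.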
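The cumulative-sum step of claim 2 accumulates deviations from a baseline and marks a change point when the accumulated value crosses a threshold derived from the historical noise level. The sketch below uses the standard two-sided CUSUM scheme; the slack `k_sigma`, the threshold multiplier `h_sigma`, and estimating the baseline and noise from a caller-supplied `history` window are illustrative assumptions (the patent sets the threshold per covariate-defined segment but gives no formula):

```python
from statistics import mean, pstdev


def cusum_change_points(x, history, k_sigma=0.5, h_sigma=5.0):
    """Two-sided CUSUM change-point detection.

    Baseline and noise level come from `history`. s_pos accumulates upward
    deviations and s_neg downward ones, each reduced by a slack of
    k_sigma * sigma; when either exceeds h_sigma * sigma the index is marked
    as a change point and both sums restart from zero.
    """
    baseline = mean(history)
    sigma = pstdev(history) or 1e-12  # avoid a zero threshold on flat history
    k, h = k_sigma * sigma, h_sigma * sigma
    s_pos = s_neg = 0.0
    change_points = []
    for i, v in enumerate(x):
        s_pos = max(0.0, s_pos + (v - baseline) - k)
        s_neg = max(0.0, s_neg - (v - baseline) - k)
        if s_pos > h or s_neg > h:
            change_points.append(i)
            s_pos = s_neg = 0.0
    return change_points
```

Because the baseline is not updated after an alarm, a sustained shift produces a run of consecutive change points: `cusum_change_points([0, 1, 0, 1, 5, 5, 5, 0, 1, 0, 1], [0, 1] * 4)` returns `[4, 5, 6]`. Claim 2 pairs such runs with adjacent peaks and valleys to mark where an event starts and stops.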
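Claims 2 and 3 compute a spectrogram per event segment with the short-time Fourier transform and reject operators that move the main frequency or change the harmonic ratios. A minimal pure-Python sketch, assuming a periodic Hann window, a naive DFT (adequate only for short frames), frame magnitudes averaged over the segment, and tolerances `bin_tol` and `ratio_tol` that the patent does not specify; all function names are hypothetical:

```python
import cmath
import math


def dft_magnitudes(frame):
    """One-sided DFT magnitudes of a real frame (naive O(N^2), for clarity)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def segment_spectrum(segment, win=32, hop=16):
    """Mean-removed, Hann-windowed STFT magnitudes averaged over one segment's frames."""
    if len(segment) < win:
        raise ValueError("segment shorter than the STFT window")
    mu = sum(segment) / len(segment)
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * t / win) for t in range(win)]
    frames = [dft_magnitudes([(segment[s + t] - mu) * hann[t] for t in range(win)])
              for s in range(0, len(segment) - win + 1, hop)]
    return [sum(col) / len(frames) for col in zip(*frames)]


def main_bin_and_ratios(spectrum, n_harmonics=3):
    """Dominant non-DC bin f0 and ratios |X(h*f0)| / |X(f0)| for h = 2..n_harmonics+1."""
    f0 = max(range(1, len(spectrum)), key=spectrum.__getitem__)
    return f0, [spectrum[h * f0] / spectrum[f0] if h * f0 < len(spectrum) else 0.0
                for h in range(2, n_harmonics + 2)]


def spectrally_compliant(original, augmented, bin_tol=0, ratio_tol=0.05):
    """True if the main bin moves by at most bin_tol and no harmonic ratio changes by more than ratio_tol."""
    f_a, r_a = main_bin_and_ratios(segment_spectrum(original))
    f_b, r_b = main_bin_and_ratios(segment_spectrum(augmented))
    return abs(f_a - f_b) <= bin_tol and all(abs(a - b) <= ratio_tol for a, b in zip(r_a, r_b))
```

An amplitude affine transform passes this check because removing the mean and rescaling leave the bin positions and ratios unchanged, whereas adding a stronger tone at a new frequency moves the main peak and is rejected.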
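Claims 3 and 4 allow cross-channel mixing only between channel pairs that are directly connected in the causal graph, enforced through a graph-mask gate. A minimal sketch, assuming channels are stored as a dict of equal-length lists and mixing is a convex blend inside one event segment; the blend form, the symmetric treatment of edges and the function names are assumptions, and the parent-child consistency rule of claim 3 is not modelled:

```python
def graph_mask(channels, causal_edges):
    """Permission flag for every ordered pair of distinct channels.

    A pair is permitted only if the two channels are directly connected in the
    causal graph; `causal_edges` holds directed (parent, child) pairs. Treating
    both directions of an edge as permitted is an assumption of this sketch.
    """
    adjacent = {(a, b) for a, b in causal_edges} | {(b, a) for a, b in causal_edges}
    return {(a, b): (a, b) in adjacent for a in channels for b in channels if a != b}


def controlled_mix(series, target, source, alpha, mask, segment):
    """Blend a fraction `alpha` of channel `source` into `target` on [start, end).

    Returns (new_series, applied). A pair prohibited by the mask is left
    untouched and reported as not applied; the input dict is never mutated.
    """
    out = {ch: list(vals) for ch, vals in series.items()}
    if not mask.get((target, source), False):
        return out, False
    start, end = segment
    for t in range(start, end):
        out[target][t] = (1 - alpha) * series[target][t] + alpha * series[source][t]
    return out, True
```

With the single causal edge `("a", "b")`, mixing `a` into `b` is applied, while mixing `a` into an unconnected channel `c` is skipped, matching the rule that non-adjacent pairs are never mixed.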
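Claim 4 defines compliance projection as replacing out-of-limit values with the corresponding boundary value and recording where this happened, followed by rejection sampling that discards augmented results failing consistency checks. A minimal sketch of both, assuming parameters are flat lists and that the proposal and acceptance checks are supplied as callables; `compliance_project` and `rejection_sample` are hypothetical names:

```python
import random


def compliance_project(params, lower, upper):
    """Clip each parameter to [lower[i], upper[i]]; return (projected, replaced_indices)."""
    projected, replaced = [], []
    for i, (p, lo, hi) in enumerate(zip(params, lower, upper)):
        q = min(max(p, lo), hi)
        if q != p:
            replaced.append(i)  # record the position that was replaced by a boundary value
        projected.append(q)
    return projected, replaced


def rejection_sample(propose, accept, max_tries=100, rng=None):
    """Draw candidates from propose(rng) until accept(candidate) is true.

    Returns (candidate, tries_used), or (None, max_tries) if all were rejected.
    """
    rng = rng or random.Random()
    for tries in range(1, max_tries + 1):
        candidate = propose(rng)
        if accept(candidate):
            return candidate, tries
    return None, max_tries
```

In the claimed flow, `accept` would bundle the checks listed in claim 4 (event boundaries in range, no conflict with the graph mask, main peak preserved), and `max_tries` bounds the cost when the compliance space is tight.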
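Claim 5 computes InfoNCE at sample, event and segment granularity with temperature-scaled cosine similarity and sums the three losses with preset weights. The sketch below is a one-directional InfoNCE with in-batch negatives, written in plain Python for readability; the temperature value, the weights and the dictionary layout are assumptions, and a real implementation would use an autodiff framework so that the encoder can be trained:

```python
import math


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss; anchors[i] and positives[i] form the positive pair.

    Every positives[j] with j != i acts as an in-batch negative for anchors[i].
    """
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    loss = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)  # subtract the max before exponentiating (log-sum-exp trick)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_denominator - logits[i]
    return loss / len(a)


def multi_granularity_loss(levels, weights):
    """Weighted sum of InfoNCE over granularities, e.g. {"sample": (anchors, positives)}."""
    return sum(weights[name] * info_nce(*pairs) for name, pairs in levels.items())
```

When each anchor is most similar to its own view the loss approaches zero; swapping the views so that every anchor's positive is its least similar candidate drives the loss up by roughly 1/temperature.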
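For the graph-consistency term of claim 5, channel relations induced from the encoder output are compared entry by entry with the causal-graph adjacency. A minimal, non-differentiable sketch, assuming the induced relation is |Pearson correlation| > `tau` and the penalty is the fraction of mismatched channel pairs; both choices are illustrative, and a trainable version would need a smooth penalty:

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def induced_adjacency(channels, tau=0.5):
    """Undirected relation induced by |correlation| > tau, keyed by sorted name pairs."""
    names = sorted(channels)
    return {(a, b): abs(pearson(channels[a], channels[b])) > tau
            for i, a in enumerate(names) for b in names[i + 1:]}


def graph_consistency_penalty(channels, causal_edges, tau=0.5):
    """Fraction of channel pairs whose induced relation disagrees with causal adjacency."""
    causal = {tuple(sorted(edge)) for edge in causal_edges}
    induced = induced_adjacency(channels, tau)
    mismatches = sum(linked != (pair in causal) for pair, linked in induced.items())
    return mismatches / len(induced)
```

Edge direction is ignored here because correlation is symmetric; the lag information stored in the causal graph is not used in this sketch.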
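Claim 6 replaces the tokenizer with event-adaptive discretization: bins are fitted within each event segment, values become symbols, missing positions get a dedicated symbol, and the restorer inverts the mapping. The original Chronos tokenizer scales each series and quantizes it into uniform bins; the sketch below instead uses equal-frequency bins per segment and restores each symbol to its bin midpoint. The reserved id `MISSING`, the binning rule and the function names are assumptions, and boundary symbols and covariate prompts are omitted:

```python
from bisect import bisect_right

MISSING = -1  # hypothetical reserved symbol id for missing measurements


def fit_segment_bins(values, n_bins):
    """Equal-frequency interior bin edges for one event segment (None values ignored)."""
    xs = sorted(v for v in values if v is not None)
    return [xs[min(len(xs) - 1, (i * len(xs)) // n_bins)] for i in range(1, n_bins)]


def encode(values, edges):
    """Map each value to its bin index (0..len(edges)); None becomes MISSING."""
    return [MISSING if v is None else bisect_right(edges, v) for v in values]


def decode(symbols, edges, lo, hi):
    """Inverse mapping: each symbol becomes the midpoint of its bin, with the
    outer bins closed at the segment minimum `lo` and maximum `hi`."""
    bounds = [lo] + list(edges) + [hi]
    return [None if s == MISSING else (bounds[s] + bounds[s + 1]) / 2 for s in symbols]
```

Applied per segment, for example `edges = fit_segment_bins(x[s:e], n_bins)` for each event boundary pair `(s, e)`, the bins follow the local distribution, so quiet and bursty segments both use the full set of symbols.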
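Claim 6 sorts the restored trajectories at each time step to read off preset quantiles, scores them against the observations with a quantile loss, and uses the quantile bandwidth and trajectory variance as uncertainty indicators. A minimal sketch, assuming linear-interpolated empirical quantiles, the standard pinball loss, bandwidth measured between the outermost levels in `levels` (which must be sorted ascending), and averages over time steps; these choices and the names are assumptions:

```python
from statistics import pvariance


def empirical_quantile(samples, q):
    """Linear-interpolated empirical quantile of one time step's sampled values."""
    xs = sorted(samples)
    pos = q * (len(xs) - 1)
    i = int(pos)
    if i + 1 >= len(xs):
        return xs[i]
    return xs[i] + (pos - i) * (xs[i + 1] - xs[i])


def pinball_loss(y, q_pred, q):
    """Quantile (pinball) loss of prediction q_pred at level q for observation y."""
    diff = y - q_pred
    return max(q * diff, (q - 1) * diff)


def evaluate(trajectories, observed, levels=(0.1, 0.5, 0.9)):
    """Return (accumulated quantile loss, mean bandwidth, mean trajectory variance)."""
    loss = band = var = 0.0
    for t, y in enumerate(observed):
        step = [traj[t] for traj in trajectories]
        qs = [empirical_quantile(step, q) for q in levels]
        loss += sum(pinball_loss(y, qp, q) for qp, q in zip(qs, levels))
        band += qs[-1] - qs[0]
        var += pvariance(step)
    n = len(observed)
    return loss, band / n, var / n
```

The prediction improvement of claim 6 would then be the loss of the non-augmented control minus the loss of the augmented candidate.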
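The decision step of claim 6 scores each compliant (type, intensity) candidate by its prediction improvement minus risk penalties from the quantile bandwidth and trajectory variance. The sketch below adds the usual UCB1 exploration bonus `c * sqrt(ln(total_rounds) / n)`, which the term "upper confidence bound" suggests but the patent does not spell out; the penalty weights, the bonus and the `stats` layout are assumptions:

```python
import math


def ucb_select(stats, total_rounds, c=1.0, lam_band=0.5, lam_var=0.5):
    """Return the arm (augmentation type, intensity) with the highest score.

    `stats` maps each compliant arm to a dict with 'n' (times evaluated),
    'gain' (mean quantile-loss reduction vs. the non-augmented control),
    'band' and 'var' (mean uncertainty indicators). Unevaluated arms are
    returned immediately so that every arm is tried at least once.
    """
    best, best_score = None, -math.inf
    for arm, s in stats.items():
        if s["n"] == 0:
            return arm
        score = (s["gain"] + c * math.sqrt(math.log(total_rounds) / s["n"])
                 - lam_band * s["band"] - lam_var * s["var"])
        if score > best_score:
            best, best_score = arm, score
    return best
```

Restricting `stats` to arms drawn from the compliance space data keeps the selection inside the range the claim allows.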
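Claim 7 compares live representations with those archived in the augmentation proof sheet and, when the deviation exceeds a drift threshold, degrades, rolls back and rebuilds. A minimal sketch, assuming drift is the cosine distance between mean embedding vectors and that the response degrades to a stable template when one exists, otherwise rolls back, and in both cases schedules a rebuild from the drifted data; the distance, the ordering of actions and the names are assumptions:

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv) if nu and nv else 1.0


def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def drift_response(reference, live, threshold, has_stable_template):
    """Return (actions, drift) for the current augmentation policy.

    An empty action list means the drift stays within the threshold.
    """
    drift = cosine_distance(mean_vector(reference), mean_vector(live))
    if drift <= threshold:
        return [], drift
    first = "degrade" if has_stable_template else "rollback"
    return [first, "rebuild"], drift  # rebuild regenerates graph, compliance and policy data
```

In a deployment the `reference` vectors would be read from the archived proof sheet, and the threshold would come from the compliance configuration rather than a hard-coded value.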
Description
Multivariate time series data augmentation method based on contrastive learning
Technical Field
The invention relates to the technical field of time series data mining and machine learning, and in particular to a multivariate time series data augmentation method based on contrastive learning.
Background
With the rapidly growing demand for self-supervised representation learning and robust modeling of multivariate time series in scenarios such as the industrial Internet of Things, energy load, financial risk control and AIOps, contrastive learning and data augmentation have become a research focus. Existing methods mainly generate views with common operators such as jittering, time warping, interpolation resampling, frequency-band perturbation or cross-channel mixing, train on these views with InfoNCE-type objectives, and validate offline with conventional forecasting models. In practical applications, however, the following problems are common: (1) multi-source time series have inconsistent sampling rates and uncontrolled cross-channel time offsets, and the usual global interpolation and fixed-length slicing ignore the service topology and covariates, so segment boundaries do not match event boundaries and semantics; (2) augmentation operators do not preserve Markov equivalence at the causal-graph level, and time warping and amplitude affine transformations easily change the graph skeleton or v-structures, writing "false invariances" into the data; (3) frequency-domain perturbations are controlled only by energy or bandwidth limits, so the main frequency and harmonic ratios are hard to keep stable and periodic or mechanism-driven patterns are destroyed; (4) physical invariances and business conservation laws (dimensions, consistency, and monotonicity or conservation constraints) lack systematic verification and parameter projection, so the compliance risk is high; (5) view generation is mostly single-domain or serial, lacking coordination among the time, frequency and event domains as well as graph-mask gating, and augmentation intensities depend on experience and cannot adapt; (6) contrastive learning mostly operates at a single, sample-level granularity and lacks event-level and segment-level contrast as well as joint constraints such as graph consistency, look-ahead prediction consistency and cycle consistency, so the robustness of the representations is insufficient; and (7) augmentation selection relies on manual experience or single-round metrics and lacks a UCB strategy driven by quantile prediction and uncertainty assessment, the compliance of view generation is difficult to audit and monitor online, and augmentation is typically "deployed once" with drift left unmonitored. Therefore, how to provide a multivariate time series data augmentation method based on contrastive learning is a problem to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a multivariate time series data augmentation method based on contrastive learning that integrates graph construction, causal graph verification, physical and spectral constraints, multi-domain augmentation generation, contrastive representation learning, forecasting-based evaluation with an improved Chronos model, and a policy feedback mechanism. It describes in detail the whole process of generating high-quality augmented views under operator selection, intensity control and compliance screening, and offers a controllable augmentation process, a closed evaluation feedback loop and strong adaptability to heterogeneous time series data.
According to an embodiment of the invention, the multivariate time series data augmentation method based on contrastive learning comprises the following steps: acquiring raw time series, service topology and covariates, completing event segmentation through peak-valley detection and cumulative-sum (CUSUM) change-point detection, and constructing graph data by combining differentiable causal discovery, time-lagged Granger testing and the short-time Fourier transform; performing compliance verification on preset augmentation operators according to the graph data, covering preservation of the Markov equivalence class of the causal graph, satisfaction of physical invariance constraints, and invariance of the main frequency and harmonic ratios, and outputting compliance space data; invoking a Mamba-based three-domain collaborative transformer within the operator and intensity ranges defined by the compliance space data, jointly generating time-domain, frequency-domain and event-domain augmentation parameters under graph-mask gating, and performing compliance projection and rejection sampling to produce augmented view data; performing multi-granularity InfoNCE contrastive learning on the augmented view data with an added joint loss comprising graph consistency, look-ahead prediction consistency and cycle consistency, so as to obtain representation data; taking the augmented view data as the evaluation object, invoking an improved Chronos model to perform multi-step quantile prediction and uncertainty estimation, and deter