CN-122004881-A - Electrocardio compression method and system based on learnable semantic distillation and layered residual quantization
Abstract
The invention discloses an electrocardiographic compression method and system based on learnable semantic distillation and layered residual quantization. The method comprises the following steps: setting up a compression-reconstruction framework based on an autoencoder, collecting original electrocardiographic signals, inputting them into the framework, and having the framework output compressed electrocardiographic data. The framework is trained in three stages: first, initializing learnable query vectors, performing knowledge distillation and task supervision, and constructing a semantic-distillation optimization objective; second, freezing the trained query vectors and establishing a mapping between the discrete quantized detail variables, the quantized semantic variables, and the continuous features; third, unfreezing all core module parameters and performing a weighted fusion of the semantic-distillation, waveform-reconstruction, and vector-quantization optimization objectives. With this scheme, the compression-reconstruction framework achieves deep decoupling and orthogonal representation of "pathological semantic information" and "morphological detail information" in the electrocardiographic signal, and supports layered scalable transmission and near-diagnostic-grade compression.
Inventors
- JIN JIANXIU
- Guan Yeyi
- XIONG QIWEI
- SHU LIN
Assignees
- South China University of Technology (华南理工大学)
Dates
- Publication Date: 20260512
- Application Date: 20260128
Claims (13)
- 1. An electrocardiographic compression method based on learnable semantic distillation and layered residual quantization, characterized by comprising the following steps: setting up a compression-reconstruction framework based on an autoencoder, collecting an original electrocardiographic signal, inputting it into the framework, and having the framework output compressed electrocardiographic data; the framework performs three-stage training. First, a teacher-student supervision framework is constructed in which a pre-trained teacher network provides external knowledge to guide the student network: specifically, learnable query vectors are initialized, the semantic-stream features of the original electrocardiographic signal are extracted with the student network's dual-stream temporal encoder, a joint optimization objective combining semantic knowledge distillation and task supervision is constructed, and the query vectors and encoder parameters are updated by back-propagation so that the student network acquires an initial semantic-feature learning capability. Second, the trained query vectors are frozen and a preset hierarchical vector quantizer and temporal decoder are intensively optimized: taking as input the continuous semantic and continuous detail latent variables output by the dual-stream temporal encoder, a mapping between the discrete quantized detail variables, the quantized semantic variables, and the continuous features is established by minimizing the waveform-reconstruction loss and the codebook commitment loss, and the temporal morphology of the electrocardiographic waveform is reconstructed with a semantic-guidance mechanism. Third, the parameters of the dual-stream temporal encoder, hierarchical vector quantizer, and temporal decoder are unfrozen, the semantic-distillation, waveform-reconstruction, and vector-quantization optimization objectives are weighted and fused, and a globally optimal balance among compression efficiency, signal reconstruction quality, and clinical diagnostic accuracy is achieved through end-to-end cooperative training of the compression-reconstruction framework.
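As an illustration (not part of the claims), the three-stage freeze/unfreeze schedule of claim 1 can be sketched as follows; the module names and the dict-based API are assumptions for illustration only:

```python
# Sketch of the three-stage training schedule described in claim 1.
# Module names ("query_vectors", "encoder", ...) are illustrative
# assumptions; the patent does not fix a concrete API.

def trainable_groups(stage: int) -> dict:
    """Return which parameter groups receive gradients in each stage."""
    if stage == 1:
        # Stage 1: learn query vectors + dual-stream encoder under
        # distillation and classification supervision.
        return {"query_vectors": True, "encoder": True,
                "quantizer": False, "decoder": False}
    if stage == 2:
        # Stage 2: freeze the trained queries, fit the hierarchical
        # quantizer and decoder with reconstruction + commitment losses.
        return {"query_vectors": False, "encoder": False,
                "quantizer": True, "decoder": True}
    if stage == 3:
        # Stage 3: unfreeze everything for end-to-end joint training.
        return {"query_vectors": True, "encoder": True,
                "quantizer": True, "decoder": True}
    raise ValueError("stage must be 1, 2 or 3")
```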
- 2. The electrocardiographic compression method based on learnable semantic distillation and layered residual quantization according to claim 1, characterized in that the steps of initializing the learnable query vectors, extracting the semantic-stream features of the original electrocardiographic signal with the dual-stream temporal encoder of the student network, constructing a joint optimization objective combining semantic knowledge distillation and task supervision, and updating the query-vector and encoder parameters by back-propagation so that the student network acquires an initial semantic-feature learning capability are as follows: a group of learnable query vectors is initialized and input to the dual-stream temporal encoder to extract a semantic latent variable z_s and a detail latent variable z_d; the optimization revolves around z_s and comprises two parallel supervision paths. First, the knowledge-distillation path: z_s undergoes dimension projection and spatial transformation through a feature adapter so that its feature distribution aligns with the high-dimensional features extracted by the teacher network. Concretely, each dimension of the input vector is fully connected to all output neurons of the adapter through a weight matrix, the data is projected from one feature space to another by a linear transformation (weighted summation), and a bias term is introduced for fitting flexibility: y = W x + b, where x is the input vector of shape (d_in,), W is a learnable weight matrix of shape (d_out, d_in), b is a bias vector of shape (d_out,), and y is the output vector of shape (d_out,). The adapted student output and the teacher features then undergo a statistics-aggregation operation based on global average pooling and global max pooling, and the distillation loss is computed between the resulting global statistics vectors: μ_s = (1/T) Σ_t f_s[t], μ_t = (1/T) Σ_t f_t[t], m_s = max_t f_s[t], m_t = max_t f_t[t], where f_s is the student-network output after the adapter, f_t is the teacher-network output, and the pooling eliminates the difference in sequence length; μ_s and μ_t represent the global semantic centroids and m_s and m_t the strongest pathological response intensities, i.e., the time-dimension mean and maximum of the student and teacher output features respectively; the number of learnable query vectors is denoted N, i and j index the learnable query vectors, and T is the sequence length received by the sequence-splicing layer. The final distillation loss is a weighted sum of a direction-alignment loss and an intensity-alignment loss: L_distill = α (1 − cos(μ_s, μ_t)) + β ‖m_s − m_t‖_1, where α and β are the balance weights of the direction and intensity losses; the first (direction-alignment) term constrains the global mean vectors μ_s and μ_t through cosine similarity, forcing the student to learn the teacher network's activation direction in feature space, i.e., the distribution pattern of the main pathological features; the second (intensity-alignment) term uses the L1 distance to constrain the global maximum vectors m_s and m_t, forcing the student network to learn the teacher network's response intensity to specific pathological waveforms. Second, the task-supervision path: z_s is input to a classification head to produce a pathology prediction, and the cross-entropy loss against the true clinical diagnosis label is L_ce = − Σ_{c=1}^{M} y_c log(p_c), where M is the total number of pathology classes, y_c is the one-hot encoding of the true label (y_c = 1 if the sample belongs to class c, otherwise 0), and p_c is the probability of class c in the predictive distribution output by the classification head. The weighted sum of the distillation loss and the cross-entropy loss is the overall optimization objective, back-propagated to update the dual-stream temporal encoder, the query vectors, and the classification head so that the encoder acquires an initial semantic learning capability: L_stage1 = L_ce + λ L_distill, where λ is the weight coefficient balancing the two losses.
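As an illustrative, non-claimed sketch, the pooled direction/intensity distillation loss described in claim 2 can be written with numpy; the balance weights alpha and beta are placeholders:

```python
import numpy as np

def distillation_loss(f_s, f_t, alpha=1.0, beta=1.0):
    """Direction (cosine on means) + intensity (L1 on maxima) alignment.

    f_s, f_t: (T, D) student/teacher feature sequences after the adapter.
    alpha, beta: illustrative balance weights (not fixed by the patent).
    """
    mu_s, mu_t = f_s.mean(axis=0), f_t.mean(axis=0)  # global semantic centroids
    m_s, m_t = f_s.max(axis=0), f_t.max(axis=0)      # strongest responses
    cos = mu_s @ mu_t / (np.linalg.norm(mu_s) * np.linalg.norm(mu_t) + 1e-8)
    direction = 1.0 - cos                            # direction-alignment term
    intensity = np.abs(m_s - m_t).mean()             # L1 intensity-alignment term
    return alpha * direction + beta * intensity
```

Mean pooling erases the sequence-length mismatch between student and teacher, which is why both terms operate on fixed-size (D,) statistics vectors.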
- 3. The electrocardiographic compression method based on learnable semantic distillation and layered residual quantization according to claim 1, characterized in that the steps of freezing the trained semantic-stream parameters, intensively optimizing the preset hierarchical vector quantizer and temporal decoder, establishing a mapping between the discrete codebook and the continuous features by minimizing the waveform-reconstruction loss and the codebook commitment loss, recovering the temporal morphology of the electrocardiographic waveform with the semantic-guidance mechanism, and obtaining the waveform-reconstruction and vector-quantization objective functions are as follows: a total optimization objective consisting of a reconstruction loss, a codebook loss, and a commitment loss is set, where the reconstruction loss measures the fidelity between the reconstructed signal x̂ and the original signal x; the codebook loss updates the vectors e in the discrete codebook so that they move toward the output of the dual-stream temporal encoder (i.e., cluster-center update); and the commitment loss constrains the dual-stream temporal encoder so that its output latent variable z_e does not drift arbitrarily but stays as close as possible to the currently selected codebook vector e. At this stage the query vectors in the dual-stream temporal encoder, trained in the first stage, are frozen to preserve their diagnostic semantics, while computation is concentrated on updating the temporal-decoder parameters and the codebook vectors; the loss functions are: L_rec = ‖x − x̂‖²₂, L_codebook = ‖sg[z_e] − e‖²₂, L_commit = ‖z_e − sg[e]‖²₂, L_stage2 = L_rec + γ L_codebook + β L_commit, where x̂ is the reconstructed signal, x is the original input signal, z_e is the latent variable output by the encoder, e is a codeword in the codebook, sg[·] is the stop-gradient operation, and γ and β are the weight coefficients of the codebook and commitment losses.
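For illustration only, the three-term stage-2 objective of claim 3 is sketched below in numpy. The stop-gradient operation sg[·] only matters for back-propagation; evaluated forward, the codebook and commitment terms share the same numerical value, so this sketch just exposes the three terms and their weights (beta and gamma are placeholder values):

```python
import numpy as np

def vq_losses(x, x_hat, z_e, e, beta=0.25, gamma=1.0):
    """Stage-2 objective: reconstruction + codebook + commitment terms.

    x, x_hat: original and reconstructed signal; z_e: encoder output;
    e: selected codeword.  beta/gamma are illustrative weights.
    """
    l_rec = np.mean((x - x_hat) ** 2)       # waveform fidelity
    l_codebook = np.mean((z_e - e) ** 2)    # gradient flows to e (sg[z_e])
    l_commit = np.mean((z_e - e) ** 2)      # gradient flows to z_e (sg[e])
    return l_rec + gamma * l_codebook + beta * l_commit
```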
- 4. The electrocardiographic compression method based on learnable semantic distillation and layered residual quantization according to claim 3, wherein the codebook of the hierarchical vector quantizer consists of a semantic codebook and a detail codebook, applying differentiated discretization strategies to the semantic latent variable z_s and the detail latent variable z_d output by the dual-stream temporal encoder; the semantic codebook adopts a standard vector-quantization mechanism, while the detail codebook adopts a multi-level residual vector-quantization mechanism composed of several cascaded sub-codebooks, with quantization formulas defined as follows. Semantic-codebook quantization: let the continuous semantic latent variable output by the encoder be z_s and define the semantic codebook as E = {e_1, …, e_K}, where K is the codebook capacity (the total number of prototype vectors) and e_k is the k-th prototype vector; quantization searches the codebook for the prototype vector with the smallest Euclidean distance to the input, and the quantized semantic feature is z_s^q = e_k with k = argmin_j ‖z_s − e_j‖₂, where the index k, as the transmitted semantic discrete code, is sent to the temporal decoder as a global context guide and can be used by the classification head for subsequent classification. Detail-codebook quantization: let the continuous detail latent variable output by the encoder be z_d; the detail quantizer comprises M levels of cascaded sub-codebooks C_1, …, C_M, each containing N prototype vectors, and quantization recursively computes residuals: define the initial residual as r_0 = z_d; for the m-th level (m = 1, …, M), find in the current sub-codebook the vector closest to the previous level's residual: q_m = Q_m(r_{m−1}) = argmin_{c ∈ C_m} ‖r_{m−1} − c‖₂, where Q_m denotes the vector-quantization operation of finding, in sub-codebook C_m, the codeword c closest to r_{m−1}; residual update: the new residual, used as the input to the next quantization level, is r_m = r_{m−1} − q_m; final reconstruction: after M levels of quantization, the quantized detail feature is the sum of all sub-level quantized vectors, z_d^q = Σ_{m=1}^{M} q_m. With this residual-quantization mechanism the quantization error ‖r_M‖ decreases gradually as the number of quantization levels M increases, achieving high-fidelity restoration of the original waveform characteristics.
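The M-level residual quantization recursion of claim 4 can be sketched directly (an illustrative, non-claimed implementation; codebooks are plain (N, D) arrays):

```python
import numpy as np

def nearest(codebook, v):
    """Index of the codeword in `codebook` (N, D) closest to v in Euclidean distance."""
    return int(np.argmin(np.linalg.norm(codebook - v, axis=1)))

def residual_vq(z_d, codebooks):
    """M-level residual quantization: each level quantizes the residual
    left by the previous one; the sum of selected codewords approximates z_d."""
    residual = z_d.copy()
    indices, z_q = [], np.zeros_like(z_d)
    for cb in codebooks:                 # cb: (N, D) sub-codebook for level m
        k = nearest(cb, residual)
        indices.append(k)
        z_q += cb[k]                     # accumulate quantized detail feature
        residual = residual - cb[k]      # error passed to the next level
    return z_q, indices
```

Only the index list needs to be transmitted; the receiver repeats the codebook lookups and the summation.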
- 5. The electrocardiographic compression method based on learnable semantic distillation and layered residual quantization according to claim 3, wherein the steps of unfreezing the parameters of the dual-stream temporal encoder, hierarchical vector quantizer, and temporal decoder, weighting and fusing the optimization objectives of semantic distillation, waveform reconstruction, and vector quantization, and achieving a globally optimal balance among compression efficiency, signal reconstruction quality, and clinical diagnostic accuracy through end-to-end co-training of the compression-reconstruction framework are as follows: the parameters of the dual-stream temporal encoder, the hierarchical vector quantizer, and the temporal decoder are unfrozen, and the original electrocardiographic signal, after encoding and quantization, is split into two parallel supervision paths. In the signal-reconstruction path, the temporal decoder generates the reconstructed signal from the fused discrete features and computes the reconstruction loss to guarantee waveform fidelity, specifically as follows: according to the received discrete index sequence, the temporal decoder performs an inverse mapping with the corresponding preset codebooks to respectively recover the semantic latent variable z_s and the temporal (detail) latent variable z_d; the temporal latent variable is input to a one-dimensional convolutional layer at the head end, whose channel-dimension projection yields the initial temporal features; the initial temporal features are input to the first query-enhanced attention module, into which the semantic latent variable z_s is introduced as the Query while the temporal features act as the Key/Value for attention computation, outputting a feature stream fused with preliminary semantic information; the fused feature stream passes sequentially through two cascaded residual blocks, extracting deep context information while maintaining effective gradient propagation; the processed features are sent to a second query-enhanced attention module, where the semantic latent variable is reused for a second round of semantic alignment and enhancement to strengthen the pathology-category features in the reconstructed signal; after smoothing by a subsequent residual block, the terminal convolutional layer projects the high-dimensional features back to the original signal space, finally generating the reconstructed electrocardiogram. In the semantic-diagnosis path, the semantic latent variable z_s continues to receive distillation supervision from the teacher network and classification supervision from the true labels to maintain a high level of pathology-recognition capability, and consists of two parallel branches: first, the distillation branch maps z_s through the feature adapter, aligns its feature dimension to the teacher network's output dimension, and computes the distillation loss L_distill to achieve knowledge transfer; second, the classification branch predicts the probability distribution over the classes through the classification head and computes the cross-entropy loss L_ce against the true class label to achieve classification supervision. The total loss function is a weighted sum of all loss terms, i.e., it combines the reconstruction loss, codebook loss, commitment loss, semantic-distillation loss, and classification cross-entropy loss: L_total = λ₁ L_rec + λ₂ L_codebook + λ₃ L_commit + λ₄ L_distill + λ₅ L_ce, where λ₁ through λ₅ are weight hyperparameters allocating the proportion of each loss during training.
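A minimal, non-claimed sketch of the stage-3 weighted fusion in claim 5; the default weights are placeholders, not values from the patent:

```python
def total_loss(l_rec, l_codebook, l_commit, l_kd, l_ce,
               w=(1.0, 1.0, 0.25, 0.5, 0.5)):
    """Stage-3 objective: weighted fusion of the five loss terms.

    w: (lambda_1 .. lambda_5) weight hyperparameters; the defaults here
    are illustrative placeholders only.
    """
    terms = (l_rec, l_codebook, l_commit, l_kd, l_ce)
    return sum(wi * li for wi, li in zip(w, terms))
```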
- 6. An electrocardiographic compression system based on the method of any one of claims 1 to 5, comprising a data-acquisition unit and a processing unit, wherein the data-acquisition unit acquires original electrocardiographic signals and transmits them to the processing unit; the processing unit contains a compression-reconstruction framework based on an autoencoder, comprising a dual-stream temporal encoder, a hierarchical vector quantizer, and a temporal decoder. The dual-stream temporal encoder extracts the semantic-stream branch of the original electrocardiographic signals, performs knowledge distillation and task supervision, constructs the semantic-distillation optimization objective, and back-propagates updates to its query vectors and classification head so that the student network acquires an initial semantic-feature learning capability. The continuous semantic and continuous detail latent variables output by the dual-stream temporal encoder are input to the hierarchical vector quantizer and the temporal decoder; a mapping between the discrete quantized detail variables, the quantized semantic variables, and the continuous features is established by minimizing the waveform-reconstruction loss and the codebook commitment loss, and the temporal morphology of the electrocardiographic waveform is reconstructed with the semantic-guidance mechanism. All core module parameters are then unfrozen, the semantic-distillation, waveform-reconstruction, and vector-quantization optimization objectives are fused by weighting, and a globally optimal balance among compression efficiency, signal reconstruction quality, and clinical diagnostic accuracy is achieved through end-to-end cooperative training.
- 7. The cardiac compression system of claim 6, wherein the dual-stream temporal encoder consists of two convolutional layers at the head and tail, four residual feature-extraction blocks, and two query-enhanced attention (QAA) modules. After the input signal enters the dual-stream temporal encoder, it passes through the first convolutional layer, with kernel size 7 and 32 output channels; it then enters the hierarchical coding flow: the signal first passes through the first two residual feature-extraction blocks (stride 4), with the channel count doubled layer by layer; the feature vector then passes through the last two residual feature-extraction blocks (strides 4 and 2 respectively), with the channel count doubled again, realizing the downsampling of the input signal. The deep features are output along two paths: one is fed, together with the preliminary semantic queries, into the second query-enhanced attention module to generate the semantic latent variable z_s, while the other outputs the detail latent variable z_d through a final convolutional layer with kernel size 3.
- 8. The cardiac compression system of claim 6, wherein the hierarchical vector quantizer adopts a dual-stream parallel processing architecture comprising a semantic codebook module, a detail codebook module, and a splicing unit. The first branch is the semantic-quantization branch: the semantic codebook module is configured to receive the input semantic latent variable z_s; it internally contains a semantic codebook for nearest-neighbor search and matching of the semantic features, and outputs a first index representing high-level semantic information. The second branch is the detail-quantization branch: the detail codebook module is configured to receive the input detail latent variable z_d; it internally contains a group of cascaded or parallel detail codebooks configured to finely quantize the texture features, and outputs a second index group representing signal detail information. The splicing unit, connected to the outputs of the semantic codebook module and the detail codebook module, is configured to serially concatenate the first index and the second index group to generate the final index sequence as output, realizing a complete discrete representation of the semantic information and texture detail of the original signal.
- 9. The cardiac compression system of claim 6, wherein the temporal decoder consists of a temporal-feature reconstruction backbone network and query-enhanced attention (QAA) modules embedded within it. The backbone consists of several one-dimensional convolutional layers (Conv1d) and residual feature-reconstruction modules (Res-Blocks) in cascade: the input and output convolutional layers are respectively responsible for the inverse projection of the feature dimension and the generation of the final waveform, while the middle section comprises several serially connected residual-module groups that progressively recover the temporal length and morphological details of the electrocardiographic signal. The QAA modules, as interfaces for injecting semantic-guidance information, are embedded before or between the residual-module groups and are configured with a dual input interface to receive the detail features from the backbone and the semantic latent variables from the quantizer respectively. Two parallel and interacting data-processing paths are thus formed in the temporal decoder, executing detail reconstruction and semantic guidance respectively: in the detail-feature reconstruction path, the quantized detail latent variable z_d' first undergoes a preliminary feature mapping through the input convolutional layer and then enters the first QAA module; the detail features receive the injection and calibration of global semantic information through the attention mechanism, completing a first feature enhancement; the enhanced features undergo upsampling or deep feature refinement through the first residual-module group and enter a subsequent QAA module to receive semantic guidance again; finally, the feature sequence, after multi-stage reconstruction and correction, is mapped back to the original signal space through the output convolutional layer to generate a high-fidelity reconstructed electrocardiographic signal.
- 10. The cardiac compression system of claim 7 or 9, wherein the query-enhanced attention module is composed of a query-vector storage unit, a sequence-splicing layer, a multi-head self-attention layer, a feedforward neural-network layer, and a feature-splitting layer. The sequence-splicing layer receives an electrocardiographic feature input of length T and concatenates it, along the time dimension, with the preset number N_q of learnable query vectors held in the query-vector storage unit, forming an enhanced feature sequence of length T + N_q. The enhanced sequence enters the multi-head self-attention layer, where the query vectors at the tail of the sequence act as probes that actively aggregate global pathological semantic information, while the electrocardiographic features at the front of the sequence are likewise updated by fusing global context through the attention mechanism. The whole enhanced sequence then enters the feedforward neural-network layer, where linear transformations and an activation function extract high-dimensional features and strengthen the model's nonlinear expressive capacity. The feature-splitting layer separates the sequence processed by the feedforward network back into two parts: the last N_q vectors are output as the semantic latent variable z_s to the semantic-stream branch for subsequent semantic quantization and classification, and the first T vectors are output, as electrocardiographic feature embeddings enhanced by the attention mechanism, to the detail-stream branch or the next convolutional layer. The feedforward neural-network (FFN) layer is a two-layer fully connected layer whose first-layer activation function is ReLU.
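The concatenate-attend-split flow of the QAA module in claim 10 can be sketched as follows (illustrative only: single-head attention without learned projections or the FFN, which the claim includes):

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def qaa(features, queries):
    """Query-enhanced attention sketch: append N_q learnable queries to a
    length-T feature sequence, attend over the joint T+N_q sequence, then
    split back into enhanced features and the semantic latent variable."""
    T = features.shape[0]
    seq = np.concatenate([features, queries], axis=0)      # (T+N_q, D)
    attn = softmax(seq @ seq.T / np.sqrt(seq.shape[1]))    # self-attention
    out = attn @ seq                                       # global aggregation
    return out[:T], out[T:]   # enhanced ECG features, semantic latent variable
```

The split mirrors the feature-splitting layer: tail vectors feed the semantic stream, head vectors continue down the detail stream.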
- 11. The cardiac compression system of claim 9, wherein each Res-Block consists of three residual units and a one-dimensional convolutional layer; the residual units are composed of two one-dimensional convolutional layers, an activation function, and residual connections, the channel count of each residual unit matches that of the input features, and the final convolutional layer has a kernel size of 3 with stride and channel count matching the input. A pre-activation residual structure is used, formed by two parallel paths, a main convolutional branch and a skip-connection branch: the main convolutional branch adopts a bottleneck design consisting of several groups of serially connected Snake activation layers and one-dimensional dilated convolutional layers, with the channel count first compressed and then restored, and the convolutional layers configured with non-zero dilation rates to enlarge the receptive field; the skip-connection branch passes the identity mapping, or contains a 1x1 convolutional layer for dimension alignment when the input and output dimensions differ. After entering the module, the input feature tensor is split into two paths: one enters the main convolutional branch, where it is channel-compressed, undergoes periodic nonlinear transformation through the Snake activation function followed by one-dimensional dilated convolution for temporal feature extraction, and is finally restored to the original channel dimension; the other is passed directly through the skip-connection branch. The output of the main convolutional branch is added element-wise to the output of the skip-connection branch as the module's final output.
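For illustration, the periodic Snake activation (x + sin²(αx)/α, from Ziyin et al.) and the pre-activation residual addition of claim 11 can be sketched as follows; the `transform` callable stands in for the bottleneck dilated-convolution stack, which is not implemented here:

```python
import numpy as np

def snake(x, alpha=1.0):
    """Periodic Snake activation: x + sin^2(alpha * x) / alpha."""
    return x + np.sin(alpha * x) ** 2 / alpha

def residual_unit(x, transform):
    """Pre-activation residual: activate first, run the main branch,
    then add the identity skip connection element-wise."""
    return x + transform(snake(x))
```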
- 12. The cardiac compression system of claim 7, wherein a feature adapter is configured between the semantic output of the dual-stream temporal encoder and the teacher network; the feature adapter is a learnable feature-alignment unit whose physical structure is a fully connected layer. The feature adapter projects the semantic latent variable z_s generated by the dual-stream temporal encoder from its original latent space into the feature space of the teacher network, adjusting the feature dimension to match the teacher network's output dimension, so that the two can compute the distillation loss in a unified metric space and the student network can effectively imitate the teacher network's high-dimensional feature distribution.
- 13. The cardiac compression system of claim 7, wherein a classification head is configured after the output of the dual-stream temporal encoder or the semantic quantizer; the classification head acts as an auxiliary supervision component that maps the semantic latent variables to a predefined electrocardiographic pathology label space. The classification head comprises a feature-aggregation unit that applies Global Average Pooling (GAP) or attention-weighted aggregation to the input sequence of semantic query vectors along the time dimension, compressing it into a single global semantic feature vector invariant to sequence length; this vector is input to a multi-layer perceptron (MLP) composed of several cascaded fully connected layers, normalization layers, and nonlinear activation layers, which outputs, through a terminal linear projection layer, the predictive probability distribution over the arrhythmia categories.
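An illustrative, non-claimed sketch of the classification head in claim 13, reduced to GAP followed by a two-layer MLP with softmax (layer sizes and weights are assumptions; the claim only fixes the GAP → MLP → probabilities structure):

```python
import numpy as np

def classification_head(z_s, W1, b1, W2, b2):
    """Map a (T, D) semantic latent sequence to class probabilities.

    GAP over time, a ReLU hidden layer, a linear projection, softmax.
    """
    g = z_s.mean(axis=0)              # (D,) length-invariant global vector
    h = np.maximum(0.0, W1 @ g + b1)  # ReLU hidden layer
    logits = W2 @ h + b2              # terminal linear projection
    e = np.exp(logits - logits.max()) # stable softmax
    return e / e.sum()                # probability distribution over classes
```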
Description
Electrocardio compression method and system based on learnable semantic distillation and layered residual quantization

Technical Field

The invention belongs to the technical field of electrocardiographic-signal compression and relates to an electrocardiographic-signal compression method and system based on learnable semantic distillation and layered residual quantization.

Background

Existing electrocardiographic-signal compression technology has evolved from traditional transform-domain methods to modern deep-learning methods, comprising mainly methods based on transform domains and compressed sensing, methods based on deep convolutional autoencoders, and methods based on Transformers and generative models. Methods based on transform domains and compressed sensing mainly exploit the sparsity of electrocardiographic signals in specific transform domains (such as the frequency or wavelet domain). A typical transform-coding pipeline comprises three steps, transform, quantization, and encoding: for example, a Discrete Wavelet Transform (DWT) or Discrete Cosine Transform (DCT) converts the time-domain signal into frequency-domain coefficients, thresholding removes the low-energy coefficients, and entropy coding finally outputs the compressed bit stream. A related technique, Compressed Sensing (CS), undersamples the signal with a random observation matrix and reconstructs it at the receiving end with a nonlinear optimization algorithm. However, such methods rely mainly on the statistical properties of the signal (e.g., minimizing mean squared error) and lack any perception of pathological semantics. At high compression ratios, waveform details that are small but clinically significant, such as the P wave and ST segment, are easily lost (the Gibbs effect), and the diagnostically critical regions of the signal cannot be effectively distinguished from background noise.
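The classical transform-quantize-encode pipeline described above can be illustrated with a self-contained DCT thresholding sketch (an illustration of the prior-art baseline, not of the invention; the orthonormal DCT-II basis is built directly from its definition so no external FFT library is needed):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix, built from its definition."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    C = np.cos(np.pi * k * (2 * t + 1) / (2 * n)) * np.sqrt(2.0 / n)
    C[0] /= np.sqrt(2.0)  # DC row scaling makes the basis orthonormal
    return C

def transform_compress(x, keep_ratio=0.1):
    """Classic transform coding: DCT, keep only the largest-magnitude
    coefficients, zero the rest, inverse-transform to reconstruct."""
    C = dct_matrix(len(x))
    coeff = C @ x
    k = max(1, int(len(x) * keep_ratio))
    thresh = np.sort(np.abs(coeff))[-k]        # k-th largest magnitude
    coeff[np.abs(coeff) < thresh] = 0.0        # discard low-energy coefficients
    return C.T @ coeff                         # orthonormal => inverse is C^T
```

Errors introduced by zeroing coefficients are exactly the sharp-transition ringing (Gibbs effect) the background section criticizes, since the discarded terms are weighted purely by energy, not by diagnostic value.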
With the development of deep learning, compression based on end-to-end neural networks is becoming mainstream. Methods based on deep convolutional autoencoders (CAE) typically adopt an encoder-decoder architecture, in which the encoder extracts the spatial morphological features of the cardiac signal through a convolutional neural network and maps them into low-dimensional continuous latent variables, which are then mapped back to the original signal by deconvolution. Although the compression rate improves on traditional methods, existing autoencoder techniques still have an obvious defect: pathological semantics (such as arrhythmia category) and waveform details (such as high-frequency textures) are mixed and encoded in the same 'black-box' latent space, so semantics and details are severely coupled. This coupling prevents the model from realizing 'semantics-first' transmission and from supporting layered scalable transmission, i.e., dynamically choosing to transmit only diagnostic information or to reconstruct high-fidelity waveforms according to the channel state. In recent years, processing electrocardiographic signals with self-attention mechanisms and generative adversarial networks (GANs) has become a new trend: for example, an autoencoder learns a robust representation of the signal through mask-based 'fill-in' reconstruction, or a GAN generates multi-lead signals from a single-lead signal. Although Transformers are adept at capturing long-range dependencies, their computational complexity is high and they are difficult to deploy on low-power edge devices.
More importantly, current generative compression methods focus on signal fidelity and establish no physical correspondence between the discrete codebook and explicit clinical diagnostic semantics, so the compressed features are difficult to use directly for downstream analysis tasks. In summary, although the prior art has made some progress in compression ratio and reconstruction accuracy, it has generally failed to achieve efficient decoupling of 'diagnostic semantics' from 'waveform details'. The clinic needs a new compression architecture that can extract high-level semantics directly in the compressed domain for diagnosis while enabling hierarchical high-fidelity reconstruction through residual compensation.

Disclosure of Invention

The invention aims to solve the above problems in the prior art and provides an electrocardiographic compression method and system based on learnable semantic distillation and layered residual quantization. To achieve this aim, the basic scheme of the invention is an electrocardiographic compression method based on learnable semantic distillation and layered residual quantization, comprising the following steps: setting up a compression-reconstruction framework based on an autoencoder, collecting an original electrocardiosi