CN-121983070-A - Resource self-adaptive semantic compression method based on lightweight Swin Transformer
Abstract
The invention relates to the technical field of semantic communication and intelligent resource management, and provides a resource self-adaptive semantic compression method based on a lightweight Swin Transformer. The method comprises: constructing a semantic encoder using a Swin Transformer and introducing a gating network to dynamically adjust the depth of the semantic encoder, so as to customize differentiated semantic compression ratios for terminals with different computing power; improving the attention mechanism of the Swin Transformer by constructing a two-stage sparse attention mechanism of global compression and Top-k selection to reduce the computational complexity of semantic encoding; constructing a joint computing power and bandwidth allocation model and formulating a resource optimization problem that minimizes the total energy consumption of the system; converting the optimization problem into a Markov decision process; and solving it with a proximal policy optimization (PPO) algorithm improved by the sparse attention mechanism to obtain the optimal semantic compression ratio, base station computing power and bandwidth allocation scheme.
Inventors
- ZHENG FEI
- WU DONGYING
- HUANG YUNHAI
- Huang Pengqin
- Huo Lichu
- Yu Yuanzhe
Assignees
- Guilin University of Electronic Technology (桂林电子科技大学)
- Nanning Guidian Electronic Technology Research Institute Co., Ltd. (南宁桂电电子科技研究院有限公司)
Dates
- Publication Date
- 20260505
- Application Date
- 20260119
Claims (7)
- 1. A resource self-adaptive semantic compression method based on a lightweight Swin Transformer, characterized by comprising the following steps: S1, constructing a semantic encoder using a Swin Transformer, wherein a gating network is introduced to dynamically adjust the depth of the semantic encoder, so as to customize differentiated semantic compression ratios for terminals with different computing power; S2, improving the attention mechanism of the Swin Transformer by constructing a two-stage sparse attention mechanism of global compression and Top-k selection, so as to reduce the computational complexity of semantic encoding; S3, constructing a joint computing power and bandwidth allocation model, formulating a resource optimization problem with minimization of the total system energy consumption as the optimization target, and setting constraint conditions; S4, converting the optimization problem into a Markov decision process; and S5, solving with a proximal policy optimization (PPO) algorithm improved by the sparse attention mechanism to obtain an optimal semantic compression ratio, base station computing power and bandwidth allocation scheme.
- 2. The method according to claim 1, characterized in that: the gating network is placed in Stage 3 of the Swin Transformer, and the semantic compression ratio is dynamically adjusted by adjusting the number of Swin Transformer blocks participating in the computation, wherein the semantic compression ratio has a nonlinear positive correlation with the depth of the encoder, the computation amount of the encoder is positively correlated with the semantic compression ratio, and the semantic compression ratio is defined as the ratio of the size of the original image data to the size of the semantic data.
- 3. The method according to claim 1, characterized in that the sparse attention mechanism comprises: global compressed attention, in which the token sequence is divided into a plurality of blocks, the key vectors and value vectors within each block are averaged to obtain compressed key-value pairs, and attention scores are calculated on them; and Top-k selection attention, in which the top k key compressed blocks are selected based on the global compressed attention scores and restored to their original tokens to participate in the attention calculation; and the computational complexity of the sparse attention mechanism is [expression missing in source].
- 4. The method according to claim 1, characterized in that: the semantic compression ratio, the base station computing power and the bandwidth allocation scheme are taken as optimization variables, and the objective function is determined with minimization of the total system energy consumption as the target, wherein the total system energy consumption comprises the base station semantic encoding energy consumption, the downlink transmission energy consumption and the terminal semantic decoding energy consumption. The expression of the objective function is [expression missing in source], wherein the quantities involved are: the total system energy consumption; the energy consumed by the base station in semantically encoding the data of a terminal; the transmission energy consumption from the base station to the terminal; the energy consumed by the terminal in semantic decoding; the image data size of the terminal; the number of floating point operations required by the encoder to process one bit (positively correlated with the semantic compression ratio); the number of floating point operations the base station can perform per GPU cycle; the number of GPU cycles used by the base station to semantically encode the terminal's data; the effective switched capacitance coefficient of the base station; the transmission power for the terminal; the channel gain between the base station and the terminal; the noise power spectral density; the size of the semantic data received by the terminal; the number of floating point operations required by the decoder to process one bit (positively correlated with the semantic compression ratio); the number of floating point operations the terminal can perform per GPU cycle; the number of GPU cycles used by the terminal for semantic decoding; and the effective switched capacitance coefficient of the terminal.
- 5. The method of claim 1, wherein the constraint expression is [expression missing in source], wherein the constraints respectively indicate that: the sum of the computing power consumed by the base station in semantically encoding the terminals' data does not exceed the total computing power of the base station; the computing power consumed by a terminal for semantic decoding does not exceed the maximum computing power of that terminal; the sum of the bandwidths occupied by the terminals' semantic data does not exceed the total bandwidth of the link; the total delay of a terminal's image semantic encoding, decoding and data transmission does not exceed the maximum tolerable delay of the task; the semantic compression ratio lies within its permitted range; and the image reconstruction quality is not lower than the minimum threshold of reconstructed image quality.
- 6. The method of claim 1, wherein the Markov decision process comprises: a state space that at least comprises the computing power allocation state of the base station, the link bandwidth occupied by the terminals and the task characteristics of each terminal; an action space that at least comprises the encoder depth selected for a terminal, the base station computing power used by the terminal and the bandwidth occupied by the terminal; and a reward function constructed so that the agent obtains a positive reward when the total system energy consumption is reduced and is penalized when any constraint is violated.
- 7. The method according to claim 1, characterized in that: the proximal policy optimization (PPO) algorithm improved by the sparse attention mechanism inputs the state space into a sparse attention module, which outputs, through sparse attention calculation, the terminal state features most relevant to the current decision step, thereby reducing the state space dimension; the sparse features are input into an Actor network to obtain the action probability distribution; and the sparse state features of all terminals are merged by pooling into a global feature vector, so as to achieve efficient and accurate state value estimation by the Critic network.
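As an illustration of the two-stage sparse attention recited in claim 3, the following is a minimal Python sketch for a single query vector; it is not the patented implementation. The function names, the contiguous fixed-size blocking, the scaled dot-product scoring and the parameters `block_size` and `top_k` are assumptions made for illustration. The claim also averages the value vectors of each block; this sketch averages only the keys, because only the block scores drive the selection.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sparse_attention(query, keys, values, block_size, top_k):
    """Two-stage sparse attention for one query (illustrative assumption).

    Stage 1 (global compression): split the tokens into contiguous blocks,
    average the keys of each block, and score the query against each
    compressed key.
    Stage 2 (Top-k selection): keep the top_k highest-scoring blocks,
    restore their original tokens, and run ordinary softmax attention
    over those tokens only.
    Returns the attended output vector and the token indices that were used.
    """
    scale = 1.0 / math.sqrt(len(query))
    n = len(keys)
    blocks = [list(range(s, min(s + block_size, n)))
              for s in range(0, n, block_size)]

    # Stage 1: one score per compressed (averaged) key.
    block_scores = [dot(query, mean_vector([keys[i] for i in b])) * scale
                    for b in blocks]

    # Stage 2: select the best blocks and expand them back to tokens.
    chosen = sorted(range(len(blocks)),
                    key=lambda j: block_scores[j], reverse=True)[:top_k]
    token_ids = sorted(i for j in chosen for i in blocks[j])

    weights = softmax([dot(query, keys[i]) * scale for i in token_ids])
    dim = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, token_ids))
              for d in range(dim)]
    return output, token_ids
```

With `top_k` equal to the number of blocks, the sketch reduces to ordinary dense attention; smaller values of `top_k` restrict the second stage to the tokens of the selected blocks, which is where the reduction in computation comes from.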
Description
Resource self-adaptive semantic compression method based on lightweight Swin Transformer
Technical Field
The invention relates to the technical field of semantic communication and intelligent resource management, in particular to a resource self-adaptive semantic compression method based on a lightweight Swin Transformer.
Background
With the rapid development of emerging applications such as autonomous driving and virtual reality, the demand for transmitting visual data such as images and video keeps increasing, and transmitting massive visual data under limited bandwidth and energy is a new challenge. In the conventional communication paradigm, data transmission and reception rely mainly on accurate delivery at the symbol level and ignore the meaning of the data, so that a large amount of redundant data is transmitted in the network, wasting bandwidth and energy. Semantic communication extracts and transmits the meaning of the data rather than the complete original data, which can significantly reduce the amount of transmitted data and improve communication efficiency, and it has become an emerging technical direction.
Early semantic communication systems used convolutional neural networks to construct semantic encoders and decoders. However, convolutional neural networks have limited ability to capture long-range dependencies, which makes it difficult to obtain global semantic information. To overcome this limitation, subsequent approaches introduced the Transformer architecture with its self-attention mechanism into image semantic communication. The high complexity of the Transformer model, however, makes system energy consumption a prominent problem.
Existing methods optimize either the lightweight design of the model or the resource allocation mechanism, but still adopt a model with fixed computational complexity and a fixed semantic compression ratio, ignoring the differences in terminal computing power in a real network, which leads to low resource utilization. Moreover, when addressing low resource utilization under differing terminal computing power, existing methods often optimize semantic encoding and general computing resource allocation separately; they cannot consider the two jointly at the system level and have difficulty achieving global optimization. There is therefore a need for a self-adaptive method that can jointly optimize the semantic compression ratio, computing power and bandwidth allocation to minimize the total system energy consumption.
Disclosure of Invention
The invention aims to provide a resource self-adaptive semantic compression method based on a lightweight Swin Transformer. The method addresses energy consumption optimization under differing multi-terminal computing power by dynamically adjusting the semantic compression ratio and jointly allocating computing power and bandwidth.
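The energy terms that such a joint allocation trades off can be illustrated with a minimal Python sketch. The exact expression of the objective is not reproduced in this text, so the forms used here are assumptions: a conventional dynamic-power model (energy = coefficient x cycles x clock rate squared) for encoding and decoding, and a Shannon-rate link for downlink transmission. All class, field and function names are hypothetical. The only relation taken directly from the text is that the semantic data size equals the original image size divided by the semantic compression ratio.

```python
import math
from dataclasses import dataclass


@dataclass
class Terminal:
    """Per-terminal quantities (hypothetical field names)."""
    image_bits: float          # original image data size, in bits
    compression_ratio: float   # original size / semantic size
    enc_flops_per_bit: float   # encoder FLOPs per bit (grows with the ratio)
    dec_flops_per_bit: float   # decoder FLOPs per bit (grows with the ratio)
    bs_cycles_per_s: float     # base station GPU cycles/s given to this terminal
    ue_cycles_per_s: float     # terminal GPU cycles/s used for decoding
    ue_flops_per_cycle: float  # terminal FLOPs per GPU cycle
    bandwidth_hz: float        # bandwidth allocated to this terminal
    tx_power_w: float          # downlink transmission power for this terminal
    channel_gain: float        # channel gain between base station and terminal


def compute_energy(bits, flops_per_bit, flops_per_cycle, cycles_per_s, kappa):
    """Assumed dynamic-power model: kappa * (required cycles) * f^2."""
    cycles = bits * flops_per_bit / flops_per_cycle
    return kappa * cycles * cycles_per_s ** 2


def transmit_energy(bits, bandwidth_hz, tx_power_w, channel_gain, noise_psd):
    """Assumed Shannon-rate link: energy = power * (bits / rate)."""
    snr = tx_power_w * channel_gain / (noise_psd * bandwidth_hz)
    rate = bandwidth_hz * math.log2(1.0 + snr)
    return tx_power_w * bits / rate


def total_energy(terminals, bs_flops_per_cycle, kappa_bs, kappa_ue, noise_psd):
    """Sum of encoding, transmission and decoding energy over all terminals."""
    total = 0.0
    for t in terminals:
        # Semantic data size follows from the compression ratio definition.
        semantic_bits = t.image_bits / t.compression_ratio
        total += compute_energy(t.image_bits, t.enc_flops_per_bit,
                                bs_flops_per_cycle, t.bs_cycles_per_s, kappa_bs)
        total += transmit_energy(semantic_bits, t.bandwidth_hz,
                                 t.tx_power_w, t.channel_gain, noise_psd)
        total += compute_energy(semantic_bits, t.dec_flops_per_bit,
                                t.ue_flops_per_cycle, t.ue_cycles_per_s, kappa_ue)
    return total
```

Under these assumptions a higher compression ratio shrinks the transmitted and decoded data but raises the per-bit encoding and decoding cost, which is the trade-off that makes the ratio, the computing power and the bandwidth worth optimizing together.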
In order to achieve the above purpose, the present invention provides the following technical solution:
S1, constructing a semantic encoder using a Swin Transformer, wherein a gating network is introduced to dynamically adjust the depth of the semantic encoder, so as to customize differentiated semantic compression ratios for terminals with different computing power, wherein the semantic compression ratio is defined as the ratio of the size of the original image data to the size of the semantic data;
S2, improving the attention mechanism of the Swin Transformer by constructing a two-stage sparse attention mechanism of global compression and Top-k selection, so as to reduce the computational complexity of semantic encoding;
S3, constructing a joint computing power and bandwidth allocation model, formulating a resource optimization problem with minimization of the total system energy consumption as the optimization target, and setting a base station total computing power constraint, a terminal computing power constraint, a total bandwidth constraint, a task maximum delay constraint, a semantic compression ratio constraint and an image reconstruction quality constraint;
S4, converting the optimization problem into a Markov decision process, defining a state space that at least comprises the computing power allocation state of the base station, the link bandwidth occupied by the terminals and the task characteristics of each terminal, defining an action space that at least comprises the encoder depth selected for a terminal, the base station computing power used by the terminal and the bandwidth occupied by the terminal, and constructing a reward function so that the agent obtains a positive reward when the total system energy consumption is reduced and is penalized when any constraint is violated;
S5, solving with a proximal policy optimization (PPO) algorithm improved by the sparse attention mechanism to obtain an optimal semantic compression ratio, base station computing power and band