CN-121619066-B - Self-adaptive semantic communication method based on reinforcement learning
Abstract
The invention discloses a self-adaptive semantic communication method based on reinforcement learning, which comprises the following steps: S1, constructing an adaptive semantic transmission architecture based on reinforcement learning; S2, pre-training multiple groups of semantic encoder-decoder models and constructing a lookup table that records the different working points and their corresponding source rates and distortion indexes; S3, training a policy network according to a Markov decision process and a reward function, with a domain randomization mechanism, i.e. randomly selecting or combining different fading models and statistical parameters for each round, generating a block-fading sequence, and iteratively updating the policy parameters over the multi-domain channel distribution; and S4, transmitting adaptive semantic samples based on the trained policy network. The invention realizes semantic feature transmission selection across different fading scenes and makes adaptive joint decisions on physical-layer parameters such as coding rate, modulation order and transmit power, thereby obtaining better task performance and transmission efficiency.
Inventors
- HUANG CHUAN
- WANG JIACHEN
Assignees
- 电子科技大学(深圳)高等研究院 (Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China)
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-01-30
Claims (4)
- 1. The self-adaptive semantic communication method based on reinforcement learning is characterized by comprising the following steps:
Step S1, constructing an adaptive semantic transmission architecture based on reinforcement learning: the adaptive semantic transmission architecture comprises a transmitting end, a receiving end and a control module, wherein the transmitting end comprises a semantic encoder, a quantization module, a channel coding module and a modulation module, and the receiving end comprises a demodulation module, a channel decoding module, a dequantization module and a semantic decoder; the control module comprises a reinforcement-learning-based policy network and is used for controlling the module parameters of the transmitting end.
Step S2, pre-training multiple groups of semantic encoder-decoder models, constructing a lookup table, and recording the different working points and their corresponding source rates and distortion indexes. Step S2 comprises:
S201, constructing a semantic encoder-decoder model comprising a semantic encoder f_φ and a semantic decoder g_θ, both realized with neural networks;
S202, with the source rate set to R, encoding the semantic data with the semantic encoder, decoding the encoding result with the semantic decoder, and computing the loss between the original semantic data and the decoded data, recorded as the source distortion D; the computation comprises mean squared error or cross-entropy loss; computing the Lagrangian objective L(φ, θ) = D + λ·R from the parameters φ and θ, and training the semantic encoder-decoder model with L(φ, θ) as its loss function;
S203, repeatedly executing step S202 at different source rates R to obtain multiple groups of training results of the semantic encoder-decoder model, and recording the resulting source rate-distortion pairs as (R, D); among the training results, the result obtained when L(φ, θ) is minimal is taken as a working-point model pair (φ*, θ*), where φ* and θ* denote the semantic encoder parameters and semantic decoder parameters at the minimum of L;
S204, varying the parameter λ to obtain a plurality of working-point model pairs (φ_v, θ_v), v = 1, 2, …, V, where V denotes the number of working-point model pairs obtained and v indexes the v-th working-point model pair.
Step S3, training a policy network according to a Markov decision process and a reward function, adopting a domain randomization mechanism for training: for each round, different fading models and statistical parameters are randomly selected or combined to generate a block-fading sequence, and the policy parameters are iteratively updated over the multi-domain channel distribution.
Step S4, based on the policy network obtained by training, carrying out transmission of adaptive semantic samples. (A pretraining sketch for steps S202–S204 is given after the claims.)
- 2. The adaptive semantic communication method based on reinforcement learning according to claim 1, wherein the step S3 comprises:
S301, setting a sample set comprising a plurality of samples, each sample being semantic data to be transmitted;
S302, the transmitting end selects one sample from the sample set and, for a given source rate, semantically encodes the selected sample with the semantic encoder of the corresponding working-point model pair to obtain the latent semantic representation Z, and partitions Z into K semantic feature blocks z_1, …, z_K, each semantic feature block being a matrix of W rows and H columns, where W and H are respectively the height and width of the semantic feature block;
S303, at the beginning of each sample transmission, first computing the importance weight of each extracted semantic feature block, so as to reflect both the task-contribution differences and the dependency relationships between features:
computing the task-relevance factor a_k: the selected sample is forward-propagated through the semantic encoder and semantic decoder of the selected working point, i.e. after being processed according to step S302 it is decoded by the decoder to obtain a decoding result; denoting the selected sample by x and the decoding result by x̂, the reconstruction loss is computed with respect to the feature map and globally average-pooled over the spatial dimension to obtain the factor a_k = (1/(W·H)) Σ_w Σ_h e_k(w, h), where e_k(w, h) is the value at row w and column h of the k-th reconstruction-loss map;
computing the inter-feature correlation factor c_k: the cosine similarity ρ_{k,j} between z_k and every other feature z_j is computed, and the absolute values are averaged to obtain c_k = (1/(K−1)) Σ_{j≠k} |ρ_{k,j}|, which characterizes the feature-correlation redundancy strength;
computing the importance weight: the importance weight is defined as the product of the two factors, w_k = a_k·c_k, and normalized to obtain the weight vector w = (w_1, …, w_K)/Σ_k w_k;
S304, for a feature z_k to be transmitted, the transmitting end first allocates the quantization bit number b_k, and the quantization module converts the feature into a bit stream; the channel coding rate r_k, modulation order M_k and transmit power P_k are then selected in turn; coding is carried out in the channel coding module and, after modulation in the modulation module, transmission is carried out at the transmit power; for the feature to be transmitted, the minimum required number of modulation symbols is N_k = ⌈b_k·W·H / (r_k·log₂ M_k)⌉ and the transmission time is τ_k = N_k / R_s, where R_s is the symbol rate; a transmission time budget T_max and a total energy budget E_max are set for a sample, so that the sample transmission process satisfies Σ_k τ_k ≤ T_max and the energy consumption satisfies Σ_k P_k·τ_k ≤ E_max;
S305, the transmission of each sample is regarded as one round, the progressive transmission process of the corresponding semantic features is modeled as a Markov decision process, and online adaptive joint decision is realized through reinforcement learning. (Sketches of the importance-weight computation and the link-budget accounting are given after the claims.)
- 3. The adaptive semantic communication method according to claim 2, wherein the step S305 comprises:
1) State: during transmission, at each decision step i the transmitting end observes the state s_i = (h_i, w, m_i, N_i, E_i), where h_i is the gain of the current block-fading channel; w is the semantic importance vector of the current sample; m_i = (m_{i,1}, …, m_{i,K}) is the feature indication vector, where m_{i,k} indicates whether feature k has been transmitted, taking 0 to indicate that it has been transmitted and 1 to indicate that it has not; N_i and E_i are the remaining symbol budget and power budget, respectively;
2) Action: in the i-th decision step, the policy network outputs the action a_i = (k_i, b_i, r_i, M_i, P_i) based on s_i, i.e. it first selects the feature index k_i to be transmitted in the i-th decision step, then determines the physical-layer parameters corresponding to that feature, including the quantization bits b_i, coding rate r_i, modulation order M_i and transmit power P_i of the i-th decision step;
3) State transition: after an action is performed, the symbol and power budgets are updated deterministically by the consumption of the selected action, the transmitted feature is removed from the available set, and the channel gain evolves to the next decision step according to a block-fading first-order Markov model p(h_{i+1} | h_i), where p(h_{i+1} | h_i) denotes the channel-gain transition probability from the i-th to the (i+1)-th decision step, i.e. the channel gain of the (i+1)-th decision step relative to the previous decision step depends only on the i-th decision step;
4) Reward function: based on the features actually recovered at the receiving end, the semantic distortion d_i is computed and combined with the delay to form the reward r_i = −(d_i + η·τ_i), where η is the trade-off coefficient and τ_i is the transmission delay of the i-th decision step; the policy network is trained to maximize the long-term average return, i.e. to minimize the weighted sum of the long-term average semantic distortion and delay, where the long-term average return is obtained by summing, for each sample, the rewards of all its feature blocks, dividing by the total number of feature blocks over all samples, and averaging;
5) Domain randomization: at the beginning of each training round, a channel model and its statistical parameters are randomly sampled to generate the block-fading process for that round, thereby obtaining the corresponding current block-fading channel gain h_i; the random sampling covers at least any one or a combination of the following: (1) the fading distribution type; (2) the multipath intensity, or the K-factor and Nakagami-m parameter; (3) the average SNR / path loss and the shadow-fading variance; (4) the time-correlation coefficient or Doppler parameter; (5) the channel state transition probability matrix; by covering multiple channel domains in training, the policy network learns a decision mapping that is robust to channel uncertainty. (A toy episode-loop sketch covering the state, action, reward and domain randomization is given after the claims.)
- 4. The adaptive semantic communication method according to claim 3, wherein the step S4 comprises:
S401, the system receives an input sample and selects a target working-point model pair (φ_v, θ_v) from the LUT;
S402, the semantic encoder extracts the latent semantic representation Z and obtains K semantic feature blocks z_1, …, z_K;
S403, the semantic importance weight of each feature is computed to obtain the importance weight vector w;
S404, the delay, symbol budget and energy budget of the sample are initialized, and the current block-fading channel state is observed;
S405, at each decision step i, the policy network first chooses, according to the state s_i, which feature to send, and determines the physical-layer parameters after the decision; the corresponding quantization, coding, modulation and transmission processes are then performed;
S406, the receiving end performs demodulation, decoding and dequantization to obtain ẑ_{k_i}; the semantic decoder outputs the task result and the corresponding reward is obtained; the system updates the remaining budgets and enters the next decision step;
S407, the transmission process of the sample ends when all features have been transmitted or the budget is exhausted. (The episode sketch after the claims also covers this inference loop.)
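The rate-distortion pretraining of steps S202–S204 can be illustrated with a minimal PyTorch sketch. The claims fix neither a network architecture nor a concrete rate measure, so the `SemanticEncoder`/`SemanticDecoder` modules, the random stand-in dataset, the mean-activation rate proxy and the particular λ values below are illustrative assumptions; only the Lagrangian objective D + λ·R and the lookup-table bookkeeping follow the claim.

```python
# Sketch of S202-S204: sweep lambda, train one encoder-decoder pair per
# value, and record each working point in a lookup table (LUT).
import torch
import torch.nn as nn

class SemanticEncoder(nn.Module):                    # placeholder f_phi
    def __init__(self, k=8, dim=64):
        super().__init__()
        self.k, self.dim = k, dim
        self.net = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                                 nn.Linear(256, k * dim))
    def forward(self, x):                            # -> K feature blocks
        return self.net(x).view(x.size(0), self.k, self.dim)

class SemanticDecoder(nn.Module):                    # placeholder g_theta
    def __init__(self, k=8, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(k * dim, 256), nn.ReLU(),
                                 nn.Linear(256, 784))
    def forward(self, z):
        return self.net(z.flatten(1))

def train_working_point(data, lam, epochs=5):
    enc, dec = SemanticEncoder(), SemanticDecoder()
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(epochs):
        for x in data:
            z = enc(x)
            R = z.abs().mean()                       # proxy source-rate term (assumed)
            D = nn.functional.mse_loss(dec(z), x)    # source distortion (MSE)
            loss = D + lam * R                       # Lagrangian L = D + lambda*R
            opt.zero_grad(); loss.backward(); opt.step()
    return enc, dec, R.item(), D.item()

data = [torch.randn(32, 784) for _ in range(10)]     # stand-in dataset
lut = []                                             # S204: one row per lambda
for v, lam in enumerate([0.01, 0.1, 1.0]):
    enc, dec, R, D = train_working_point(data, lam)
    lut.append({"v": v, "lambda": lam, "rate": R, "distortion": D,
                "encoder": enc, "decoder": dec})
```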
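The S303 importance weight combines a spatially pooled reconstruction-loss factor a_k with a mean absolute cosine-similarity factor c_k. A minimal sketch under stated assumptions follows: the tensor shapes and the stand-in error maps are hypothetical, while the global average pooling, the |cosine| average and the normalized product w_k = a_k·c_k follow claim 2.

```python
# Sketch of S303: per-block importance weights from task relevance (a_k)
# and inter-feature correlation (c_k).
import torch
import torch.nn.functional as F

def importance_weights(z, err_maps):
    """z: (K, W, H) feature blocks; err_maps: (K, W, H) reconstruction-loss
    maps obtained from a forward pass through the selected working point."""
    K = z.size(0)
    a = err_maps.mean(dim=(1, 2))                 # global average pooling -> a_k
    flat = z.flatten(1)                           # (K, W*H)
    sim = F.cosine_similarity(flat.unsqueeze(1), flat.unsqueeze(0), dim=-1)
    c = (sim.abs().sum(dim=1) - 1.0) / (K - 1)    # drop the self-similarity term
    w = a * c                                     # product of the two factors
    return w / w.sum()                            # normalized weight vector

z = torch.randn(8, 4, 4)                          # K=8 blocks, W=H=4 (assumed)
err = torch.randn(8, 4, 4) ** 2                   # stand-in squared-error maps
print(importance_weights(z, err))
```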
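The S304 link budget reduces to a ceiling formula for the symbol count and a division by the symbol rate for the delay. The sketch below checks one feature block against the per-sample budgets; every numeric value (symbol rate, budgets, PHY parameters) is an assumption for illustration.

```python
# Sketch of S304: N_k = ceil(b*W*H / (r*log2 M)) symbols, tau_k = N_k / R_s,
# checked against the time budget T_max and energy budget E_max.
import math

def symbols_needed(b, W, H, r, M):
    return math.ceil(b * W * H / (r * math.log2(M)))

Rs = 1e6                       # symbol rate in symbols/s (assumed)
T_max, E_max = 5e-3, 1e-2      # per-sample time (s) and energy (J) budgets (assumed)
N = symbols_needed(b=4, W=4, H=4, r=0.5, M=16)   # -> 32 symbols
tau = N / Rs                   # transmission time of this block
P = 0.5                        # transmit power in W (assumed)
assert tau <= T_max and P * tau <= E_max
print(N, tau, P * tau)
```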
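Claims 3 and 4 describe one episode per sample: observe the state, pick a feature and its physical-layer parameters, accrue the reward −(d_i + η·τ_i), and evolve the channel under a randomly drawn fading domain. The toy environment below is a sketch under stated assumptions: the fading families and parameter ranges, the AR(1) gain evolution, the Dirichlet importance vector and the distortion model are all stand-ins, not the patent's specification.

```python
# Sketch of the claim-3 MDP with its domain randomization and the claim-4
# inference loop: one episode transmits up to K feature blocks.
import numpy as np

rng = np.random.default_rng(0)

def sample_channel_domain():
    # randomize fading type, shape parameter, average SNR and time correlation
    return {"type": rng.choice(["rayleigh", "rician", "nakagami"]),
            "shape": rng.uniform(1.0, 10.0),          # K-factor / Nakagami-m
            "avg_snr_db": rng.uniform(0.0, 20.0),
            "rho": rng.uniform(0.8, 0.99)}            # time correlation

def next_gain(h, dom):
    # first-order Markov block fading: AR(1) drift toward the domain mean
    mean = 10 ** (dom["avg_snr_db"] / 10)
    return dom["rho"] * h + (1 - dom["rho"]) * mean * rng.exponential()

def run_episode(policy, K=8, eta=0.1, N_budget=500, E_budget=1e-2, Rs=1e6):
    dom, h = sample_channel_domain(), 1.0
    w = rng.dirichlet(np.ones(K))                     # importance vector (stand-in)
    mask = np.ones(K)                                 # 1 = not yet transmitted
    ret = 0.0
    for i in range(K):
        state = (h, w, mask.copy(), N_budget, E_budget)
        k, b, r, M, P = policy(state)                 # feature index + PHY params
        N = int(np.ceil(b * 16 / (r * np.log2(M))))   # symbols, assuming W*H = 16
        tau = N / Rs
        d = w[k] / (1 + h * P)                        # stand-in semantic distortion
        ret += -(d + eta * tau)                       # reward of decision step i
        mask[k] = 0
        N_budget -= N; E_budget -= P * tau
        if N_budget <= 0 or E_budget <= 0:            # budget exhausted -> episode ends
            break
        h = next_gain(h, dom)                         # block-fading transition
    return ret

# Greedy stand-in policy: most important untransmitted block, fixed PHY params.
greedy = lambda s: (int(np.argmax(s[1] * s[2])), 4, 0.5, 16, 0.5)
print(run_episode(greedy))
```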
Description
Self-adaptive semantic communication method based on reinforcement learning
Technical Field
The invention relates to the field of semantic communication, and in particular to a self-adaptive semantic communication method based on reinforcement learning.
Background
Existing digital semantic communication schemes rely on fixed or empirical transmission and resource-allocation strategies, and can hardly achieve end-to-end adaptive optimal transmission over a time-varying fading channel under multi-objective trade-offs. Meanwhile, semantic information is usually expressed by multiple features whose contributions to the final task differ markedly and which are often mutually redundant; lacking a mechanism that simultaneously characterizes task importance and inter-feature correlation at the feature level, resource allocation becomes unbalanced and overall efficiency and performance are limited. In addition, the joint decision over feature selection, quantization bit allocation, coding rate, modulation order, transmit power and other physical-layer parameters is a high-dimensional hybrid optimization problem that is difficult to solve with conventional analytic or heuristic methods. There is therefore a need for a feature-oriented scheduling and physical-layer joint configuration scheme that adapts dynamically to the channel and the constraints, so as to meet the communication requirements and improve task performance.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a self-adaptive semantic communication method based on reinforcement learning that realizes semantic feature transmission selection across different fading scenes and makes adaptive joint decisions on physical-layer parameters such as coding rate, modulation order and transmit power, thereby obtaining better task performance and transmission efficiency.
The aim of the invention is achieved by the following technical scheme. The self-adaptive semantic communication method based on reinforcement learning comprises the following steps:
Step S1, constructing an adaptive semantic transmission architecture based on reinforcement learning: the adaptive semantic transmission architecture comprises a transmitting end, a receiving end and a control module, wherein the transmitting end comprises a semantic encoder, a quantization module, a channel coding module and a modulation module, and the receiving end comprises a demodulation module, a channel decoding module, a dequantization module and a semantic decoder; the control module comprises a reinforcement-learning-based policy network and is used for controlling the module parameters of the transmitting end.
Step S2, pre-training multiple groups of semantic encoder-decoder models, constructing a lookup table, and recording the different working points and their corresponding source rates and distortion indexes.
Step S3, training a policy network according to a Markov decision process and a reward function, adopting a domain randomization mechanism for training: for each round, different fading models and statistical parameters are randomly selected or combined to generate a block-fading sequence, and the policy parameters are iteratively updated over the multi-domain channel distribution.
Step S4, based on the policy network obtained by training, carrying out transmission of adaptive semantic samples.
The invention has the beneficial effects that, by introducing a semantic importance measure that accounts for both task-contribution differences and inter-feature correlation, semantic feature transmission selection across different fading scenes is realized under delay and energy budget constraints, and adaptive joint decisions are made on physical-layer parameters such as coding rate, modulation order and transmit power, thereby obtaining better task performance and transmission efficiency.
Drawings
FIG. 1 is a schematic diagram of the present invention; FIG. 2 is a schematic diagram of the adaptive semantic transmission architecture based on reinforcement learning.
Detailed Description
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings, but the scope of the present invention is not limited to the following description. As shown in FIG. 1, the adaptive semantic communication method based on reinforcement learning comprises the following steps. Step S1, constructing an adaptive semantic transmission architecture based on reinforcement learning, as shown in FIG. 2: the adaptive semantic transmission architecture comprises a transmitting end, a receiving end and a control module, wherein the transmitting end comprises a semantic encoder, a quantization module, a channel coding module and a modulation module.
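For the step-S1 chain described above, a toy end-to-end pass (quantize → channel-code → modulate → fading channel → demodulate → channel-decode → dequantize) may help fix ideas. The uniform quantizer, repetition code and QPSK mapping below are simple stand-ins chosen for brevity; the patent leaves the concrete modules configurable.

```python
# Toy S1 chain: uniform quantization, rate-1/3 repetition coding, QPSK,
# a block-fading AWGN channel, and the mirrored receiver chain. Every
# component is a simple stand-in for the patent's configurable modules.
import numpy as np

rng = np.random.default_rng(1)

def quantize(z, b):                       # uniform b-bit quantizer on [-1, 1]
    levels = 2 ** b
    idx = np.clip(((z + 1) / 2 * levels).astype(int), 0, levels - 1)
    bits = ((idx[:, None] >> np.arange(b)[::-1]) & 1).ravel()
    return bits, idx

def dequantize(idx, b):
    return (idx + 0.5) / 2 ** b * 2 - 1

def channel_code(bits, rep=3):            # rate-1/3 repetition code
    return np.repeat(bits, rep)

def channel_decode(coded, rep=3):         # majority vote
    return (coded.reshape(-1, rep).sum(1) > rep / 2).astype(int)

def qpsk_mod(bits):                       # 2 bits per symbol
    b = bits.reshape(-1, 2)
    return ((2 * b[:, 0] - 1) + 1j * (2 * b[:, 1] - 1)) / np.sqrt(2)

def qpsk_demod(sym):
    return np.stack([sym.real > 0, sym.imag > 0], 1).astype(int).ravel()

z = np.tanh(rng.normal(size=16))          # one feature block, W*H = 16
bits, idx = quantize(z, b=4)
tx = qpsk_mod(channel_code(bits))         # 16*4*3 coded bits -> 96 symbols
h, P, noise = 0.9, 1.0, 0.05              # block gain, power, noise std (assumed)
rx = h * np.sqrt(P) * tx + noise * (rng.normal(size=tx.shape)
                                    + 1j * rng.normal(size=tx.shape))
bits_hat = channel_decode(qpsk_demod(rx / (h * np.sqrt(P))))
idx_hat = (bits_hat.reshape(-1, 4) * 2 ** np.arange(4)[::-1]).sum(1)
z_hat = dequantize(idx_hat, b=4)
print(np.mean((z - z_hat) ** 2))          # reconstruction MSE of the block
```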