CN-121981190-A - Multi-task electromagnetic model based on hybrid expert network
Abstract
The invention discloses a multi-task electromagnetic model based on a hybrid expert (mixture-of-experts, MoE) network, belonging to the field of wireless communication technology. The model comprises a preprocessing module, a feature extraction module, a task output module, and a pre-training and fine-tuning learning strategy. The preprocessing module standardizes multi-source electromagnetic signals. The feature extraction module is based on a Transformer structure in which a hybrid expert network replaces part of the traditional feed-forward neural networks, and expert sub-networks are selected dynamically through a task-specific routing mechanism. The task output module configures a dedicated structure for each task target. In the pre-training stage, a masked autoencoder is pre-trained on large-scale unlabeled data; downstream tasks then perform full fine-tuning on small-scale labeled data. The invention achieves multi-task collaborative learning with differentiated representations, and improves the recognition performance, robustness, and processing efficiency of the model in complex electromagnetic environments.
Inventors
- PENG YAXIN
- MO HANQIAN
- PENG YAN
- WEI HONGYU
- KONG HAO
- ZHOU YANG
- ZHENG JIANYONG
Assignees
- Shanghai University (上海大学)
Dates
- Publication Date
- 20260505
- Application Date
- 20260402
Claims (10)
- 1. A multi-task electromagnetic model based on a hybrid expert (mixture-of-experts) network, characterized by comprising a preprocessing module, a feature extraction module, a task output module, and a pre-training and fine-tuning learning strategy module, wherein: the preprocessing module converts multi-source electromagnetic signals into a standardized input format; the feature extraction module uses a multi-layer Transformer network structure to perform deep feature modeling and temporal-relationship modeling on the sequence signals; the task output module consists of a plurality of task-specific structures and is configured as a classifier, a regression layer, or a decoder according to the type of electromagnetic task; and the pre-training and fine-tuning learning strategy module trains on large-scale unlabeled electromagnetic data with a masked autoencoder in the pre-training stage, and performs full fine-tuning with small-scale labeled data in the downstream task stage.
- 2. The hybrid expert network-based multi-task electromagnetic model of claim 1, wherein the preprocessing module operates as follows: amplitude normalization is performed on the original I/Q signals; the normalized signals are packaged; the packaged sequence is segmented into patches; and a learnable positional encoding is added to each patch to obtain the input features of the feature extraction module.
- 3. The hybrid expert network-based multi-task electromagnetic model of claim 1, wherein each layer of the multi-layer Transformer network structure comprises a self-attention layer, a feed-forward network, a normalization layer, and a residual connection; the hybrid expert network is introduced in the even-numbered layers to replace the traditional feed-forward neural network; each hybrid expert layer comprises 8 expert sub-networks; the expert selection weights are calculated by the task-specific routing network; and, for each token, the two experts with the largest selection weights participate in the computation and their outputs are combined by weighted summation.
- 4. The hybrid expert network-based multi-task electromagnetic model of claim 3, wherein a maximum token capacity is first set for each expert; when the number of tokens assigned to an expert exceeds that capacity, the excess tokens do not enter that expert's forward computation and are passed directly through the residual structure; and when the number of assigned tokens is below the capacity, the expert's input is zero-padded for alignment.
- 5. The hybrid expert network-based multi-task electromagnetic model of claim 3, wherein, in the routing assignment stage, an importance-based token ordering mechanism is introduced: an importance score is calculated for each token from its weight in the routing network, and expert resources are assigned to the tokens in descending order of importance score; and, in the model training stage, an auxiliary loss term is introduced in addition to the main task loss and optimized jointly with it in order to constrain expert load.
- 6. The hybrid expert network-based multi-task electromagnetic model of claim 4, wherein, to constrain the number of tokens actually received by each expert, a load estimation method based on probability estimation is introduced in the routing stage: random perturbation is added to the routing scores, the probability of each token being routed to each expert is estimated, and these probabilities are accumulated over the batch as an approximate measure of expert load.
- 7. The hybrid expert network-based multi-task electromagnetic model of claim 1, wherein a masked autoencoder is used as the pre-training strategy to pre-train the shared feature extraction module on large-scale unlabeled electromagnetic data; during pre-training, input patches are randomly masked and only a small number of visible patches are used to reconstruct the masked portion, so that the model learns the global structure of electromagnetic signals; the final loss of the pre-training stage is a weighted sum of the reconstruction loss and the auxiliary loss; and the full fine-tuning stage of the downstream task begins after pre-training is completed.
- 8. The hybrid expert network-based multi-task electromagnetic model of claim 7, wherein the pre-training and fine-tuning learning strategy module adopts a masked autoencoder as the pre-training strategy and pre-trains the shared feature extraction module on large-scale unlabeled electromagnetic data; during pre-training, input patches are randomly masked and the masked portion is reconstructed from a small number of visible patches, so that the model learns the global structure of the electromagnetic signal.
- 9. The hybrid expert network-based multi-task electromagnetic model of claim 7, wherein the full fine-tuning stage of the downstream task adopts end-to-end joint training and updates the routing parameters and the expert parameters synchronously, so that the hybrid expert network can adaptively adjust its expert selection strategy for different tasks.
- 10. The hybrid expert network-based multi-task electromagnetic model of claim 1, wherein the task output module configures a different structure for each task type: for classification tasks such as modulation recognition, a fully connected classification layer outputs a probability distribution over the classes; for regression tasks such as parameter estimation, a regression output layer generates continuous values; and for signal reconstruction tasks, a decoder structure outputs the resulting sequence.
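The routing behaviour recited in claims 3 to 5 (two experts selected per token, a per-expert capacity limit with overflow left to the residual path, and assignment in descending order of importance) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `softmax` and `route_top2` are hypothetical, the softmax over router scores, the use of each token's largest routing probability as its importance score, and the renormalization of the two selected weights are all choices made here for illustration, and the expert networks themselves are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one token's router scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top2(router_logits, capacity):
    """Assign each token to its two highest-weight experts under a capacity limit.

    router_logits: one list of per-expert scores for each token.
    capacity: maximum number of tokens each expert may process.

    Returns (assignments, dropped, padding):
      assignments[e] -> list of (token_index, combine_weight) accepted by expert e
      dropped        -> list of (token_index, expert_index) pairs that overflowed;
                        those tokens rely on the residual path for that expert
      padding[e]     -> number of zero-padded slots needed to fill expert e
    """
    num_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]

    # Assumption: a token's importance is its largest routing probability.
    order = sorted(range(len(probs)), key=lambda t: max(probs[t]), reverse=True)

    assignments = [[] for _ in range(num_experts)]
    dropped = []
    for t in order:
        # Pick the two experts with the largest selection weight for this token.
        top2 = sorted(range(num_experts), key=lambda e: probs[t][e], reverse=True)[:2]
        # Assumption: renormalize the two selected weights so they sum to 1.
        norm = sum(probs[t][e] for e in top2)
        for e in top2:
            if len(assignments[e]) < capacity:
                assignments[e].append((t, probs[t][e] / norm))
            else:
                # Expert is full: the token skips this expert's forward pass.
                dropped.append((t, e))

    # Experts that received fewer tokens than the capacity are zero-padded.
    padding = [capacity - len(slots) for slots in assignments]
    return assignments, dropped, padding
```

With three tokens that all prefer the same two of three experts and a capacity of 2, the least important token overflows both experts and is carried by the residual connection alone, while the unused third expert is padded to full capacity.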
Description
Multi-task electromagnetic model based on hybrid expert network
Technical Field
The invention relates to the technical field of communication and wireless communication, in particular to a multi-task electromagnetic model based on a hybrid expert network.
Background
Electromagnetic signals (EM signals) are physical quantities that propagate through space or a medium in the form of electromagnetic waves and carry information. They are typically generated by artificial modulation, in which parameters of a carrier wave such as amplitude, frequency, and phase are varied to encode and transmit information. As a basic resource of the modern information society, electromagnetic signals are the core carriers supporting wireless communication, radar detection, satellite navigation, and related fields. With the continuous expansion of electromagnetic applications, electromagnetic sensing plays an increasingly important role in emerging scenarios such as unmanned systems, intelligent perception, and electronic countermeasures. In-depth study of the time-frequency structure, feature representation, propagation, and interference mechanisms of electromagnetic signals is therefore of great significance for improving a system's perception capability, anti-interference capability, and autonomous decision-making capability. Around the processing and understanding of electromagnetic signals, the industry has developed multiple analysis tasks such as modulation recognition, blind source separation, and radio frequency fingerprint recognition, which play key roles in scenarios such as signal decoding, interference suppression, and spectrum management.
Current analysis methods in the electromagnetic field rely mainly on specialized models optimized for specific tasks: each model is designed for a single task and can achieve excellent performance on that task. In recent years, as electromagnetic environments have grown more complex and intelligent electromagnetic networks have developed rapidly, electromagnetic signals have become multi-source, diverse, and dynamic. Against this background, the traditional single-task, single-model approach to signal analysis has gradually exposed its limitations. First, multiple independently trained small models cause model redundancy and wasted computing power. Second, knowledge cannot be shared among tasks, which leaves the system with insufficient adaptability and robustness. Third, in application scenarios with concurrent tasks, running multiple models in parallel increases management complexity and response latency, which hinders the construction of a unified intelligent electromagnetic analysis framework. In response to these needs, academia and industry have in recent years begun exploring multi-task electromagnetic foundation models. Such a model aims to handle multiple electromagnetic tasks under a unified network architecture and to achieve collaborative multi-task learning, from signal modulation recognition to radio frequency fingerprint recognition and beyond. Compared with a traditional single-task model, a multi-task foundation model can capture latent correlations among tasks during joint optimization and promote information fusion and knowledge transfer through a shared representation space, thereby improving the robustness and generalization of the model.
In this process, the model learns not only task-specific features but also general-purpose electromagnetic features, providing important support for subsequent few-shot tasks, out-of-distribution signal identification, and cross-domain adaptation. Existing research has achieved good performance under full downstream fine-tuning by pre-training on large-scale unlabeled data and fine-tuning on small-scale labeled data, demonstrating the feasibility of the pre-training and fine-tuning paradigm in electromagnetic signal processing. However, research in this field is still at an early exploratory stage and faces several challenges: acquiring and labeling electromagnetic signals is costly and depends on personnel with professional backgrounds; different types of electromagnetic signals differ markedly in frequency, modulation scheme, and physical characteristics, so the model must cope with heterogeneous and dynamically changing data during learning; and the knowledge-sharing mechanism among tasks remains unclear, so performance imbalance and information conflicts readily arise when a single model handles multiple tasks. Based on the problems and the