CN-121997277-A - Method, system, terminal and medium for constructing and applying a multimodal wireless foundation model based on a dual-encoder cross-modal injection mechanism

Abstract

The invention discloses a method, a system, a terminal and a medium for constructing and applying a multimodal wireless foundation model based on a dual-encoder cross-modal injection mechanism. The method comprises: constructing a dual-encoder structure comprising a CSI encoder and a point cloud encoder; acquiring multimodal data and applying data blocking and positional encoding; extracting first features through the CSI encoder and the point cloud encoder, and second features through an auxiliary encoder; cross-injecting and fusing the features of the different modalities through a cross-modal attention mechanism and outputting general multimodal representations; pre-training with a joint training scheme to obtain the multimodal wireless foundation model; and, in the inference stage, selecting the general representation according to the type of the downstream task and feeding it into the downstream task model for inference. A multimodal wireless foundation model trained according to the invention can benefit a wide range of downstream communication and perception tasks, including channel estimation, channel prediction, target positioning, electromagnetic maps and indoor perception.

Inventors

  • JING LIWEN
  • SUN LI
  • ZHENG MENGFAN
  • XU LEIYANG
  • LONG YUNHAN
  • WEI ZICHAO
  • SHI YUXUAN
  • YANG TINGTING

Assignees

  • Peng Cheng Laboratory (鹏城实验室)

Dates

Publication Date
20260508
Application Date
20260408

Claims (10)

  1. A method for constructing and applying a multimodal wireless foundation model based on a dual-encoder cross-modal injection mechanism, characterized by comprising the following steps: constructing a dual-encoder structure, wherein the dual-encoder structure comprises a CSI encoder that takes channel state information as its main modality and a point cloud encoder that takes point cloud data as its main modality; acquiring multimodal data whose time sampling is aligned, and performing data blocking and positional encoding on the multimodal data, wherein the multimodal data comprises channel state data or point cloud data serving as the main modality and sensor data serving as an auxiliary modality; extracting first features of the corresponding main modalities through the CSI encoder and the point cloud encoder respectively, extracting second features of the auxiliary modality through an auxiliary encoder, performing cross-modal injection in the CSI encoder and the point cloud encoder, cross-injecting and fusing the features of the different modalities through a cross-modal attention mechanism, and outputting general multimodal representations; pre-training the model on the general multimodal representations with a joint training scheme that combines autoregressive training and task-related training, thereby obtaining the multimodal wireless foundation model; and, in the inference stage, selecting through a task switch, according to the type of the downstream task, the corresponding general representation output by the CSI encoder or the point cloud encoder, and feeding that general representation into a downstream task model for inference.
  2. The method for constructing and applying a multimodal wireless foundation model based on a dual-encoder cross-modal injection mechanism according to claim 1, wherein the sensor data of the auxiliary modality comprises RGB image data, the corresponding auxiliary encoder is a visual encoder, and the second features are visual features.
  3. The method for constructing and applying a multimodal wireless foundation model based on a dual-encoder cross-modal injection mechanism according to claim 1, wherein the CSI encoder and the point cloud encoder are both based on a Transformer architecture, and extracting the first features of the corresponding main modalities through the CSI encoder and the point cloud encoder respectively comprises: extracting the first features of the corresponding main modality through the Transformer modules in the front layers of the CSI encoder and the point cloud encoder (the number of front layers is given by a symbol that is missing from this text), wherein each such Transformer module internally uses a self-attention mechanism and a feed-forward neural network.
  4. The method for constructing and applying a multimodal wireless foundation model based on a dual-encoder cross-modal injection mechanism according to claim 3, wherein the Transformer modules of the CSI encoder and the point cloud encoder from a specified layer onward (the layer index is missing from this text) internally use a cross-modal attention mechanism and a feed-forward neural network, wherein the CSI encoder injects features from the point cloud data and the auxiliary modality, and the point cloud encoder injects features from the channel state data and the auxiliary modality.
  5. The method for constructing and applying a multimodal wireless foundation model based on a dual-encoder cross-modal injection mechanism according to claim 4, wherein bidirectional cross-modal transfer exists between the corresponding Transformers of the two main modalities, the auxiliary modality is provided simultaneously to the Transformers of all main modalities, and the auxiliary modality is transferred unidirectionally to the main modalities.
  6. The method for constructing and applying a multimodal wireless foundation model based on a dual-encoder cross-modal injection mechanism according to claim 1, wherein the loss function of the joint training is given by a formula (not reproduced in this text) whose symbols denote, in order: the total loss; the autoregressive loss computed from the output of the autoregressive head; the supervised task-related loss computed from the output of the downstream task heads; and a weight hyperparameter.
  7. The method for constructing and applying a multimodal wireless foundation model based on a dual-encoder cross-modal injection mechanism according to claim 1, wherein, in the inference stage, selecting through the task switch, according to the type of the downstream task, the corresponding general representation output by the CSI encoder or the point cloud encoder comprises: selecting the general representation output by the CSI encoder when performing a communication task; and selecting the general representation output by the point cloud encoder when performing a perception task.
  8. A system for constructing and applying a multimodal wireless foundation model based on a dual-encoder cross-modal injection mechanism, the system being configured to implement the method for constructing and applying a multimodal wireless foundation model based on a dual-encoder cross-modal injection mechanism according to any one of claims 1 to 7, the system comprising: a dual-encoder construction module, configured to construct a dual-encoder structure, wherein the dual-encoder structure comprises a CSI encoder that takes channel state information as its main modality and a point cloud encoder that takes point cloud data as its main modality; a multimodal data processing module, configured to acquire multimodal data whose time sampling is aligned and to perform data blocking and positional encoding on the multimodal data, wherein the multimodal data comprises channel state data or point cloud data serving as the main modality and sensor data serving as an auxiliary modality; a feature extraction and fusion module, configured to extract first features of the corresponding main modalities through the CSI encoder and the point cloud encoder respectively, extract second features of the auxiliary modality through an auxiliary encoder, perform cross-modal injection in the CSI encoder and the point cloud encoder, cross-inject and fuse the features of the different modalities through a cross-modal attention mechanism, and output general multimodal representations; a joint training module, configured to pre-train the model on the general multimodal representations with a joint training scheme that combines autoregressive training and task-related training, thereby obtaining the multimodal wireless foundation model; and an inference application module, configured to select through a task switch, in the inference stage and according to the type of the downstream task, the corresponding general representation output by the CSI encoder or the point cloud encoder, and to feed that general representation into a downstream task model for inference.
  9. A terminal, characterized in that the terminal comprises a memory, a processor, and a program for constructing and applying a multimodal wireless foundation model based on a dual-encoder cross-modal injection mechanism, wherein the program is stored in the memory and can run on the processor, and the processor, when executing the program, implements the steps of the method for constructing and applying a multimodal wireless foundation model based on a dual-encoder cross-modal injection mechanism.
  10. A computer-readable storage medium, wherein the computer-readable storage medium stores a program for constructing and applying a multimodal wireless foundation model based on a dual-encoder cross-modal injection mechanism, and the program, when executed, implements the steps of the method for constructing and applying a multimodal wireless foundation model based on a dual-encoder cross-modal injection mechanism according to any one of claims 1 to 7.
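
The joint loss of claim 6 and the task switch of claim 7 can be illustrated with a minimal Python sketch. The loss formula itself is not reproduced in this text, so the additive form below (autoregressive loss plus a weighted task-related loss) is an assumption, as are the function names, the task names, and the grouping of tasks into communication and perception categories.

```python
# Hypothetical task grouping; the text only says that communication tasks use
# the CSI encoder output and perception tasks use the point cloud encoder output.
COMMUNICATION_TASKS = {"channel_estimation", "channel_prediction"}
PERCEPTION_TASKS = {"target_positioning", "indoor_perception"}


def joint_loss(autoregressive_loss, task_loss, weight):
    """Assumed form: total = autoregressive loss + weight * task-related loss."""
    return autoregressive_loss + weight * task_loss


def task_switch(task, csi_repr, cloud_repr):
    """Pick the general representation that matches the downstream task type."""
    if task in COMMUNICATION_TASKS:
        return csi_repr
    if task in PERCEPTION_TASKS:
        return cloud_repr
    raise ValueError(f"unknown downstream task: {task}")


total = joint_loss(autoregressive_loss=1.0, task_loss=2.0, weight=0.5)
chosen = task_switch("channel_prediction", csi_repr="csi", cloud_repr="cloud")
print(total, chosen)
```

Other weightings, such as a convex combination of the two losses, would fit the claim wording equally well; the sketch only shows how a single weight hyperparameter balances the two terms.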

Description

Method, system, terminal and medium for constructing and applying a multimodal wireless foundation model based on a dual-encoder cross-modal injection mechanism

Technical Field

The invention relates to the technical field at the intersection of artificial intelligence and wireless communication, and in particular to a method, a system, a terminal and a medium for constructing and applying a multimodal wireless foundation model based on a dual-encoder cross-modal injection mechanism.

Background

With the evolution of edge artificial intelligence (AI-Edge) technology, wireless networks are evolving from pure communication networks into integrated communication and perception networks. In a wide range of scenarios, the system is required both to perform high-quality wireless communication tasks and to have accurate sensing capabilities. Existing wireless channel models or foundation models are usually designed for a single modality, and their underlying representations struggle to accommodate the differing performance requirements of communication tasks and perception tasks. This limits model generalization and fails to meet the AI-Edge requirement for efficient, general feature representations. The prior art is therefore deficient.
Disclosure of Invention

To address the deficiencies of the prior art, the invention provides a method, a system, a terminal and a medium for constructing and applying a multimodal wireless foundation model based on a dual-encoder cross-modal injection mechanism. The technical scheme adopted by the invention is as follows.

In a first aspect, the invention provides a method for constructing and applying a multimodal wireless foundation model based on a dual-encoder cross-modal injection mechanism, the method comprising: constructing a dual-encoder structure, wherein the dual-encoder structure comprises a CSI encoder that takes channel state information as its main modality and a point cloud encoder that takes point cloud data as its main modality; acquiring multimodal data whose time sampling is aligned, and performing data blocking and positional encoding on the multimodal data, wherein the multimodal data comprises channel state data or point cloud data serving as the main modality and sensor data serving as an auxiliary modality; extracting first features of the corresponding main modalities through the CSI encoder and the point cloud encoder respectively, extracting second features of the auxiliary modality through an auxiliary encoder, performing cross-modal injection in the CSI encoder and the point cloud encoder, cross-injecting and fusing the features of the different modalities through a cross-modal attention mechanism, and outputting general multimodal representations; pre-training the model on the general multimodal representations with a joint training scheme that combines autoregressive training and task-related training, thereby obtaining the multimodal wireless foundation model; and, in the inference stage, selecting through a task switch, according to the type of the downstream task, the corresponding general representation output by the CSI encoder or the point cloud encoder, and feeding that general representation into a downstream task model for inference.
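
The feature extraction and cross-modal injection steps above can be sketched in a few lines of plain Python. This is an illustrative toy under stated assumptions, not the patented implementation: the attention uses identity projections instead of learned weights, the feed-forward networks are omitted, the auxiliary features are taken as already produced by the auxiliary encoder, and all names (`attend`, `dual_encoder`, `n_self`, `n_cross`) as well as the token counts and dimensions are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, context):
    """Scaled dot-product attention with identity projections (assumption)."""
    dim = len(queries[0])
    output = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        output.append([sum(w * v[j] for w, v in zip(weights, context)) for j in range(dim)])
    return output


def residual(tokens, update):
    """Add an attention update to the tokens it was computed for."""
    return [[t + u for t, u in zip(row, upd)] for row, upd in zip(tokens, update)]


def dual_encoder(csi, cloud, aux, n_self, n_cross):
    """Return (CSI representation, point cloud representation)."""
    # Front layers: each main modality attends only to itself.
    for _ in range(n_self):
        csi = residual(csi, attend(csi, csi))
        cloud = residual(cloud, attend(cloud, cloud))
    # Later layers: each encoder is injected with the other main modality plus
    # the auxiliary features; `aux` is read but never updated (one-way transfer).
    for _ in range(n_cross):
        csi, cloud = (
            residual(csi, attend(csi, cloud + aux)),
            residual(cloud, attend(cloud, csi + aux)),
        )
    return csi, cloud


def random_tokens(count, dim, rng):
    """Stand-in for blocked, position-encoded tokens of one modality."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
csi_tokens = random_tokens(4, 8, rng)
cloud_tokens = random_tokens(6, 8, rng)
aux_tokens = random_tokens(3, 8, rng)
csi_repr, cloud_repr = dual_encoder(csi_tokens, cloud_tokens, aux_tokens, n_self=2, n_cross=2)
print(len(csi_repr), len(cloud_repr), len(csi_repr[0]))
```

Each encoder keeps its own token count, so the two outputs remain separate representations that a task switch can choose between at inference time.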
In one implementation, the sensor data of the auxiliary modality comprises RGB image data, the corresponding auxiliary encoder is a visual encoder, and the second features are visual features.

In one implementation, the CSI encoder and the point cloud encoder are both based on a Transformer architecture, and extracting the first features of the corresponding main modalities through the CSI encoder and the point cloud encoder respectively comprises: extracting the first features of the corresponding main modality through the Transformer modules in the front layers of the CSI encoder and the point cloud encoder (the number of front layers is given by a symbol that is missing from this text), wherein each such Transformer module internally uses a self-attention mechanism and a feed-forward neural network.

In one implementation, the Transformer modules of the CSI encoder and the point cloud encoder from a specified layer onward (the layer index is missing from this text) internally use a cross-modal attention mechanism and a feed-forward neural network, wherein the CSI encoder injects features from the point cloud data and the auxiliary modality, and the point cloud encoder injects features from the channel state data and the auxiliary modality.

In one implementation, bidirectional cross-modal transfer exists between the corresponding Transformers of the two main modalities, the auxiliary modality is provided simultaneously to the Transformers of all main modalities, and the auxiliary modality is transferred unidirectionally to the main modalities.

In one implementation, the loss function of the joint training is given by a formula (not reproduced in this text) whose symbols denote, in order: the total loss; the autoregressive loss computed from the output of the autoregressive head; the supervised task-related loss computed from the output of the downstream task heads; and a weight hyperparameter.

In one implementation, in the reasoning phase, selecting, by the task switch, a corr