CN-122020536-A - Multi-mode feature fusion method and device
Abstract
The application discloses a multi-modal feature fusion method and device. The fusion method comprises: determining a corresponding weight for each modal feature of the multi-modal features; for each modal feature, expanding the dimension of the corresponding weight to be the same as that of the modal feature to determine an expanded weight; determining a corresponding weighted modal feature for each modal feature based on its expanded weight; and determining a fusion feature of the multi-modal features based on the determined weighted modal features. Interpretable fusion of multi-modal features is thereby achieved.
Inventors
- Sun Yufei
- Wang Qianluan
- Wang Zhuozheng
- Zhang Aimin
- Li Chuanqi
Assignees
- Sun Yufei (孙雨飞)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-01-27
Claims (10)
- 1. A method for fusing multi-modal features, the method comprising: determining a corresponding weight for each modal feature of the multi-modal features; for each modal feature of the multi-modal features, expanding the dimension of the corresponding weight to be the same as the dimension of the modal feature to determine an expanded weight; for each modal feature of the multi-modal features, determining a corresponding weighted modal feature based on its corresponding expanded weight; and determining a fusion feature of the multi-modal features based on the determined weighted modal features.
- 2. The fusion method of claim 1, wherein determining a corresponding weight for each of the multi-modal features comprises: performing a global average pooling operation on the modal feature to determine a global feature vector corresponding to the modal feature; and determining the weight corresponding to the modal feature from the determined global feature vector in combination with a Sigmoid function.
- 3. The fusion method of claim 1, wherein determining a corresponding weight for each of the multi-modal features comprises: determining the weight corresponding to the modal feature based on a correspondence between the modal feature and the weight.
- 4. The fusion method of claim 1, wherein, for each of the multi-modal features, expanding the dimension of the corresponding weight to be the same as the dimension of the modal feature to determine an expanded weight comprises: for the modal feature, expanding the dimension of the corresponding weight to be the same as the dimension of the modal feature based on a tensor broadcast mechanism to determine the expanded weight.
- 5. A fusion device for multi-modal features, the fusion device comprising: a weight determination module, configured to determine a corresponding weight for each modal feature of the multi-modal features; an expanded weight determination module, configured to, for each modal feature of the multi-modal features, expand the dimension of the corresponding weight to be the same as the dimension of the modal feature to determine an expanded weight; a weighted modal feature determination module, configured to determine, for each modal feature of the multi-modal features, a corresponding weighted modal feature based on its corresponding expanded weight; and a fusion feature determination module, configured to determine a fusion feature of the multi-modal features based on the determined weighted modal features.
- 6. The fusion device of claim 5, wherein the weight determination module determining a corresponding weight for each of the multi-modal features comprises: performing a global average pooling operation on the modal feature to determine a global feature vector corresponding to the modal feature; and determining the weight corresponding to the modal feature from the determined global feature vector in combination with a Sigmoid function.
- 7. The fusion device of claim 5, wherein the weight determination module determining a corresponding weight for each of the multi-modal features comprises: determining the weight corresponding to the modal feature based on a correspondence between the modal feature and the weight.
- 8. The fusion device of claim 5, wherein the expanded weight determination module expanding, for each of the multi-modal features, the dimension of the corresponding weight to be the same as the dimension of the modal feature to determine an expanded weight comprises: for the modal feature, expanding the dimension of the corresponding weight to be the same as the dimension of the modal feature based on a tensor broadcast mechanism to determine the expanded weight.
- 9. A machine-readable storage medium having stored thereon instructions for causing a machine to perform the fusion method of any one of claims 1-4.
- 10. An electronic device, characterized in that the electronic device comprises: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to read the executable instructions from the memory and execute them to implement the fusion method of any one of claims 1-4.
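The pipeline of claims 1, 2 and 4 can be sketched in plain Python. Function names (`gap`, `sigmoid`, `fuse`) and the element-wise sum used as the final combination are illustrative assumptions; a real implementation would operate on framework tensors, where the weight expansion of claim 4 is an implicit broadcast.

```python
import math

def gap(feature):
    """Global average pooling: one scalar per channel (claim 2)."""
    return [sum(ch) / len(ch) for ch in feature]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fuse(modal_features):
    """Fuse modal features, each given as a [channels][spatial] nested list."""
    weighted = []
    for feat in modal_features:
        # Claim 2: GAP followed by Sigmoid yields one weight per channel.
        weights = [sigmoid(g) for g in gap(feat)]
        # Claim 4: the explicit loop below stands in for the tensor-broadcast
        # mechanism that expands each weight to the feature's dimensions.
        weighted.append([[w * v for v in ch] for ch, w in zip(feat, weights)])
    # Claim 1: combine the weighted modal features (element-wise sum is an
    # assumption here; the claim leaves the combination operator open).
    fused = [ch[:] for ch in weighted[0]]
    for feat in weighted[1:]:
        for c, ch in enumerate(feat):
            for i, v in enumerate(ch):
                fused[c][i] += v
    return fused
```

With two single-channel inputs `[[1.0, 1.0]]` and `[[0.0, 0.0]]`, the first modality receives weight `sigmoid(1.0)` and the second weight `0.5`, so the zero modality contributes nothing to the fused result.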
Description
Multi-mode feature fusion method and device

Technical Field

The application relates to the field of multi-modal feature fusion, and in particular to a multi-modal feature fusion method and device.

Background

Multi-modal feature fusion is widely used in the field of medical image segmentation. The feature fusion mode of the existing nnU-Net model and of mainstream multi-modal segmentation networks is mainly implicit fusion: input data of different modalities are directly combined by tensor splicing (concatenation) or simple addition, and the combined input is then fed as a whole into a convolution module for deep feature extraction. This approach requires no separately designed fusion logic, is simple to implement, and is the mainstream choice in current industry and academia. However, in implicit fusion the modality weight distribution is coupled with deep feature extraction, so the network cannot explicitly reflect the contribution of each modality to the final segmentation result; the approach lacks interpretability, which hinders clinical deployment and subsequent optimization. Therefore, how to enhance the interpretability of multi-modal feature fusion is a technical problem that needs to be solved in the art.

Disclosure of Invention

In view of the above, the present application provides a multi-modal feature fusion method and device to enhance the interpretability of multi-modal feature fusion.
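The implicit fusion described in the background can be sketched as follows. Both variants are shown with plain nested lists for clarity (function names are illustrative; frameworks would concatenate along the channel axis or add tensors directly):

```python
def concat_fusion(modal_features):
    """Tensor-splicing style: stack all modal channels into one input."""
    fused = []
    for feat in modal_features:  # feat is a [channels][spatial] list
        fused.extend(feat)
    return fused

def add_fusion(modal_features):
    """Simple-adding style: element-wise sum of the modal features."""
    fused = [ch[:] for ch in modal_features[0]]
    for feat in modal_features[1:]:
        for c, ch in enumerate(feat):
            for i, v in enumerate(ch):
                fused[c][i] += v
    return fused
```

In either case the per-modality contribution is absorbed into the subsequent convolution weights, which is exactly the coupling of weight distribution and feature extraction that the application seeks to avoid.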
In a first aspect, the application provides a multi-modal feature fusion method, comprising: determining a corresponding weight for each modal feature of the multi-modal features; for each modal feature, expanding the dimension of the corresponding weight to be the same as the dimension of the modal feature to determine an expanded weight; determining a corresponding weighted modal feature for each modal feature based on its corresponding expanded weight; and determining a fusion feature of the multi-modal features based on the determined weighted modal features. Optionally, determining the corresponding weight for each modal feature includes performing a global average pooling operation on the modal feature to determine a corresponding global feature vector, and determining the weight corresponding to the modal feature from the determined global feature vector in combination with a Sigmoid function. Optionally, determining the corresponding weight for each modal feature includes determining the weight based on a correspondence between the modal feature and the weight. Optionally, expanding the dimension of the corresponding weight to be the same as the dimension of the modal feature to determine the expanded weight includes expanding the dimension of the corresponding weight based on a tensor broadcast mechanism.
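The second optional variant (a correspondence between modal feature and weight, claim 3) could be as simple as a lookup keyed by modality. The modality names and weight values below are purely illustrative and not taken from the patent:

```python
# Hypothetical fixed correspondence: modality name -> weight (claim 3 variant).
MODALITY_WEIGHTS = {"T1": 0.4, "T2": 0.25, "FLAIR": 0.35}

def weight_for(modality):
    """Look up the weight assigned to a modality by the correspondence."""
    return MODALITY_WEIGHTS[modality]

def weighted_features(named_features):
    """named_features: dict mapping modality name -> [channels][spatial] list."""
    return {
        name: [[weight_for(name) * v for v in ch] for ch in feat]
        for name, feat in named_features.items()
    }
```

Because the weights live in an explicit table rather than inside convolution kernels, each modality's contribution can be read off directly, which is the interpretability benefit the disclosure targets.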
In a second aspect, the application further provides a multi-modal feature fusion device comprising a weight determination module, an expanded weight determination module, a weighted modal feature determination module and a fusion feature determination module. The weight determination module is configured to determine a corresponding weight for each modal feature of the multi-modal features; the expanded weight determination module is configured to expand, for each modal feature, the dimension of the corresponding weight to be the same as the dimension of the modal feature to determine an expanded weight; the weighted modal feature determination module is configured to determine, for each modal feature, a corresponding weighted modal feature based on its corresponding expanded weight; and the fusion feature determination module is configured to determine a fusion feature of the multi-modal features based on the determined weighted modal features. Optionally, the weight determination module determining a corresponding weight for each modal feature includes performing a global average pooling operation on the modal feature to determine a corresponding global feature vector, and determining the weight corresponding to the modal feature from the determined global feature vector in combination with a Sigmoid function. Optionally, the weight determination module determining a corresponding weight for each modal feature includes determining the weight based on a correspondence between the modal feature and the weight. Optionally, the expanded weight determination module expanding, for each modal feature, the dimension of the corresponding weight to be the same as the dimension of the modal feature to determine an expanded weight includes expanding, for the modal feature, the dimension of the corresponding weight based on a tensor broadcast mechanism to determine the expanded weight.
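The four-module device maps naturally onto one class per module. This is a plain-Python sketch under the same assumptions as before (per-channel GAP + Sigmoid weights, element-wise sum as the fusion operator); all class and method names are illustrative, not from the patent:

```python
import math

class WeightDeterminationModule:
    def __call__(self, feature):
        # Per-channel GAP followed by Sigmoid (the claim 6 variant).
        return [1.0 / (1.0 + math.exp(-sum(ch) / len(ch))) for ch in feature]

class ExpandedWeightModule:
    def __call__(self, feature, weights):
        # Expand each channel weight to the spatial size (claim 8's broadcast).
        return [[w] * len(ch) for ch, w in zip(feature, weights)]

class WeightedFeatureModule:
    def __call__(self, feature, expanded):
        # Element-wise product of feature and expanded weight.
        return [[v * w for v, w in zip(ch, wch)]
                for ch, wch in zip(feature, expanded)]

class FusionFeatureModule:
    def __call__(self, weighted_list):
        # Element-wise sum over modalities (one possible fusion operator).
        fused = [ch[:] for ch in weighted_list[0]]
        for feat in weighted_list[1:]:
            for c, ch in enumerate(feat):
                for i, v in enumerate(ch):
                    fused[c][i] += v
        return fused

class FusionDevice:
    def __init__(self):
        self.weigh = WeightDeterminationModule()
        self.expand = ExpandedWeightModule()
        self.apply = WeightedFeatureModule()
        self.fuse = FusionFeatureModule()

    def __call__(self, modal_features):
        weighted = []
        for feat in modal_features:
            w = self.weigh(feat)
            e = self.expand(feat, w)
            weighted.append(self.apply(feat, e))
        return self.fuse(weighted)
```

Keeping each step in its own module mirrors the claim structure and makes the learned (or fixed) modality weights observable in isolation, which is the stated interpretability goal.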