CN-119131471-B - Unmanned aerial vehicle inspection method and system based on cross-modal data type incremental learning

CN119131471B

Abstract

The invention discloses an unmanned aerial vehicle inspection method and system based on cross-modal data type incremental learning. The method splices and recombines the features of the individual modalities through a cross-modal feature hybrid fusion module, assigns a set of independent convolution kernels to each group of the processed input feature maps, and performs grouped convolution to generate a complete feature map. For each pooling window in the complete feature map, adaptive average pooling computes the mean of all elements in the window and uses it as the element at the corresponding position of the output feature map; repeating this over all regions yields the final output feature map. The output feature map produced by the adaptive average pooling module is flattened and fed into a fully connected layer, which extracts and combines features to generate the final classification output; the model parameters are then updated according to the gradient of a loss function. The trained model is deployed on an unmanned aerial vehicle, which collects visible-light and thermal-imaging images in real time, extracts their features, and performs recognition and prediction with the model.
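The processing chain the abstract describes (grouped convolution over the fused feature map, adaptive average pooling, flattening, and a fully connected classification layer) can be sketched as follows. This is a minimal illustrative sketch only: the group count, channel sizes, 1x1 kernels, and the use of plain NumPy are assumptions for clarity, not the patent's actual network.

```python
import numpy as np

def grouped_conv1x1(x, kernels):
    # x: (C, H, W). Split channels into one group per kernel set; each group
    # gets its own independent kernel, then the outputs are concatenated
    # into the complete feature map (a 1x1 conv is a per-pixel linear map).
    groups = np.split(x, len(kernels), axis=0)
    outs = []
    for g, k in zip(groups, kernels):
        c, h, w = g.shape
        outs.append((k @ g.reshape(c, -1)).reshape(-1, h, w))
    return np.concatenate(outs, axis=0)

def adaptive_avg_pool(x, out_hw=(1, 1)):
    # For each pooling window, average all elements and place the mean at
    # the corresponding position of the output feature map.
    oh, ow = out_hw
    c, h, w = x.shape
    out = np.zeros((c, oh, ow))
    for i in range(oh):
        for j in range(ow):
            hs, he = i * h // oh, (i + 1) * h // oh
            ws, we = j * w // ow, (j + 1) * w // ow
            out[:, i, j] = x[:, hs:he, ws:we].mean(axis=(1, 2))
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4, 4))                          # fused feature map
kernels = [rng.normal(size=(4, 4)) for _ in range(2)]   # 2 groups, own kernels
feat = adaptive_avg_pool(grouped_conv1x1(x, kernels)).ravel()  # flatten
W = rng.normal(size=(3, feat.size))                     # fully connected layer
logits = W @ feat                                       # final classification output
```

In a real deployment the grouped convolution and pooling would be standard framework layers; the sketch only makes the data flow of the claimed steps concrete.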

Inventors

  • YAO XINJIE
  • WANG YU
  • ZHU PENGFEI
  • LI WEIHAO
  • ZHAO RUIPU
  • HU QINGHUA

Assignees

  • Tianjin University (天津大学)

Dates

Publication Date
2026-05-08
Application Date
2024-08-22

Claims (6)

  1. An unmanned aerial vehicle inspection method based on cross-modal data type incremental learning, characterized by comprising the following steps: splicing and recombining the features of each modality through a cross-modal feature hybrid fusion module, assigning a set of independent convolution kernels to each group of the processed input feature maps, and performing grouped convolution to finally generate a complete feature map; for each pooling window in the complete feature map, computing by adaptive average pooling the mean of all elements in the window, taking this mean as the element at the corresponding position of the output feature map, and repeating the process over all regions to obtain the final output feature map; flattening the output feature map produced by the adaptive average pooling module, feeding it into a fully connected layer to extract and combine features and generate the final classification output, and updating the model parameters according to the gradient of a loss function; the loss function comprises a classification loss L_cls computed from the real labels and an unknown-new-class reservation loss L_res; the classification loss L_cls is used to classify the currently known categories correctly, measuring the difference between the model output and the real labels and ensuring that the model can accurately identify and classify the categories contained in the training data; the unknown-new-class reservation loss L_res reserves part of the learning space in the output space of the model, the prediction output of each category being computed by the model from the input data during training; the two losses are combined with a weight to form the total loss; the loss function L_cls learns the model from the labels of the known classes, and the loss function L_res reserves learning space for the unknown new classes and makes the learned known-class representations more compact; the objective function is defined as: L = L_cls(e, y) + λ·L_res(m(e), ŷ); wherein e is the enhanced embedding obtained from the cross-modal feature hybrid fusion module, y is the label of the known class, ŷ is the pseudo label of the virtual category, and λ is the coefficient balancing the two losses; the mask function is defined as: m(f) = f ⊙ (1 − y); wherein the function m(·) is applied to the feature vector f, which is the embedded representation derived from incremental learning, f_a and f_b respectively denote the embeddings of the two modalities, ⊙ denotes the Hadamard product, multiplying the feature vector element-wise by the complement of the one-hot encoded vector y representing the currently known category, in which the current category position is 1 and the remaining category positions are 0; through the (1 − y) operation, the positions of the known classes are set to 0 and the positions of the unknown classes are set to 1.
  2. The unmanned aerial vehicle inspection method based on cross-modal data type incremental learning according to claim 1, wherein splicing and recombining the features of each modality through the cross-modal feature hybrid fusion module is specifically: the features of each modality in the feature map are divided into several groups, each group comprising a certain number of channels; a subset of channels is selected from a specific group of one modality and spliced and recombined with the channels of the corresponding group of the other modality.
  3. The unmanned aerial vehicle inspection method based on cross-modal data type incremental learning according to claim 1, wherein the cross-modal feature hybrid fusion module is: in the batch normalization layer, the features of each channel are normalized and adjusted by scaling and translation; a larger scale-factor value indicates that the channel has higher activity and importance in the model learning process, while a scale-factor value close to zero indicates that the channel's information is redundant and has little influence on the output; the input features of the visible-light and infrared modalities are divided by channel into several predefined groups for dedicated processing; in the hybrid fusion stage, a dynamic selection mechanism selects the most important or representative channels from a specific group of one modality and combines them with the channels of the corresponding group of the other modality, and the channel recombination strategy is automatically adjusted during training according to the data features and task requirements.
  4. The unmanned aerial vehicle inspection method based on cross-modal data type incremental learning according to claim 3, wherein the cross-modal feature hybrid fusion module is defined as: F = [F_A'; F_B']; wherein [·; ·] denotes concatenation, F_A and F_B denote the feature sets extracted from the two different modalities, and the recombined groups F_A' and F_B' mutually include each other's features.
  5. An unmanned aerial vehicle inspection system based on cross-modal data type incremental learning, the system comprising a processor and a memory, the memory having program instructions stored therein, the processor invoking the program instructions stored in the memory to cause an apparatus to perform the method of any of claims 1-4.
  6. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1-4.
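The loss construction in claim 1 (a classification loss on the real labels plus a reservation loss on masked outputs for virtual categories) can be sketched as below. The inline symbols were dropped from the translated claim, so the notation is reconstructed from the surrounding description; applying the mask to the logits rather than an intermediate embedding, and cross-entropy as the base loss, are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def masked_embedding(f, y_onehot):
    # Hadamard product with the element-wise complement of the one-hot label:
    # known-class positions are zeroed, unknown ("virtual") positions kept.
    return f * (1.0 - y_onehot)

def total_loss(logits, y_onehot, pseudo_onehot, lam=0.5):
    # Classification loss on the real label of the known class (cross-entropy).
    l_cls = -np.sum(y_onehot * np.log(softmax(logits) + 1e-12))
    # Reservation loss: cross-entropy against a pseudo label of a virtual
    # category, computed on the masked outputs so it only touches the
    # positions reserved for unknown new classes.
    masked = masked_embedding(logits, y_onehot)
    l_res = -np.sum(pseudo_onehot * np.log(softmax(masked) + 1e-12))
    # Combine the two losses with a balancing coefficient lam (λ).
    return l_cls + lam * l_res
```

The key point the sketch makes concrete is that the (1 − y) mask gives the reservation loss no gradient at the known-class position, so reserving space for new classes does not fight the classification of known ones.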

Description

Unmanned aerial vehicle inspection method and system based on cross-modal data type incremental learning

Technical Field

The invention relates to the field of unmanned aerial vehicle inspection in intelligent unmanned systems, and in particular to an unmanned aerial vehicle inspection method and system based on cross-modal data type incremental learning.

Background

Unmanned aerial vehicle inspection systems mainly use unmanned aerial vehicles to carry out automated monitoring and inspection of facilities or areas. Such systems are generally applied to the maintenance of key infrastructure such as power lines, oil and gas pipelines, railways, and bridges, as well as in agriculture, forest monitoring, and city management. Their core advantages are improved efficiency, reduced cost, and reduced safety risk to personnel. Unmanned aerial vehicle inspection needs to collect and analyze input data to make automated decisions; visible-light images and infrared images are two common data types with a degree of complementarity: visible-light images provide richer information during the day, while infrared images provide richer information at night, when illumination is insufficient. How to design unmanned aerial vehicle inspection for cross-modal data has therefore become a problem to be solved. Existing unmanned aerial vehicle inspection models collect high-definition or infrared images captured by a high-definition or infrared camera, train a deep neural network model on a cross-modal image dataset, and use the model to perform inference and prediction on newly collected cross-modal images in order to determine their category.
Although this approach can achieve good performance, the unmanned aerial vehicle inspection model cannot be updated once deployed. In general, the types of inspection targets gradually change as society develops, and large numbers of new classes continuously emerge in various modalities. Existing methods can only identify the old cross-modal target classes they have already learned, which makes them difficult to use in practice. Cross-modal incremental learning is a method that enables a model to learn new tasks on the basis of existing cross-modal tasks, so that it can handle new and old cross-modal tasks simultaneously. In cross-modal incremental learning, the model improves its understanding and generalization capability by continuously integrating new information from different sensing modalities. However, because the cross-modal data distribution of a new task differs greatly from that of an old task, the deep neural network may change important parameters related to the old task when learning the new one, significantly reducing the model's ability to recognize the old task. This phenomenon is called catastrophic forgetting: the model's parameters drift toward the new task, impairing its recognition and processing of old tasks. To address this problem, existing methods have been designed to counter knowledge forgetting, such as regularization, replay mechanisms, and dynamic architecture expansion, but they largely attend only to the retention of single-modality knowledge. In practical applications, while these strategies help to alleviate unimodal catastrophic forgetting, they often ignore the challenges unique to cross-modal data.
Therefore, how to solve the problem of the model forgetting old tasks in cross-modal incremental learning is a key factor in whether an unmanned aerial vehicle inspection model can be effectively deployed at scale.

Disclosure of Invention

The invention provides an unmanned aerial vehicle inspection method and system based on cross-modal data type incremental learning. It considers the problem of modality forgetting in cross-modal incremental learning for unmanned aerial vehicle inspection tasks and designs a method based on cross-modal data type incremental learning: using a cross-modal feature information exchange mechanism with block-diagonal structural sparsity at the channel scale, together with the construction of virtual categories, it constrains the model to incrementally introduce new categories without losing previously learned knowledge, and uses the enhanced features of known categories to reserve learning space for new types of cross-modal data. This effectively alleviates interference between new and old cross-modal tasks, thereby improving the cross-modal incremental learning capacity of the unmanned aerial vehicle inspection model and greatly increasing the robustness and generalization of unmanned aerial vehicle inspection, as described in detail below.
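The channel-level exchange described in claims 2-4 (split each modality's channels into groups, then swap part of the channels between corresponding groups of the two modalities) can be sketched as follows. The fixed swap of the leading channels is an illustrative assumption; the patent describes a dynamic, importance-driven selection (e.g. by batch-normalization scale factors), which this sketch does not implement.

```python
import numpy as np

def cross_modal_mix(f_vis, f_ir, groups=2, swap=1):
    # f_vis, f_ir: (C, H, W) feature maps of the visible-light and infrared
    # modalities. Split each into `groups` channel groups and, within each
    # corresponding pair of groups, exchange the first `swap` channels,
    # so the recombined groups mutually include each other's features.
    vis = np.split(f_vis.copy(), groups, axis=0)
    ir = np.split(f_ir.copy(), groups, axis=0)
    for gv, gi in zip(vis, ir):
        gv[:swap], gi[:swap] = gi[:swap].copy(), gv[:swap].copy()
    return np.concatenate(vis, axis=0), np.concatenate(ir, axis=0)
```

Because the exchange only touches corresponding groups, the cross-modal interaction pattern is block-diagonal over channel groups, which matches the block-diagonal structural sparsity mentioned in the disclosure.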