CN-119206552-B - Unmanned aerial vehicle inspection method and device based on missing cross-modal data type incremental learning
Abstract
The invention discloses an unmanned aerial vehicle inspection method and device based on missing cross-modal data type incremental learning. The method comprises: completing the missing data collected by an unmanned aerial vehicle; preprocessing the completed data to obtain the features of each modality hidden in the data; mixing the feature channels of different modalities in the input feature map; assigning a set of convolution kernels to each group in the mixed input feature map and performing group convolution; applying adaptive average pooling to the feature map to obtain a final output feature map; flattening the output feature map and feeding it into a fully connected layer; applying an activation function to introduce nonlinearity and generate a final classification result; applying a loss function to calculate the loss and updating the model parameters by gradient descent; and, after model training is completed, deploying the trained model to the unmanned aerial vehicle, which then identifies scenes and entities. The device includes a processor and a memory. The invention improves the recognition and analysis capability of the unmanned aerial vehicle in complex environments.
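The channel-mixing step can be illustrated with a small sketch. This is one possible reading, not the patented implementation: it assumes two modalities with the same number of channels, splits each into equal contiguous groups, and swaps one channel per group with the corresponding group of the other modality. The names `split_into_groups`, `mix_channels`, and `pick`, and the `rgb`/`ir` channel labels, are hypothetical.

```python
def split_into_groups(channels, num_groups):
    """Split a list of channels into num_groups equal, contiguous groups."""
    if len(channels) % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    size = len(channels) // num_groups
    # Slicing copies, so the caller's list is never modified.
    return [channels[i * size:(i + 1) * size] for i in range(num_groups)]


def mix_channels(feat_a, feat_b, num_groups, pick=0):
    """Swap one channel per group between two modalities (assumed reading).

    feat_a, feat_b: lists of per-channel features, one list per modality.
    pick: index, inside each group, of the channel that is exchanged.
    """
    if len(feat_a) != len(feat_b):
        raise ValueError("both modalities need the same number of channels")
    groups_a = split_into_groups(feat_a, num_groups)
    groups_b = split_into_groups(feat_b, num_groups)
    for group_a, group_b in zip(groups_a, groups_b):
        # Exchange the selected channel with the corresponding group.
        group_a[pick], group_b[pick] = group_b[pick], group_a[pick]
    # Concatenate the groups back into one channel list per modality.
    mixed_a = [ch for group in groups_a for ch in group]
    mixed_b = [ch for group in groups_b for ch in group]
    return mixed_a, mixed_b


# Strings stand in for per-channel feature maps.
rgb = ["rgb0", "rgb1", "rgb2", "rgb3"]
ir = ["ir0", "ir1", "ir2", "ir3"]
mixed_rgb, mixed_ir = mix_channels(rgb, ir, num_groups=2)
print(mixed_rgb)  # ['ir0', 'rgb1', 'ir2', 'rgb3']
print(mixed_ir)   # ['rgb0', 'ir1', 'rgb2', 'ir3']
```

Strings are used so the regrouping is easy to follow. In practice each entry would be a feature-map array, and the mixed map would then be passed to the group convolution.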
Inventors
- YAO XINJIE
- WANG YU
- ZHU PENGFEI
- LI WEIHAO
- ZHAO RUIPU
- HU QINGHUA
Assignees
- Tianjin University (天津大学)
Dates
- Publication Date: 20260508
- Application Date: 20240925
Claims (7)
- 1. An unmanned aerial vehicle inspection method based on missing cross-modal data type incremental learning, characterized by comprising the following steps: completing the missing data collected by the unmanned aerial vehicle; preprocessing the completed data to obtain the features of each modality hidden in the data; assigning a set of convolution kernels to each group in the mixed input feature map, performing group convolution, and then applying adaptive average pooling to the feature map to obtain a final output feature map; flattening the output feature map, feeding it into a fully connected layer, applying an activation function to introduce nonlinearity, generating a final classification result, then applying a loss function to calculate the loss, and updating the model parameters by gradient descent; wherein the step of mixing the feature channels of different modalities in the input feature map comprises: dividing the features from each modality in the input feature map into different channels, evenly grouping the feature channels of each modality, selecting a channel from each group, and mixing it with the corresponding group of another modality; the channel mix assigns channels to different predefined groups and reassembles feature channels from different modalities, and is defined as: [equation missing in source text]; [equation missing in source text]; wherein the terms defined are, in order: the concatenation operator; the two feature groups extracted from two different modalities, whose feature channels are divided equally into groups; the index of the scene; the index of the modality; the number of groups; the number of channels; the channel mixing operations applied to the feature groups extracted from the two different modalities; and the elements of a given feature channel of a given modality in a given scene.
- 2. The unmanned aerial vehicle inspection method based on missing cross-modal data type incremental learning of claim 1, wherein the loss function consists of two parts, a true classification loss and a new-class reservation loss; the true classification loss measures the difference between the true label and the model's predicted output; the new-class reservation loss uses virtual classes and virtual samples to reserve room in the embedding space in advance for new classes that have not yet been learned, adjusting the distribution of the embedding space and helping the inspection recognition model in the unmanned aerial vehicle retain its memory of old classes while learning new classes.
- 3. The unmanned aerial vehicle inspection method based on missing cross-modal data type incremental learning of claim 2, wherein completing the missing data collected by the unmanned aerial vehicle comprises using a masked autoencoder during training to complete the missing data collected by the sensors.
- 4. The unmanned aerial vehicle inspection method based on missing cross-modal data type incremental learning of claim 2, wherein the virtual samples are obtained as follows: two samples and their corresponding labels are randomly selected from the training set; the two samples are each input into the neural network and propagated forward to a selected hidden layer, yielding two hidden-layer outputs; the two hidden-layer outputs are interpolated, using an interpolation coefficient, to generate a new internal representation: [equation missing in source text]; the corresponding labels are interpolated in the same way: [equation missing in source text]; and the new representation obtained by interpolation continues forward propagation through the remainder of the neural network, producing the final output.
- 5. The unmanned aerial vehicle inspection method based on missing cross-modal data type incremental learning of claim 4, wherein the virtual loss corresponding to the sample is: [equation missing in source text]; wherein the terms defined are, in order: a virtual class label; a pseudo label among the existing classes; the new-class space reserved by the masking function; two terms used to avoid over-compressing the old-class space; the feature representation generated by the network for the input sample; and the feature representation generated by the network for the mixed sample; the masking function is defined as: [equation missing in source text]; wherein the operator denotes the Hadamard product, which multiplies the feature vector element by element with the complement of a one-hot encoded vector; the one-hot vector represents the labels of the currently known classes, with the current class position set to 1 and the remaining class positions set to 0; after the complement operation, the positions of the known classes are 0 and the positions of the unknown classes are 1; and the objective function is finally defined as: [equation missing in source text]; wherein the terms defined are, in order: the base loss function calculated by cosine similarity; a balance factor that adjusts the relative importance of the loss terms; the raw input data of the model; and the true class label corresponding to that input.
- 6. An unmanned aerial vehicle inspection device based on missing cross-modal data type incremental learning, comprising a processor and a memory, wherein program instructions are stored in the memory, and wherein the processor invokes the program instructions stored in the memory to cause the device to perform the method of any one of claims 1-5.
- 7. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1-5.
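The virtual-sample construction in claim 4 and the complement mask in claim 5 can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes the interpolation is the usual convex combination `lam * h_i + (1 - lam) * h_j`, since the claim's own equations are not reproduced in this text. The names `interpolate`, `virtual_sample`, `mask_known`, `lower`, and `upper` are hypothetical, and the toy two-part network is made up for the example.

```python
def interpolate(u, v, lam):
    """Convex combination of two equal-length vectors (assumed form)."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(u, v)]


def virtual_sample(x_i, y_i, x_j, y_j, lower, upper, lam):
    """Mix two samples at a hidden layer and finish the forward pass.

    lower: layers up to the selected hidden layer.
    upper: the remaining layers of the network.
    """
    h_i = lower(x_i)                    # hidden output for the first sample
    h_j = lower(x_j)                    # hidden output for the second sample
    h_mix = interpolate(h_i, h_j, lam)  # new internal representation
    y_mix = interpolate(y_i, y_j, lam)  # labels mixed with the same lam
    return upper(h_mix), y_mix


def mask_known(features, known_one_hot):
    """Hadamard product with the element-wise complement of a one-hot vector.

    Known-class positions become 0; unknown-class positions are kept.
    """
    return [f * (1.0 - k) for f, k in zip(features, known_one_hot)]


# Toy network: a scaled ReLU layer followed by a fixed linear layer.
def lower(x):
    return [max(0.0, 2.0 * v) for v in x]


def upper(h):
    return [h[0] + h[1], h[0] - h[1]]


output, label = virtual_sample(
    [1.0, 0.0], [1.0, 0.0],  # first sample and its one-hot label
    [0.0, 1.0], [0.0, 1.0],  # second sample and its one-hot label
    lower, upper, lam=0.5,
)
print(output)  # [2.0, 0.0]
print(label)   # [0.5, 0.5]

masked = mask_known([0.7, 0.2, 0.1], [1.0, 0.0, 0.0])
print(masked)  # [0.0, 0.2, 0.1]
```

In a training loop the interpolation coefficient would typically be sampled at random for each pair. How the masked features enter the virtual loss depends on the claim's equations, which are not available here.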
Description
Unmanned aerial vehicle inspection method and device based on missing cross-modal data type incremental learning
Technical Field
The invention relates to the field of unmanned aerial vehicle inspection in intelligent unmanned systems, in particular to an unmanned aerial vehicle inspection method and device based on missing cross-modal data type incremental learning.
Background
Unmanned aerial vehicle inspection systems mainly use unmanned aerial vehicles to automatically monitor and inspect key infrastructure such as power lines, oil and gas pipelines, railways and bridges. Such systems are also widely used in agriculture, forest monitoring, and city management. Their main advantages are improved working efficiency, lower operating cost, and reduced personnel safety risk. When an unmanned aerial vehicle executes an inspection task, it needs to collect and analyze data to make automatic decisions. During the daytime, the unmanned aerial vehicle captures clear images for analysis with a high-definition camera; at night, because illumination is insufficient, it usually relies on an infrared camera to acquire images. Current automatic detection models primarily process images captured by high-definition or infrared cameras: during a training phase, deep neural networks are trained on the collected data, and these networks are then deployed to analyze newly acquired images and evaluate the current situation. In addition, in actual operation, some sensing modalities may fail to provide data because of technical faults, environmental interference or other external factors. Even when a modality is available, its data may be incomplete because of limited sensor coverage, or the full view of a scene may not be captured because of limitations of the sensor itself.
Cross-modal learning focuses on learning and reasoning from the associations between different modalities: data from different modalities supplement and share information with each other, which enhances the model's perception and understanding. In this learning paradigm, the model fuses information from different sensing modalities to improve its understanding of the environment and its decision making. Incremental learning enables the model to retain its memory of existing data while absorbing new data. However, cross-modal learning relies on the complementarity of information between modalities to improve model performance. When the data of one modality is missing, the model cannot fully use the features of that modality, so the information is incomplete and the overall performance of the model drops. Missing data may also cause an imbalance in the data distribution among modalities, so that model training becomes biased toward one of the modalities. In incremental learning, the model needs to retain its memory of old data while continuously learning new data. Missing data exacerbates forgetting, because the missing data may contain key features, making it difficult for the model to retain these features during incremental learning. In cross-modal incremental learning, modal forgetting is an important challenge. Because different modalities may have different feature distributions and learning dynamics, the model may forget previously learned knowledge about other modalities when it learns a new modality. Furthermore, in cross-modal learning, each modality typically provides a different type of information. Ideally, the model should extract and use information from each modality in a balanced manner.
However, in practical applications, one modality may gradually become dominant, giving rise to the problem of modal competition forgetting. Such dominance may cause the model to rely excessively on information from one modality while ignoring useful information provided by the other modalities, thereby degrading overall learning and decision quality. To address the problem of missing data, various strategies have been proposed, including the difference compensation method, the difference method, and the matrix decomposition method, which fill in the data and reduce the impact of the missing data. Although these strategies solve the problem to a certain extent, they often fill data using basic statistics and neglect the correlations within the data, or they incur high computational cost and complexity. To address the problem of modal competition forgetting, existing methods mainly introduce a modality balancing mechanism, data augmentation, or resampling. However, these methods also suffer from high resource overhead and subjective setting of the modality weights. Therefore, how to effectively solve the problem of data missing and the problem of modal c