CN-122020550-A - Bird recognition method and system based on dynamic multi-mode fusion and self-evolution lightweight model


Abstract

The invention discloses a bird identification method and a bird identification system based on a dynamic multi-modal fusion and self-evolution lightweight model, and belongs to the technical field of bird identification. The method comprises: collecting multi-modal data and preprocessing it to obtain a voiceprint feature map and an image tensor; extracting a voiceprint high-level semantic feature vector and an image high-level semantic feature vector from the voiceprint feature map and the image tensor, respectively, based on a lightweight convolutional neural network; fusing the two feature vectors through a dynamic fusion module to obtain a fusion feature vector; constructing a lightweight recognition model and training it through a self-evolution module using a hybrid evolution strategy; deploying the trained lightweight recognition model to an application terminal; and performing bird recognition and result output on the application terminal through the trained lightweight recognition model. By dynamically fusing multi-modal information and using a self-evolving lightweight model, the method improves the accuracy, robustness and adaptive capability of bird identification in complex environments.

Inventors

YANG DIANWEN
XIANG XUYU

Assignees

Central South University of Forestry and Technology (中南林业科技大学)

Dates

Publication Date: 20260512
Application Date: 20260202

Claims (8)

1. A bird identification method based on a dynamic multi-modal fusion and self-evolution lightweight model, characterized by comprising the following steps: Step 1, acquiring multi-modal data and preprocessing the multi-modal data to obtain a voiceprint feature map and an image tensor, wherein the multi-modal data comprises original audio data containing bird voiceprints and original image data containing bird targets; Step 2, extracting a voiceprint high-level semantic feature vector and an image high-level semantic feature vector from the voiceprint feature map and the image tensor, respectively, based on a lightweight convolutional neural network, and fusing the voiceprint high-level semantic feature vector and the image high-level semantic feature vector through a dynamic fusion module to obtain a fusion feature vector; Step 3, constructing a lightweight recognition model, wherein the lightweight recognition model is integrated with a self-evolution module, the self-evolution module comprises a meta controller and a model structure search space, and the lightweight recognition model is trained by the self-evolution module using a hybrid evolution strategy; Step 4, deploying the trained lightweight recognition model to an application terminal; and Step 5, performing bird recognition and result output on the application terminal through the trained lightweight recognition model.
2. The bird identification method based on a dynamic multi-modal fusion and self-evolution lightweight model according to claim 1, wherein in step 1, the original audio data containing bird voiceprints is preprocessed to obtain the voiceprint feature map, specifically: performing pre-emphasis, framing, windowing and short-time Fourier transform processing on the original audio data containing bird voiceprints, and calculating a Mel spectrogram or Mel-frequency cepstral coefficients of the original audio data to obtain the voiceprint feature map.
3. The bird identification method based on a dynamic multi-modal fusion and self-evolution lightweight model according to claim 2, wherein in step 1, the original image data containing bird targets is preprocessed to obtain the image tensor, specifically: performing size normalization, center cropping and color channel normalization processing on the original image data containing bird targets to obtain the image tensor.
4. The bird identification method based on a dynamic multi-modal fusion and self-evolution lightweight model according to claim 3, wherein in step 2, the voiceprint high-level semantic feature vector and the image high-level semantic feature vector are fused through the dynamic fusion module to obtain the fusion feature vector, specifically: calculating, based on a cross-modal attention mechanism, the attention weights and context vector of the voiceprint high-level semantic feature vector with respect to the image high-level semantic feature vector, and the attention weights and context vector of the image high-level semantic feature vector with respect to the voiceprint high-level semantic feature vector; dynamically generating a fusion weight coefficient through an environment perception sub-network based on environment metadata acquired in real time; and performing weighted fusion of the attention weights and the context vectors based on the fusion weight coefficient to obtain the final fusion feature vector.
5. The bird identification method based on a dynamic multi-modal fusion and self-evolution lightweight model according to claim 4, wherein in step 3, the backbone network of the lightweight recognition model is GhostNet.
6. The bird identification method based on a dynamic multi-modal fusion and self-evolution lightweight model according to claim 5, wherein in step 3, the lightweight recognition model is trained by the self-evolution module using a hybrid evolution strategy, specifically comprising the following steps: performing model structure exploration in the model structure search space based on a genetic algorithm, encoding part of the structure configuration of the lightweight recognition model as individuals, calculating the fitness of each individual, and performing selection, crossover and mutation operations based on the fitness; performing policy optimization based on reinforcement learning, wherein decisions of the meta controller are regarded as actions, the improvement in accuracy of the lightweight recognition model on the validation set is regarded as the reward, and the parameters of the meta controller are updated through the proximal policy optimization (PPO) algorithm so as to maximize the cumulative reward; and constructing a multi-modal data set and performing end-to-end training of the lightweight recognition model on the multi-modal data set, wherein the loss function is the cross-entropy loss.
7. The bird identification method based on a dynamic multi-modal fusion and self-evolution lightweight model according to claim 6, wherein in step 4, the trained lightweight recognition model is deployed to the application terminal, specifically: deploying the trained lightweight recognition model to the application terminal; monitoring the performance of the lightweight recognition model in real time at the application terminal; if the average recognition confidence of the lightweight recognition model within a sliding event window falls below a preset threshold, or new labeled data is received, triggering the evolution update flow of the self-evolution module to update and retrain the lightweight recognition model; and redeploying the updated lightweight recognition model to the application terminal.
8. A bird identification system based on a dynamic multi-modal fusion and self-evolution lightweight model, applying the bird identification method based on a dynamic multi-modal fusion and self-evolution lightweight model as claimed in any one of claims 1 to 7, characterized by comprising: a data acquisition and preprocessing module, used for acquiring original audio data containing bird voiceprints and original image data containing bird targets, performing pre-emphasis, framing, windowing and short-time Fourier transform processing on the original audio data to obtain a voiceprint feature map, and performing size normalization, center cropping and color channel normalization processing on the original image data to obtain an image tensor; a dynamic fusion module, used for extracting a voiceprint high-level semantic feature vector and an image high-level semantic feature vector from the voiceprint feature map and the image tensor, respectively, based on a lightweight convolutional neural network, and fusing the feature vectors using a cross-modal attention mechanism and an environment-aware dynamic weighting strategy to obtain a fusion feature vector; a model training and self-evolution module, used for constructing a lightweight recognition model, wherein the lightweight recognition model is integrated with a self-evolution module, and the self-evolution module comprises a meta controller and a model structure search space; a deployment and update module, used for deploying the trained lightweight recognition model to the application terminal, monitoring the performance of the lightweight recognition model, and activating the self-evolution module to update, retrain and redeploy the model when a trigger condition is met; and a recognition execution module, used for processing input multi-modal data through the deployed lightweight recognition model at the application terminal and outputting bird identification results.
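The dynamic fusion recited in claim 4 can be illustrated with a minimal Python sketch. It is not the patented implementation: the per-modality token lists, the `light` and `noise` metadata fields, the `gain` constant, and the closed-form sigmoid gate that stands in for the trained environment perception sub-network are all assumptions made for illustration. Both modalities are also assumed to share one feature dimension, and only the context vectors are blended.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys):
    """Scaled dot-product attention of one query over key vectors.

    The keys double as values. Returns (attention weights, context vector).
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    context = [sum(w * k[i] for w, k in zip(weights, keys))
               for i in range(len(query))]
    return weights, context


def mean_vector(tokens):
    """Average a list of equal-length vectors into one summary vector."""
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]


def fusion_weight(light, noise, gain=4.0):
    """Hypothetical stand-in for the environment perception sub-network.

    light and noise are assumed to be normalized to [0, 1]. Returns the
    weight given to the audio branch: high when the scene is dark and
    quiet, low when it is bright and noisy.
    """
    audio_quality = 1.0 - noise
    image_quality = light
    return 1.0 / (1.0 + math.exp(-gain * (audio_quality - image_quality)))


def fuse(audio_tokens, image_tokens, light, noise):
    """Blend the two cross-modal context vectors with the dynamic weight."""
    # The image summary queries the audio tokens, and vice versa.
    _, audio_ctx = attend(mean_vector(image_tokens), audio_tokens)
    _, image_ctx = attend(mean_vector(audio_tokens), image_tokens)
    alpha = fusion_weight(light, noise)
    return [alpha * a + (1.0 - alpha) * v
            for a, v in zip(audio_ctx, image_ctx)]
```

For example, `fuse([[1.0, 0.0]], [[0.0, 1.0]], light=0.0, noise=0.0)` leans almost entirely on the audio-side context, because the assumed gate treats a dark, quiet scene as favoring the voiceprint branch. In a trained system the gate would be a learned sub-network rather than a fixed formula.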
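The update trigger recited in claim 7 can likewise be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `EvolutionTrigger`, its method names, the window size, the threshold value, and the choice to wait for a full window before judging confidence are hypothetical and do not come from the patent.

```python
from collections import deque


class EvolutionTrigger:
    """Tracks recent recognition confidences and decides when to retrain."""

    def __init__(self, window_size=100, threshold=0.6):
        # window_size and threshold are illustrative values only.
        self.window = deque(maxlen=window_size)
        self.threshold = threshold
        self.pending_labels = 0

    def record(self, confidence):
        """Store the confidence of one recognition event."""
        self.window.append(confidence)

    def add_labeled_sample(self):
        """Note that a newly labeled sample has arrived."""
        self.pending_labels += 1

    def should_evolve(self):
        """True if new labels arrived or the windowed mean is too low."""
        if self.pending_labels > 0:
            return True
        if len(self.window) < self.window.maxlen:
            return False  # assumption: judge confidence only on a full window
        return sum(self.window) / len(self.window) < self.threshold

    def reset(self):
        """Clear state after the model has been retrained and redeployed."""
        self.window.clear()
        self.pending_labels = 0
```

A terminal would call `record` after each recognition, poll `should_evolve`, and call `reset` once the updated model has been redeployed. The `deque` with `maxlen` drops the oldest confidence automatically, which gives the sliding behavior without manual bookkeeping.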

Description

Bird recognition method and system based on dynamic multi-modal fusion and self-evolution lightweight model

Technical Field

The invention relates to the technical field of bird identification, in particular to a bird identification method and system based on a dynamic multi-modal fusion and self-evolution lightweight model.

Background

Automatic bird monitoring is of great significance for biodiversity protection and ecological research. Current mainstream technology relies on deep-learning-based image recognition or voiceprint recognition. Despite rapid development, each approach faces inherent limitations:

(1) Single-modality fragility: image recognition fails when illumination is insufficient, the target is occluded, or the distance is too great, and the accuracy of voiceprint recognition drops sharply when environmental noise is high or multiple sound sources are mixed. Neither can cope with complex field environments on its own.

(2) Rigid multi-modal fusion strategies: existing studies that combine sound and images mostly adopt static schemes such as early feature concatenation or late decision fusion. These schemes cannot be adjusted dynamically according to the real-time quality of each modality's data in a specific scene, cannot make full use of the complementary strengths of multi-modal data, and generalize poorly in real, changeable environments.

(3) Conflict between fixed models and high deployment cost: high-precision models are computationally complex and have a large number of parameters, which makes them difficult to deploy on resource-constrained edge devices. Once deployed, the model cannot be updated and cannot adapt to differences in bird species across geographical areas or to bird behavior that changes with the seasons, so the system has a short useful life and a high maintenance cost.

(4) Lightweight design and recognition accuracy are hard to reconcile: when a general-purpose lightweight model (such as MobileNet) is applied directly to the specialized task of bird identification, its limited capacity yields insufficiently discriminative features, making it difficult to meet the high accuracy required in practical applications.

In view of the above, there is a need for a bird identification method and system based on a dynamic multi-modal fusion and self-evolution lightweight model that solves these problems of conventional methods.

Disclosure of Invention

The invention aims to provide a bird identification method and system based on a dynamic multi-modal fusion and self-evolution lightweight model. By dynamically fusing multi-modal information and using a self-evolving lightweight model, the invention aims to improve the accuracy, robustness and adaptive capability of bird identification in complex environments while meeting the requirement of low-power, real-time deployment on edge devices.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:

A bird identification method based on a dynamic multi-modal fusion and self-evolution lightweight model comprises the following steps:

Step 1, acquiring multi-modal data and preprocessing the multi-modal data to obtain a voiceprint feature map and an image tensor, wherein the multi-modal data comprises original audio data containing bird voiceprints and original image data containing bird targets;
Step 2, extracting a voiceprint high-level semantic feature vector and an image high-level semantic feature vector from the voiceprint feature map and the image tensor, respectively, based on a lightweight convolutional neural network, and fusing the voiceprint high-level semantic feature vector and the image high-level semantic feature vector through a dynamic fusion module to obtain a fusion feature vector;
Step 3, constructing a lightweight recognition model, wherein the lightweight recognition model is integrated with a self-evolution module, the self-evolution module comprises a meta controller and a model structure search space, and the lightweight recognition model is trained by the self-evolution module using a hybrid evolution strategy;
Step 4, deploying the trained lightweight recognition model to an application terminal;
Step 5, performing bird recognition and result output on the application terminal through the trained lightweight recognition model.

Further, in step 1, the original audio data containing bird voiceprints is preprocessed to obtain the voiceprint feature map, specifically: pre-emphasis, framing, windowing and short-time Fourier transform processing are performed on the original audio data containing bird voiceprints, and a Mel spectrogram or Mel-frequency cepstral coefficients of the original audio data are calculated to obtain the voiceprint feature map.

Further, in step