CN-122024286-A - Intelligent recognition method and system for animal species in nature reserves integrating closed-set target detection and open-vocabulary recognition

Abstract

The invention relates to an intelligent recognition method and system for animal species in nature reserves that integrates closed-set target detection and open-vocabulary recognition. The method first uses a closed-set target detection model to perform initial detection and judgment on a monitoring image. For low-confidence or undetected targets, it obtains a high-quality target region through prompt-driven segmentation and an adaptive cropping mechanism, and matches that region against a set of text labels using an open-vocabulary cross-modal recognition model. It then performs collaborative modeling and confidence-based adaptive fusion of the closed-set detection result and the open-vocabulary recognition result, and judges and marks uncertain results according to thresholds. The invention combines high-precision identification of known species with coverage of unseen species, improves the reliability and interpretability of recognition results in complex environments through multi-source fusion, offers good generality and extensibility, and is suitable for long-term continuous monitoring in nature conservation and other target recognition scenarios.
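
For illustration only, the initial judgment described in the abstract can be sketched in Python as follows. The `Detection` class, the `route` function and the threshold value used in the example are hypothetical names chosen for this sketch and do not appear in the patent. The sketch assumes, as claim 1 states, that a detection is trusted when its confidence reaches or exceeds the threshold, and that the open-vocabulary flow is triggered for low-confidence detections or when nothing is detected.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One closed-set detection (hypothetical container for this sketch)."""
    label: str
    confidence: float                       # P_det
    box: Tuple[float, float, float, float]  # bounding box (x1, y1, x2, y2)


def route(detections: List[Detection], t_det: float):
    """Split detections into trusted and low-confidence groups.

    Returns (trusted, low_confidence, whole_image), where whole_image is
    True when the detector produced no detection at all, so the
    open-vocabulary flow has to work on the whole image.
    """
    trusted = [d for d in detections if d.confidence >= t_det]
    low_confidence = [d for d in detections if d.confidence < t_det]
    whole_image = not detections
    return trusted, low_confidence, whole_image


# Example with a hypothetical threshold T_det = 0.5.
dets = [
    Detection("deer", 0.9, (0.0, 0.0, 10.0, 10.0)),
    Detection("boar", 0.2, (5.0, 5.0, 20.0, 20.0)),
]
trusted, low_confidence, whole_image = route(dets, t_det=0.5)
```

In this sketch the low-confidence group and the empty-detection case are the two situations that would be passed on to segmentation and open-vocabulary matching.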

Inventors

  • TANG LIYU
  • CHEN ZEYU
  • LING CHEN
  • CHEN QIONG
  • SUN JIEQIONG

Assignees

  • Fuzhou University (福州大学)

Dates

Publication Date
20260512
Application Date
20260212

Claims (10)

  1. An intelligent recognition method for animal species in nature reserves integrating closed-set target detection and open-vocabulary recognition, characterized by comprising the following steps:
     Step S1, acquiring image data collected by monitoring equipment in a nature reserve, and preprocessing the image data;
     Step S2, performing target detection on the preprocessed image data using a closed-set target detection model to obtain at least one target candidate region, and outputting the corresponding target category, target confidence $P_{det}$ and target bounding box;
     Step S3, judging each target candidate region according to its target confidence: when $P_{det}$ reaches or exceeds a preset threshold $T_{det}$, the corresponding target candidate region is judged to be a trusted target; when $P_{det}$ is lower than $T_{det}$, or the closed-set target detection model detects no target, an open-vocabulary recognition flow is triggered;
     Step S4, performing prompt-driven segmentation and target cropping on the target candidate region to obtain a target image region for cross-modal recognition, wherein the prompt information comprises a detection-box prompt and/or a semantic text prompt;
     Step S5, matching the target image region against a preset text label set using the open-vocabulary cross-modal recognition model CLIP to obtain an open-vocabulary target recognition result and the corresponding recognition confidence;
     Step S6, performing collaborative modeling and confidence fusion on the closed-set target detection result and the open-vocabulary recognition result, outputting the final recognition result of the target according to the fusion result, and applying uncertainty marking or rollback processing to targets whose fused confidence fails to meet the condition.
  2. The intelligent recognition method for animal species in nature reserves integrating closed-set target detection and open-vocabulary recognition according to claim 1, wherein in step S1 the preprocessing includes image size normalization, noise suppression and format conversion.
  3. The intelligent recognition method for animal species in nature reserves integrating closed-set target detection and open-vocabulary recognition according to claim 1, wherein step S4 specifically comprises:
     Step S41, constructing a normalized geometric prompt box $b$ from the target bounding box information output by the closed-set target detection model;
     Step S42, inputting the geometric prompt box $b$ and the original image into a prompt-driven segmentation model, finely segmenting the target candidate region, and selecting the mask with the highest score among the output candidate masks as the final target segmentation mask $M^*$;
     Step S43, cropping the original image according to the target segmentation mask $M^*$ to generate a target image patch containing only the target region;
     Step S44, when the closed-set target detection outputs no valid target bounding box or the target confidence is lower than $T_{det}$, constructing semantic text prompt information, and inputting the original image and the text prompt information into the prompt-driven segmentation model to perform open segmentation on the whole image and obtain an animal mask;
     Step S45, evaluating the quality of the segmentation result with a mask quality evaluation function $q(M)$ and a preset quality threshold $T_{mask}$: when $q(M^*) \ge T_{mask}$, the cropping result based on the segmentation mask is used as the target image region; when $q(M^*) < T_{mask}$, the method falls back to using the original image or the cropping result based on the detection box as the target image region.
  4. The intelligent recognition method for animal species in nature reserves integrating closed-set target detection and open-vocabulary recognition according to claim 1, wherein step S5 specifically comprises:
     Step S51, constructing an open-vocabulary candidate text label set $S$ whose elements represent the fine-grained candidate species, constructing for each species $s$ a synonym set $S(s)$ containing its scientific name, common name, local names and different written forms, and filling the synonym entries into a preset prompt template to generate text descriptions;
     Step S52, using the image encoder and the text encoder of the open-vocabulary cross-modal recognition model to encode and normalize the target image region and each text description respectively, obtaining an image feature vector $v$ and a text feature vector for each description;
     Step S53, calculating the similarity between the image feature vector $v$ and the feature vector of each text description, and, for each species, taking the maximum of the similarity scores over all entries in its synonym set $S(s)$ to obtain the fine-grained similarity score of that species;
     Step S54, applying temperature-scaled softmax normalization to the similarity scores of all species to obtain the class probability estimate and the corresponding recognition confidence on the open-vocabulary side;
     Step S55, constructing a coarse-grained category set $C$ based on the text label set, and obtaining the coarse-grained class probability estimate on the open-vocabulary side.
  5. The intelligent recognition method for animal species in nature reserves integrating closed-set target detection and open-vocabulary recognition according to claim 1, wherein step S6 specifically comprises:
     Step S61, acquiring the Top-1 class label and its confidence output by the closed-set target detection model, and mapping them onto the fine-grained candidate label set $S$ through an alias-matching projection function to obtain a closed-set evidence score;
     Step S62, acquiring the open-vocabulary fine-grained probability $P_{fine}$, the coarse-grained probability $P_{coarse}$ and the best coarse-grained category output by step S5;
     Step S63, constructing a coarse-grained consistency gating factor according to whether the coarse-grained category of the closed-set detection label agrees with the best open-vocabulary coarse-grained category;
     Step S64, calculating adaptive weights from the open-vocabulary fine-grained confidence and the closed-set detection confidence using a smoothing function, and normalizing the weights;
     Step S65, constructing a fusion score over the fine-grained candidate set $S$ based on the adaptive weights, the gating factor, the open-vocabulary fine-grained probability and the closed-set evidence score, and taking the species with the maximum fusion score as the final recognition result; according to a preset coarse-grained uncertainty threshold $T_{coarse}$, fine-grained uncertainty threshold $T_{fine}$ and closed-set trusted rollback threshold $T_{yolo}$, marking the result as "uncertain" when $P_{fine} < T_{fine}$ or $P_{coarse} < T_{coarse}$; and, if $P_{det} \ge T_{yolo}$ and the species label corresponding to the maximum fusion score can be mapped through an alias to a fine-grained species candidate in the set $S$, outputting the mapped species as the final recognition result by rollback.
  6. An intelligent recognition system for animal species in nature reserves integrating closed-set target detection and open-vocabulary recognition, for implementing the method of any one of claims 1-5, comprising:
     a data acquisition and preprocessing module, used for acquiring monitoring data and preprocessing it;
     a closed-set target detection module, used for performing target detection on the preprocessed image and outputting target candidate regions together with their categories, confidences and bounding boxes;
     a target candidate region judging module, used for judging target credibility according to the confidence, and triggering an open-vocabulary recognition flow when the confidence is lower than a preset threshold or no valid region is detected;
     a prompt-driven segmentation and target cropping module, used for segmenting and cropping the target candidate region according to a geometric prompt box or a semantic text prompt to obtain a target image region;
     an open-vocabulary recognition module, used for performing cross-modal matching between the target image region and a preset text label set to obtain an open-vocabulary recognition result and its confidence;
     a confidence fusion and uncertainty evaluation module, used for performing collaborative modeling and confidence fusion on the closed-set detection result and the open-vocabulary recognition result, outputting a final recognition result and performing uncertainty judgment;
     a recognition result output module, used for outputting the final recognition result, the uncertainty mark and the related confidence information.
  7. The intelligent recognition system for animal species in nature reserves integrating closed-set target detection and open-vocabulary recognition according to claim 6, wherein the prompt-driven segmentation and target cropping module is specifically configured to:
     when the closed-set target detection module outputs a valid bounding box, construct geometric prompt information from the bounding box, drive the segmentation model to obtain a fine target mask, and crop;
     when the closed-set target detection module outputs no valid bounding box or the confidence is too low, drive the segmentation model to perform open segmentation on the image according to a semantic text prompt so as to obtain a target mask;
     the prompt-driven segmentation and target cropping module further comprises a quality evaluation unit for evaluating the quality of the segmentation mask; when the quality is lower than a preset threshold, the module adaptively falls back to using the original image or the cropping result based on the detection box.
  8. The intelligent recognition system for animal species in nature reserves according to claim 6, wherein the open-vocabulary recognition module comprises:
     a text label configuration unit, used for constructing and managing an open-vocabulary candidate text label set containing coarse-grained and fine-grained labels, and constructing a synonym set for each label;
     a cross-modal matching unit, used for mapping the target image region and the text labels into a unified semantic space using a vision-language model, and calculating the image-text similarity;
     a synonym aggregation unit, used for aggregating the similarity scores of the different synonym entries of the same species;
     a probability normalization unit, used for applying temperature-scaled softmax to the similarity scores to obtain the open-vocabulary recognition confidence.
  9. The intelligent recognition system for animal species in nature reserves according to claim 6, wherein the confidence fusion and uncertainty evaluation module is specifically configured to:
     project the closed-set detection evidence into the fine-grained label space of the open vocabulary;
     calculate the coarse-grained and fine-grained confidences on the open-vocabulary side;
     construct a gating factor according to the consistency between the closed-set detection label and the open-vocabulary coarse-grained prediction;
     adaptively calculate the fusion weights of the closed-set detection path and the open-vocabulary recognition path based on a smoothing function and the confidence values;
     calculate a fusion score from the weights, the gating factor and the evidence of both sides, and determine the final recognition result;
     apply uncertainty marking to low-confidence fusion results according to preset multi-level thresholds, and trigger a rollback mechanism based on the high-confidence closed-set detection result when the condition is met.
  10. A computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the method of any one of claims 1-5.
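
Steps S52 to S54 of claim 4 (feature normalization, maximum aggregation over each species' synonym entries, and temperature-scaled softmax) can be illustrated with the following minimal Python sketch. It is not the patented implementation. The function names, the toy two-dimensional vectors and the temperature value 0.07 are assumptions made for this example; in practice the vectors would come from the image and text encoders of the cross-modal model. The sketch also assumes that similarity is the dot product of normalized vectors and that no vector is all zeros.

```python
import math
from typing import Dict, List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def species_probabilities(
    image_vec: List[float],
    synonym_vecs: Dict[str, List[List[float]]],
    temperature: float = 0.07,  # hypothetical value, not taken from the patent
) -> Dict[str, float]:
    """Return a probability per species from image and text feature vectors.

    synonym_vecs maps each species to the text feature vectors of its
    synonym entries (one vector per filled-in prompt template).
    """
    v = l2_normalize(image_vec)
    # Maximum aggregation: a species scores as well as its best synonym.
    scores = {
        species: max(dot(v, l2_normalize(t)) for t in vecs)
        for species, vecs in synonym_vecs.items()
    }
    # Temperature-scaled softmax; subtracting the top score avoids overflow.
    top = max(scores.values())
    exps = {s: math.exp((sc - top) / temperature) for s, sc in scores.items()}
    total = sum(exps.values())
    return {s: e / total for s, e in exps.items()}


# Toy example: two species, the first with two synonym entries.
probs = species_probabilities(
    [1.0, 0.0],
    {"red panda": [[0.9, 0.1], [0.2, 0.8]], "muntjac": [[0.0, 1.0]]},
)
best = max(probs, key=probs.get)
```

Taking the maximum rather than the mean means one well-matching name is enough for a species to score highly, which matches the "maximum aggregation" wording of step S53. A lower temperature makes the resulting distribution sharper.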

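The uncertainty marking and rollback rule at the end of step S65 in claim 5 can be sketched as follows. This is one possible reading, not the patent's definitive logic: the sketch assumes that rollback is considered only when the fused result would otherwise be marked "uncertain", and it takes the alias-mapped species as an input (`None` when no mapping exists) instead of implementing the alias-matching projection. The function name, the status strings and the threshold values in the example are hypothetical.

```python
from typing import Optional, Tuple


def decide(
    fused_species: str,
    p_fine: float,
    p_coarse: float,
    p_det: Optional[float],        # None when the detector found nothing
    mapped_species: Optional[str],  # alias-mapped species, None if unmappable
    t_fine: float,
    t_coarse: float,
    t_yolo: float,
) -> Tuple[str, str]:
    """Return (label, status) with status in {accepted, rollback, uncertain}."""
    # Both open-vocabulary confidences clear their thresholds: keep the result.
    if p_fine >= t_fine and p_coarse >= t_coarse:
        return fused_species, "accepted"
    # Assumed reading: a highly confident closed-set detection that maps to a
    # fine-grained candidate overrides the "uncertain" mark.
    if p_det is not None and p_det >= t_yolo and mapped_species is not None:
        return mapped_species, "rollback"
    return "uncertain", "uncertain"


# Hypothetical thresholds: T_fine = 0.5, T_coarse = 0.6, T_yolo = 0.7.
accepted = decide("red panda", 0.8, 0.9, 0.3, None, 0.5, 0.6, 0.7)
rolled_back = decide("red panda", 0.2, 0.9, 0.75, "red panda", 0.5, 0.6, 0.7)
unsure = decide("red panda", 0.2, 0.9, 0.3, "red panda", 0.5, 0.6, 0.7)
```

Returning a status alongside the label keeps the uncertainty mark available to the result output module described in claim 6.
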
Description

Intelligent recognition method and system for animal species in nature reserves integrating closed-set target detection and open-vocabulary recognition

Technical Field

The invention relates to the technical field of artificial intelligence and ecological environment monitoring, and in particular to an intelligent recognition method and system for animal species in nature reserves integrating closed-set target detection and open-vocabulary recognition. The invention is particularly suitable for application scenarios such as nature reserves, wildlife monitoring, ecological surveys and biodiversity assessment.

Background

Nature reserves are an important carrier of biodiversity protection, and wildlife species monitoring is fundamental work in nature reserve management and ecological research. At present, in wildlife monitoring and ecological surveys in nature reserves, monitoring data are still mostly analyzed by manual inspection, manual interpretation of images, or automatic recognition systems based on fixed categories. With the large-scale deployment of field wildlife-protection cameras and unattended monitoring equipment, nature reserves generate massive amounts of image and video data during long-term operation. Manual analysis is inefficient and costly, and can hardly meet the practical requirements of continuous monitoring and rapid response.

Existing deep-learning-based object detection models are typically trained on a predefined set of species classes (a closed set).
Such models achieve high recognition accuracy on the categories covered by the training set, but when species absent from the training set (unseen species) or rare species appear in the monitoring scene, they cannot give an effective recognition result and are prone to missed detections or misjudgments. This makes it difficult to meet the need for "new species discovery" in the long-term dynamic monitoring of biodiversity in nature reserves.

In recent years, with the development of artificial intelligence technologies such as vision-language models (e.g., CLIP) and multimodal learning, methods such as open-vocabulary recognition and cross-modal semantic matching have been introduced to break through the fixed-category limitation of traditional closed-set target detection. For example, some prior art introduces semantic information from a vision-language model during the training phase of the object detection model to improve the model's adaptation to complex scenarios. Other techniques attempt to expand the target class space and enhance the recognition of unknown classes by fusing multimodal features and combining dynamic knowledge graphs.

However, the prior art still has drawbacks in the long-term, open, unattended monitoring scenarios of nature conservation. Some schemes focus on semantic enhancement in the model training phase, but their recognition capability remains limited by the existing model structure and training data, and they can hardly identify species not covered during training.
Other schemes focus on expanding the class space through multimodal fusion but lack a mechanism that guarantees stable, high-precision recognition of known common species. Without reliability gating centered on high-precision closed-set detection, the accuracy and statistical stability of common-species recognition tend to degrade, which affects the reliable accumulation and trend analysis of long-term continuous monitoring data. Therefore, the prior art generally lacks a comprehensive technical scheme that takes closed-set target detection as the high-precision main recognition path, integrates open-vocabulary recognition on that basis, and at the same time performs collaborative modeling, confidence fusion and uncertainty discrimination on multi-source recognition results.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing an intelligent recognition method and system for animal species in nature reserves integrating closed-set target detection and open-vocabulary recognition. By performing collaborative modeling and confidence fusion on the closed-set detection result and the open-vocabulary recognition result, the invention markedly improves the accuracy and stability of recognition of known species, effectively enhances the system's ability to recognize unseen and rare species, and effectively judges and marks the uncertainty of recognition results, thereby improving the level of intelligence, the coverage and the reliability of long-term monitoring tasks in nature reserves. To achieve this purpose, the invention adopts the following technical scheme: the