CN-121725794-B - Method and system for identifying and positioning animals based on sound event
Abstract
The invention discloses a sound-event-based animal identification and localization method and system in the technical field of bioacoustic monitoring. Multiple paths of original audio data are collected to form a target sound event set. Animal type identification is performed on the set by extracting and fusing the acoustic features of each audio path, producing an animal type judgment result. In parallel, sound source spatial localization estimates the position of the animal sound source from the arrival time differences and intensity differences of the sound waves. The type judgment result and the estimated position are fused and associated to generate an animal activity record containing the animal type and position. A spatiotemporal continuity analysis matches the record against historical animal activity records, correcting or updating its location or type information. Finally, an animal activity heat map or animal trajectory map of the monitoring area is generated from the activity records. The method improves the accuracy and data continuity of animal monitoring in complex environments.
Inventors
- LIU TONG
- WANG HUI
- ZHANG KANGKANG
Assignees
- 吉林农业大学 (Jilin Agricultural University)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2026-02-25
Claims (9)
- 1. A method for identifying and locating animals based on sound events, the method comprising: collecting multiple paths of original audio data to form a target sound event set, including: receiving synchronously collected original audio data streams from a plurality of sound collection nodes deployed in a monitoring area to obtain multiple paths of original audio data, and performing environmental noise suppression and time-frequency domain enhancement processing on the multiple paths of original audio data to generate multiple paths of enhanced audio data; performing sound event segmentation and feature extraction processing on the multipath enhanced audio data to obtain a plurality of sound event fragments and their corresponding acoustic feature vector sets; performing cross-channel correlation analysis and event matching on the sound event fragments to obtain a plurality of sound event fragment sets associated with the same animal sound event, forming a target sound event set, wherein the cross-channel correlation analysis and event matching comprise the following steps: calculating the acoustic feature similarity and the time overlap degree between any two sound event fragments from different sound collection nodes, wherein the acoustic feature similarity is calculated based on the cosine distance between the Mel-frequency cepstral coefficient feature vectors of the sound event fragments, and the time overlap degree is calculated as the ratio of the intersection to the union of the start-to-end time intervals of the two sound event fragments; calculating, according to the acoustic feature similarity and the time overlap degree, the comprehensive association probability that the two sound event fragments belong to the same animal sound event, expressed as: P = α·S + β·T, wherein P represents the comprehensive association probability, S represents the normalized acoustic feature similarity, T represents the normalized time overlap degree, and α and β are preset weight coefficients satisfying α + β = 1; setting a comprehensive association probability threshold, and merging pairs of sound event fragments whose comprehensive association probability exceeds the threshold to form initial sound event clusters; expanding each initial sound event cluster by merging into it any new sound event fragment whose comprehensive association probability with any fragment already in the cluster exceeds the threshold; determining the set of sound event fragments from different sound collection nodes contained in each initial sound event cluster as a target sound event set; executing animal type identification processing on the target sound event set, extracting and fusing the acoustic features of each path of audio data in the target sound event set, and generating an animal type judgment result for the sounding event; performing sound source spatial localization processing on the target sound event set, and calculating an estimated spatial position of the animal sound source based on the arrival time difference and intensity difference of the sound waves among the paths of audio data; performing data fusion and association on the animal type judgment result and the estimated spatial position to generate an animal activity record containing the animal type and its corresponding position; performing spatiotemporal continuity analysis on the animal activity record, performing association matching with historical animal activity records, and correcting or updating position or type information in the animal activity record; and generating an animal activity heat map or an animal trajectory map of the monitoring area according to the animal activity records.
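The association score in claim 1 can be illustrated with a minimal Python sketch. This is not the patent's reference implementation; function names such as `association_probability` and the default weights are hypothetical, but the formula P = α·S + β·T with α + β = 1 follows the claim.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two MFCC feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def time_overlap(seg1, seg2):
    """Intersection-over-union of two (start, end) time intervals."""
    inter = max(0.0, min(seg1[1], seg2[1]) - max(seg1[0], seg2[0]))
    union = max(seg1[1], seg2[1]) - min(seg1[0], seg2[0])
    return inter / union if union > 0 else 0.0

def association_probability(mfcc1, mfcc2, seg1, seg2, alpha=0.6, beta=0.4):
    """P = alpha*S + beta*T with alpha + beta = 1 (claim 1); weights assumed."""
    assert abs(alpha + beta - 1.0) < 1e-9
    s = cosine_similarity(mfcc1, mfcc2)  # normalized feature similarity S
    t = time_overlap(seg1, seg2)         # normalized time overlap T
    return alpha * s + beta * t
```

Fragment pairs whose score exceeds the preset threshold would then be merged into the initial sound event clusters described in the claim.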
- 2. The method of claim 1, wherein performing the environmental noise suppression and time-frequency domain enhancement processing on the multiple paths of raw audio data to generate multiple paths of enhanced audio data comprises: for each path of original audio data, filtering environmental background noise with an adaptive filter to obtain preliminary noise-reduced audio data; performing a short-time Fourier transform on the preliminary noise-reduced audio data to obtain a time-frequency spectrogram; applying a mask-based sound source separation algorithm to the time-frequency spectrogram to further separate foreground animal sound components from residual background noise components; enhancing the energy distribution of the foreground animal sound components in the time-frequency spectrogram to obtain an enhanced time-frequency spectrogram; and performing an inverse short-time Fourier transform on the enhanced time-frequency spectrogram to reconstruct single-channel enhanced audio data.
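A toy sketch of the mask-based separation step in claim 2, assuming a magnitude spectrogram represented as a plain list of frames (a real system would operate on an STFT). The threshold and gain values are illustrative assumptions, not values from the patent.

```python
def noise_floor(noise_frames):
    """Mean magnitude per frequency bin over noise-only frames."""
    bins = len(noise_frames[0])
    return [sum(f[b] for f in noise_frames) / len(noise_frames)
            for b in range(bins)]

def apply_mask(spectrogram, floor, snr_factor=2.0, gain=1.5, atten=0.1):
    """Binary mask: boost bins well above the noise floor, attenuate the rest."""
    out = []
    for frame in spectrogram:
        out.append([m * gain if m > snr_factor * floor[b] else m * atten
                    for b, m in enumerate(frame)])
    return out
```

The boosted spectrogram would then be passed through the inverse transform to reconstruct the enhanced audio.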
- 3. The method for recognizing and locating animals based on sound events according to claim 2, wherein performing sound event segmentation and feature extraction processing on the multi-channel enhanced audio data to obtain a plurality of sound event fragments and their corresponding acoustic feature vector sets comprises: performing endpoint detection on each path of enhanced audio data, identifying time intervals containing effective sound energy to obtain candidate sound fragments; calculating the Mel-frequency cepstral coefficients, linear prediction coefficients, and spectral centroid features of each candidate sound fragment to obtain initial acoustic features; inputting the initial acoustic features into a pre-trained sound event classification model to judge whether the candidate sound fragment belongs to an animal sound event; if so, marking the candidate sound fragment as a sound event fragment, extracting its high-dimensional deep acoustic features, and concatenating them with the initial acoustic features to obtain the acoustic feature vector of the sound event fragment; and collecting the sound event fragments corresponding to all animal sound events in each path of enhanced audio data together with their acoustic feature vectors.
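The endpoint-detection step in claim 3 can be sketched as a short-time energy detector: frames whose energy exceeds a threshold are grouped into candidate fragments. Frame length and threshold are illustrative assumptions.

```python
def detect_segments(samples, frame_len=4, energy_thresh=0.5):
    """Return (start, end) sample indices of contiguous high-energy frames."""
    segments, start = [], None
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / frame_len  # mean-square energy
        if energy >= energy_thresh:
            if start is None:
                start = i          # segment opens at first loud frame
        elif start is not None:
            segments.append((start, i))  # segment closes at first quiet frame
            start = None
    if start is not None:
        segments.append((start, len(samples) // frame_len * frame_len))
    return segments
```

Each detected segment would then be passed to the feature-extraction and classification stages described in the claim.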
- 4. The method for recognizing and locating animals based on sound events according to claim 1, wherein executing animal species recognition processing on the target sound event set, extracting and fusing the acoustic features of each path of audio data in the target sound event set, and generating the animal species determination result of the sound event comprises: extracting the acoustic feature vectors of sound event fragments from different sound collection nodes from the target sound event set; inputting the acoustic feature vectors from the different nodes into a feature fusion network, and calculating the weight of each node's features through an attention mechanism; weighting and fusing the acoustic feature vectors from the different nodes according to the weights to obtain a fused global acoustic feature vector; inputting the global acoustic feature vector into a pre-trained animal species classification model; outputting, by the animal species classification model, the probability distribution of the animal sound event over each known animal species; and taking the animal species with the highest probability as the animal species determination result.
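The attention-weighted fusion in claim 4 can be sketched with softmax weights over per-node relevance scores. This is a minimal illustration, assuming the scores come from some upstream scoring network not shown here.

```python
import math

def softmax(scores):
    """Normalize raw scores into attention weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_features(node_features, node_scores):
    """Weighted sum of per-node feature vectors using softmax attention."""
    weights = softmax(node_scores)
    dim = len(node_features[0])
    return [sum(w * f[d] for w, f in zip(weights, node_features))
            for d in range(dim)]
```

The fused global vector would then be fed to the species classification model described in the claim.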
- 5. The method for recognizing and locating animals based on sound events according to claim 1, wherein performing sound source spatial localization processing on the target sound event set, and calculating the estimated spatial position of the animal sound source based on the arrival time difference and intensity difference of the sound waves among the paths of audio data, comprises: selecting at least three sound event fragments with the best signal quality from the target sound event set, and extracting the geographic position coordinates of their corresponding sound collection nodes; calculating the relative arrival time differences among the at least three sound event fragments; establishing a set of hyperbolic equations for the sound source position from the propagation speed of sound in air and the relative arrival time differences; meanwhile, calculating the sound wave intensity differences among the at least three sound event fragments, and establishing a set of distance-attenuation equations for the sound source position in combination with a sound propagation attenuation model; and solving the hyperbolic equations and the distance-attenuation equations simultaneously to obtain the estimated spatial position coordinates of the animal sound source.
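As a hedged illustration of the TDOA localization in claim 5, the sketch below replaces the patent's simultaneous hyperbolic/attenuation solve with a brute-force grid search minimizing the TDOA residual, which is easier to verify in a few lines. Node geometry and grid resolution are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air (approximate)

def tdoa_residual(pos, nodes, tdoas, c=SPEED_OF_SOUND):
    """Sum of squared errors between predicted and measured TDOAs,
    all relative to the first node."""
    d0 = math.dist(pos, nodes[0])
    err = 0.0
    for node, tau in zip(nodes[1:], tdoas):
        pred = (math.dist(pos, node) - d0) / c
        err += (pred - tau) ** 2
    return err

def locate(nodes, tdoas, extent=10.0, step=0.5):
    """Grid search over [0, extent]^2 for the minimum-residual position."""
    best, best_err = None, float("inf")
    x = 0.0
    while x <= extent:
        y = 0.0
        while y <= extent:
            e = tdoa_residual((x, y), nodes, tdoas)
            if e < best_err:
                best, best_err = (x, y), e
            y += step
        x += step
    return best
```

In practice the closed-form hyperbolic system of the claim, refined by the intensity-attenuation equations, would replace the grid search.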
- 6. The method of claim 1, wherein the data fusion and association of the animal species determination result and the estimated spatial position to generate an animal activity record including the animal species and its corresponding position comprises: binding the animal species determination result with the estimated spatial position coordinates to generate an initial activity record, the initial activity record comprising an animal species identifier, position coordinates, and a timestamp of the sounding event; comparing the initial activity record against an animal characteristic knowledge base, and obtaining the typical activity range and movement speed information of the animal from the species identifier; verifying the plausibility of the position coordinates using the typical activity range and movement speed information, and, if the position coordinates exceed the typical activity range threshold, applying a smoothing correction based on historical data; and integrating the verified or corrected animal species identifier, position coordinates, and timestamp to generate the final animal activity record.
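The plausibility check of claim 6 can be sketched as a speed test against a per-species knowledge base. The species names, maximum speeds, and blending factor below are illustrative assumptions, not values from the patent.

```python
import math

# Assumed knowledge-base entries: typical maximum speed per species (m/s).
SPECIES_MAX_SPEED = {"red_deer": 13.0, "wild_boar": 11.0}

def validate_position(species, prev_pos, prev_t, new_pos, new_t, smooth=0.5):
    """Accept the new position if the implied speed is plausible;
    otherwise blend it back toward the previous position."""
    dt = new_t - prev_t
    speed = math.dist(prev_pos, new_pos) / dt if dt > 0 else float("inf")
    if speed <= SPECIES_MAX_SPEED.get(species, float("inf")):
        return new_pos
    # implausible jump: smoothing correction toward the previous record
    return tuple(p + smooth * (n - p) for p, n in zip(prev_pos, new_pos))
```

The corrected coordinates would then be written into the final activity record alongside the species identifier and timestamp.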
- 7. The method for identifying and locating animals based on sound events according to claim 6, wherein performing the spatiotemporal continuity analysis on the animal activity record, performing association matching with historical animal activity records, and correcting or updating position or type information in the animal activity record comprises: retrieving from the database historical animal activity records adjacent in time to the current animal activity record and geographically close to its position coordinates; calculating the comprehensive matching degree between the current animal activity record and each historical record in terms of species consistency, positional movement continuity, and time-interval plausibility; if a historical record exists whose comprehensive matching degree exceeds a preset threshold, judging the current and historical records to be continuous activity of the same animal; applying Kalman filter smoothing to the position coordinates of the current animal activity record according to the trajectory trend in the historical records, so as to correct localization errors; and, if the animal species determination of the current record is inconsistent with the species of the matched historical records, starting a species re-determination procedure that makes a comprehensive judgment in combination with the historical data.
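The smoothing step of claim 7 can be illustrated with a minimal per-coordinate, constant-position Kalman filter. The process and measurement noise values are assumptions; the patent does not specify them.

```python
def kalman_smooth(measurements, q=0.01, r=1.0):
    """Filter a sequence of scalar position measurements.
    q: assumed process noise, r: assumed measurement noise."""
    x, p = measurements[0], 1.0   # initial state and covariance
    out = [x]
    for z in measurements[1:]:
        p += q                    # predict: covariance grows
        k = p / (p + r)           # Kalman gain
        x += k * (z - x)          # update toward measurement z
        p *= (1 - k)              # covariance shrinks after update
        out.append(x)
    return out
```

Applied to each coordinate of a matched trajectory, this damps single-record localization errors while following the overall trend.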
- 8. The method of claim 7, wherein generating an animal activity heat map or an animal trajectory map of the monitored area from the animal activity records comprises: dividing the monitoring area into uniform grid cells; counting the number of animal activity records falling into each cell within a preset time period to obtain the cell activity frequency; rendering an animal activity heat map of the monitoring area from the cell activity frequencies by color mapping; or selecting the activity records of a specific animal individual and connecting their position coordinates in time order to obtain that individual's movement trajectory; and superimposing the movement trajectories of multiple animal individuals on the same base map to generate an animal trajectory map.
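The grid-counting step of claim 8 can be sketched in a few lines; the area size and cell count below are illustrative assumptions.

```python
def activity_heatmap(records, area=100.0, cells=4):
    """records: (x, y) positions within a square area of side `area`.
    Returns a cells x cells grid of per-cell activity counts."""
    size = area / cells
    grid = [[0] * cells for _ in range(cells)]
    for x, y in records:
        cx = min(int(x / size), cells - 1)  # clamp boundary points
        cy = min(int(y / size), cells - 1)
        grid[cy][cx] += 1
    return grid
```

The resulting count grid is what the color-mapping stage would render as the heat map; the trajectory map instead connects one individual's records in time order.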
- 9. An acoustic event based animal identification and localization system comprising a processor and a memory, the memory storing a computer program, the processor being configured to implement the acoustic event based animal identification and localization method of any one of claims 1 to 8 when the computer program is executed.
Description
Method and system for identifying and positioning animals based on sound event

Technical Field

The invention belongs to the technical field of bioacoustic monitoring, and particularly relates to a sound-event-based animal identification and localization method and system.

Background

In sound-based field animal monitoring, prior-art solutions typically employ a separate audio acquisition unit and a subsequent analysis module. Species identification relies on the audio signal from a single acquisition point: characteristic parameters such as Mel-frequency cepstral coefficients are extracted from the signal and fed into a classification model. Sound source localization uses several deployed acquisition points, estimating the sound source coordinates from the time or energy differences of the sound reaching different sensors. These steps are typically processed serially in a fixed pipeline, outputting independent time-stamped species tags and location points. The quality and stability of acoustic features obtained from a single acquisition channel depend heavily on the acoustic environment at that point. Under continuous environmental noise, intermittent interference, or acoustic multipath propagation, single-channel features are easily distorted or lose key information, reducing the identification model's ability to distinguish similar species and making the reliability of the overall result hard to guarantee in complex conditions. Current solutions treat each sound event as isolated and instantaneous, with no correlation between the computed position information.
The system cannot judge whether successive events belong to continuous activity of the same animal, and it is difficult to use historical data to logically verify and smooth abnormal results caused by signal occlusion or calculation errors. This leads to two major problems: species identification accuracy is limited by single-point signals and is unstable in complex acoustic environments; and the output animal activity information is a collection of discrete points that cannot form consistent, reliable spatiotemporal trajectories, limiting its value for deep behavior analysis and long-term ecological research. A method is needed that comprehensively exploits multiple paths of audio information to improve feature robustness, and that collaboratively optimizes and continuously corrects recognition and positioning results by establishing spatiotemporal correlations between events.
Disclosure of Invention

The present invention aims to solve at least one of the technical problems in the prior art. To this end, the invention proposes a sound-event-based animal identification and localization method, comprising: collecting multiple paths of original audio data to form a target sound event set; executing animal type identification processing on the target sound event set, extracting and fusing the acoustic features of each path of audio data in the set, and generating an animal type judgment result for the sounding event; performing sound source spatial localization processing on the target sound event set, and calculating an estimated spatial position of the animal sound source based on the arrival time difference and intensity difference of the sound waves among the paths of audio data; performing data fusion and association on the animal type judgment result and the estimated spatial position to generate an animal activity record containing the animal type and its corresponding position; performing spatiotemporal continuity analysis on the animal activity record, performing association matching with historical animal activity records, and correcting or updating position or type information in the record; and generating an animal activity heat map or an animal trajectory map of the monitoring area from the animal activity records.
Further, collecting multiple paths of original audio data to form a target sound event set includes: receiving synchronously collected original audio data streams from a plurality of sound collection nodes deployed in a monitoring area to obtain multiple paths of original audio data, and performing environmental noise suppression and time-frequency domain enhancement processing on them to generate multiple paths of enhanced audio data; performing sound event segmentation and feature extraction processing on the multipath enhanced audio data to obtain a plurality of sound event fragments and their corresponding acoustic feature vector sets; cross-channel correlation