CN-117153167-B - Method for improving bark recognition rate of dogs


Abstract

The application provides a method for improving the bark recognition rate of dogs, comprising the following steps: S1, collecting bark audio and increasing its affinity with the recognition device; S2, eliminating long silences; S3, enhancing audio generalization; S4, adding a caching strategy for the real-time audio stream; S5, filtering near-silent segments of the audio stream based on short-time energy; and S6, setting a sliding-window mechanism for the audio stream. For bark audio recognition, the method combines data preprocessing with decision-strategy mechanisms, analyzes the common factors that affect bark recognition, and thereby raises the bark recognition rate and lowers the false detection rate without affecting the recall and accuracy of the original detection targets.

Inventors

KONG DEPING

Assignees

北京君正集成电路股份有限公司 (Beijing Ingenic Semiconductor Co., Ltd.)

Dates

Publication Date: 20260508
Application Date: 20220524

Claims (9)

1. A method for improving the bark recognition rate of dogs, the method comprising the following steps:
S1, collecting bark audio and increasing its affinity with the recognition device: audio data are collected, with barks as positive samples and background noise as negative samples; the positive bark samples are re-recorded on the device side, and both the original bark audio and the re-recorded bark audio are used as training samples, strengthening the correlation between the training audio and the recognition and detection device;
S2, eliminating long silences: from the bark audio samples, audio streams in which the silence exceeds the bark, or in which the total bark duration is far shorter than the silence duration, are eliminated, that is, the silence in each audio clip fed to the model is kept at a ratio of 1:4 or less, so that audio with a large proportion of silence does not enter model training as a positive sample;
S3, enhancing audio generalization: random scaling of the volume, time shift, pitch, and speed of the audio stream is added at the audio preprocessing stage; that is, before the audio enters network training, a random seed is set in advance so that part of the audio is randomly preprocessed, simulating real-life barks more closely, so that recognition is not affected by the distance between the recognition device and the barking dog;
S4, adding a caching strategy for the real-time audio stream: a memory region is allocated as a buffer for the real-time audio stream to be detected; if the duration of the audio fed into the recognition model is X and the device's recognition execution time is Y, then, when no other AI application is running, the buffer must be able to hold at most an audio stream of duration n(X + Y), where n is an integer taken as 1, 2, or 3; the real-time audio stream is buffered in this space so that, together with the subsequent sliding-window mechanism, the integrity of the real-time audio stream entering model detection is ensured;
S5, filtering near-silent segments of the audio stream based on short-time energy: if the short-time energy of a portion of the audio approaches zero, that portion is treated as a near-silent segment, which reduces cases in which near-silent audio triggers false bark recognition in real life;
S6, setting a sliding-window mechanism for the audio stream: a sliding-window mechanism is added on top of the real-time stream integrity preserved in step S4, so that when audio segments enter detection and recognition, a bark spanning two adjacent segments is detected completely; that is, audio that would be missed during the device's recognition time without the sliding-window strategy is detected, since under this mechanism it is recognized twice.
2. The method according to claim 1, wherein in step S1, to learn different barks, collection is performed comprehensively, taking care to remove sample audio whose background noise is louder than the bark; alternatively, for noisy audio, the bark and the background noise are separated using the short-time energy or the short-time average amplitude of the input signal.
3. The method according to claim 1, wherein step S2 prevents audio with a large silence ratio from being used as a positive sample: analysis of bark audio shows that even when a dog barks furiously there is an interval between two consecutive barks, and that interval is silent; if such an interval were used as a positive sample, silence would be learned as bark, so bark audio containing long silences is eliminated.
4. The method according to claim 1, wherein in step S3, differences between barks arise not only from breed but also from environment and volume, and barks of different breeds, ages, and environments may be collected manually.
5. The method according to claim 1, wherein in step S4 it should further be determined whether other AI applications exist on the recognition device; if they do, the additional bark recognition time consumed once those applications are enabled must be taken into account: when several AI functions run together, detection of other audio increases the time that bark audio spends in the data preprocessing stage and in the detection and recognition stage, so the buffering mechanism is closely tied to the AI applications present; if this additional time is Z, the buffer in the caching strategy should hold audio of duration n(X + Y + Z).
6. The method according to claim 1, wherein in step S5 the near-silence threshold is a short-time energy below 10^-2.
7. The method according to claim 1, wherein in step S5 the difference between bark sound and background noise is reflected in the audio energy: the energy of a sound segment is greater than the background-noise energy, being the sum of the noise energy and the energy of the bark itself.
8. The method according to claim 1, wherein in step S6 the sliding-window mechanism brings continuous, complete barks into detection and recognition, and improves bark detection and recognition by increasing the number of detections.
9. The method according to claim 1, wherein in step S1, re-recording the positive bark samples on the device side comprises writing a device re-recording algorithm whose script controls the noise-reduction parameters and the volume-adjustment parameters of the real-time audio stream, and running the re-recording script on the re-recording device, with a mobile phone or a computer freely usable as the playback source, so that the re-recording algorithm turns the original audio into re-recorded audio; in this way the bark stream entering the recognition device is relatively clear, which aids bark recognition.
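Steps S2 and S5, together with claims 2, 6, and 7, all rest on short-time energy. The following is a minimal sketch of one possible reading, not the applicant's implementation. It assumes 16 kHz mono float audio normalized to [-1, 1], 25 ms frames with a 10 ms hop, short-time energy computed as the sum of squared samples per frame, the claim-6 threshold of 10^-2 applied to that quantity, and the S2 limit read as "silent frames to active frames at most 1:4". Claim 7's additive model (segment energy equals noise energy plus bark energy) is why frames above the threshold are treated as bark activity. All function names and frame sizes are hypothetical.

```python
import numpy as np

# Assumed framing for 16 kHz audio: 25 ms frames, 10 ms hop (hypothetical choice).
FRAME_LEN = 400
HOP = 160
ENERGY_THRESHOLD = 1e-2  # claim 6: short-time energy below 10^-2 counts as near-silent


def frame_signal(x: np.ndarray, frame_len: int = FRAME_LEN, hop: int = HOP) -> np.ndarray:
    """Split a 1-D signal into overlapping frames, zero-padding input shorter than one frame."""
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def short_time_energy(x: np.ndarray) -> np.ndarray:
    """Sum of squared samples in each frame."""
    return np.sum(frame_signal(x) ** 2, axis=1)


def short_time_avg_amplitude(x: np.ndarray) -> np.ndarray:
    """Mean absolute sample value per frame (the alternative measure named in claim 2)."""
    return np.mean(np.abs(frame_signal(x)), axis=1)


def keep_positive_sample(x: np.ndarray, max_silence_to_bark: float = 0.25) -> bool:
    """S2: keep a bark clip only if silent frames / active frames <= 1:4 (assumed reading)."""
    silent = short_time_energy(x) < ENERGY_THRESHOLD
    n_active = int(np.count_nonzero(~silent))
    if n_active == 0:
        return False  # a clip with no active frames cannot serve as a bark sample
    return np.count_nonzero(silent) / n_active <= max_silence_to_bark


def should_run_model(segment: np.ndarray) -> bool:
    """S5: skip recognition when every frame of the segment is near-silent."""
    return bool(np.any(short_time_energy(segment) >= ENERGY_THRESHOLD))
```

Because the 10^-2 threshold is only meaningful relative to the sample scaling and frame length, both would need to be matched to the device's actual audio pipeline.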
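Step S3 (random scaling of volume, time shift, pitch, and speed, driven by a random seed fixed before training) could be sketched with NumPy alone as below. This is a hedged illustration: `augment`, `shift_with_zeros`, and every range are hypothetical. Speed is changed by linear-interpolation resampling, which shifts pitch together with speed (like changing tape speed). Pitch shifting independent of speed, as S3 lists separately, would need a dedicated method such as librosa's `librosa.effects.pitch_shift`, which this sketch does not use.

```python
import numpy as np


def shift_with_zeros(y: np.ndarray, k: int) -> np.ndarray:
    """Delay (k > 0) or advance (k < 0) the signal by k samples, filling the gap with silence."""
    out = np.zeros_like(y)
    if abs(k) >= len(y):
        return out
    if k > 0:
        out[k:] = y[:-k]
    elif k < 0:
        out[:k] = y[-k:]
    else:
        out[:] = y
    return out


def augment(x, sr, rng, p=0.5, gain_range=(0.5, 1.5), max_shift_s=0.2, speed_range=(0.9, 1.1)):
    """With probability p, randomly perturb the volume, time offset, and speed of a clip."""
    if rng.random() >= p:
        return x.copy()  # leave this clip unchanged
    y = x * rng.uniform(*gain_range)  # volume: loosely mimics nearer or farther barks
    max_shift = int(max_shift_s * sr)
    y = shift_with_zeros(y, int(rng.integers(-max_shift, max_shift + 1)))  # time shift
    speed = rng.uniform(*speed_range)  # > 1: faster and higher; < 1: slower and lower
    n_out = max(1, int(round(len(y) / speed)))
    positions = np.linspace(0, len(y) - 1, n_out)
    y = np.interp(positions, np.arange(len(y)), y)  # resample: speed and pitch change together
    return np.clip(y, -1.0, 1.0)
```

Creating the generator once, for example `rng = np.random.default_rng(seed=2022)` with an arbitrary seed, and passing it to every `augment` call makes the random preprocessing reproducible across training runs.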
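The S4 sizing rule, including the extra delay Z described in claim 5, can be expressed as a small helper plus a fixed-capacity buffer. This is a minimal sketch that assumes durations in seconds and a known sample rate, and it reads claim 5's bound as n(X + Y + Z) so that it reduces to claim 1's n(X + Y) when Z = 0. The names `buffer_seconds`, `buffer_samples`, and `AudioRingBuffer` are hypothetical, and an embedded device would more likely use a preallocated C array than a Python `deque`.

```python
import math
from collections import deque


def buffer_seconds(x_s: float, y_s: float, n: int = 2, z_s: float = 0.0) -> float:
    """Buffer length n(X + Y), or n(X + Y + Z) when other AI workloads add delay Z."""
    if n not in (1, 2, 3):
        raise ValueError("the claims take n as 1, 2 or 3")
    return n * (x_s + y_s + z_s)


def buffer_samples(x_s: float, y_s: float, sample_rate: int, n: int = 2, z_s: float = 0.0) -> int:
    """Convert the buffer duration into a sample count, rounding up."""
    return math.ceil(buffer_seconds(x_s, y_s, n, z_s) * sample_rate)


class AudioRingBuffer:
    """Fixed-capacity FIFO: once full, the oldest samples are discarded first."""

    def __init__(self, capacity: int):
        self._buf = deque(maxlen=capacity)

    def push(self, samples) -> None:
        """Append newly captured samples from the real-time stream."""
        self._buf.extend(samples)

    def latest(self, count: int) -> list:
        """Return up to `count` of the most recent samples, oldest first."""
        count = min(count, len(self._buf))
        return list(self._buf)[len(self._buf) - count:]

    def __len__(self) -> int:
        return len(self._buf)
```

For example, with X = 1 s of audio per inference, Y = 0.5 s of recognition time, and n = 2, the buffer holds 3 s of audio, which is 48,000 samples at 16 kHz.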
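Step S6 and claim 8 can be illustrated by comparing back-to-back segmentation with an overlapping sliding window. The sketch assumes the buffered stream from S4 is addressed by sample index, and it treats the hop (half a window in the usage below) as a free parameter, since the patent does not specify one. `sliding_windows` and `fully_covered` are hypothetical helpers.

```python
def sliding_windows(n_samples: int, win: int, hop: int) -> list[tuple[int, int]]:
    """(start, end) indices of windows of length `win` advanced by `hop` samples.

    hop < win makes consecutive windows overlap, so each stretch of audio is
    examined more than once (claim 8: more detections per bark).
    """
    if win <= 0 or not 0 < hop <= win:
        raise ValueError("need 0 < hop <= win")
    if n_samples < win:
        return []
    return [(s, s + win) for s in range(0, n_samples - win + 1, hop)]


def fully_covered(windows: list[tuple[int, int]], start: int, end: int) -> bool:
    """True if some window contains the whole event [start, end)."""
    return any(ws <= start and end <= we for ws, we in windows)


# 3 s of 16 kHz audio, 1 s windows; a bark at 0.875 s to 1.125 s straddles the 1 s boundary.
back_to_back = sliding_windows(48000, 16000, 16000)
overlapping = sliding_windows(48000, 16000, 8000)
print(fully_covered(back_to_back, 14000, 18000))  # False: the bark is split in two
print(fully_covered(overlapping, 14000, 18000))   # True: window (8000, 24000) holds it
```

In general, an overlap of win − hop samples ensures that any bark no longer than the overlap lies entirely inside at least one window, provided it ends before the last window does. The cost is more recognition calls per second of audio, which is why the buffer in S4 must absorb the extra recognition time.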

Description

Technical Field

The invention relates to the technical field of intelligent audio processing, and in particular to a method for improving the bark recognition rate of dogs.

Background

Hearing and vision are the two most important sources through which humans acquire information about their surroundings, and lacking either one makes it impossible to judge changes in the environment accurately. For example, people who are blind, who are in dim light, or whose view is blocked can still perceive changes at different locations through hearing. For these reasons, people hope to use existing computer technology for image recognition and speech recognition, to help obtain external information that cannot be acquired directly and to replace part of the manual work, thereby improving efficiency. Today vision-based recognition, i.e., image recognition, is increasingly mature, whereas speech recognition has developed slowly. Image recognition often loses its function because light travels in straight lines, whereas sound can propagate around obstacles, so audio can serve where image recognition fails. How to identify a particular target class in a noisy environment, so that useful information can be obtained from that target audio class, is attracting increasing research interest. The present method for improving the dog bark recognition rate originates from a pet dog feeder, in which recognizing whether a dog is barking allows the dog to be fed intelligently, and it can be extended to the recognition and detection of any audio class.
The typical flow of existing audio recognition algorithms comprises audio preprocessing, feature extraction, training and generation of an audio template library, and recognition, followed later by adding training samples for falsely detected or missed audio, so as to iterate toward a model with fewer false detections and more correct detections. Some researchers build complex networks to learn the target audio classes, but a complex network structure often lengthens detection time, i.e., audio recognition responds with a lag. When training samples are added, existing methods add missed and falsely detected audio in batches; however, the same dog bark entering the recognition device in real time can be recognized very differently depending on the playback source and the recording source. How to improve the quality as well as the quantity of the training audio, align it with the device, train the best model, and align real-time audio between the development (code) side and the device recognition side is therefore an urgent problem for researchers. In addition, how to improve the recognition rate of the target audio both without background noise and under different kinds of background noise has also demanded considerable research effort.

Disclosure of Invention

To solve the problems in the prior art, the application aims to provide a method that addresses the low bark recognition rate and high false alarm rate for dogs, namely a set of strategies for improving the recognition rate in target audio detection. The method focuses mainly on improving the quality of the dog bark audio, so that the learned model is more targeted and better suited to the target environment.
In addition, a decision strategy and an optimization mechanism are added at the early stage of bark audio recognition, so that correct detections increase and false detections decrease; the method also has high practicality and generality in other audio recognition fields. Specifically, the present invention provides a method for improving the bark recognition rate of dogs, comprising the following steps: S1, collecting bark audio and increasing its affinity with the recognition device: audio data are collected, with barks as positive samples and background noise as negative samples; the positive bark samples are re-recorded on the device side, and both the original and the re-recorded bark audio are used as training samples, strengthening the correlation between the training audio and the recognition and detection device; S2, eliminating long silences: from the bark audio samples, audio streams in which the silence exceeds the bark, or in which the total bark duration is far shorter than the silence duration, are eliminated, namely, the silence length