CN-119724223-B - Acoustic sensor array-based sound source accurate positioning method
Abstract
The invention discloses a sound source accurate positioning method based on an acoustic sensor array, comprising the following steps: first, inputting an audio fragment captured by an acoustic sensor; second, producing an audio image, namely a waveform image of the input audio; third, acquiring a characteristic waveform by segmenting the waveform image to extract the changing sound waveforms; fourth, analyzing the extracted waveform image together with the input audio fragment to obtain the characteristic information contained in the audio; and fifth, analyzing the characteristic information obtained in the fourth step. In this scheme, characteristic values are derived from features of the audio fragment and used as anchors against which the array sound source is analyzed. This allows the fragment to be located quickly and accurately, narrows the comparison range, speeds up comparison and judgment, and further improves the accuracy of the result.
Inventors
- SHI BOLIN
Assignees
- 北京怀芯声学技术有限公司 (Beijing Huaixin Acoustic Technology Co., Ltd.)
Dates
- Publication Date
- 20260512
- Application Date
- 20241217
Claims (5)
- 1. A sound source accurate positioning method based on an acoustic sensor array, characterized by comprising the following steps: Step one, inputting an audio fragment, namely capturing the audio fragment through an acoustic sensor; Step two, producing an audio image, namely producing a waveform image of the input audio; Step three, obtaining characteristic waveforms, namely cutting the waveform image to extract the changing sound waveforms; Step four, analyzing the waveform, namely analyzing the extracted waveform image and the input audio fragment to obtain the characteristic information contained in the audio; Step five, analyzing the characteristic information obtained in step four and comparing it with the array sound source; Step six, judging the result: if the array sound source does not contain the characteristic information of the current audio fragment, the input audio fragment is output directly, and if the array sound source contains the input audio fragment, the position of the audio fragment in the array sound source is derived and output together with the input audio fragment; Step seven, outputting the result. In step two, the audio image is produced according to audio decibels, as follows: a two-dimensional coordinate system is established with time on the horizontal axis and decibels on the vertical axis; a waveform image is obtained from the variation of the decibel level over time in the audio fragment; and the several time periods with the highest decibel levels in the audio fragment are marked and recorded as audio segments. In step three, each characteristic waveform is cut from its starting point to its ending point according to the marked audio segments. In step four, the waveform analysis is divided into content characteristic information analysis and time characteristic information analysis. The content characteristic information analysis analyzes the audio content of the cut waveform image, determines the type of the audio content, classifies the audio, and then analyzes and interprets the content of the classified audio to determine the content characteristic information in the cut waveform. The time characteristic information analysis analyzes the input audio fragment to determine whether it contains time characteristic information; if so, the position of the input audio fragment in the array sound source is determined according to that time characteristic information; if not, no time characteristic analysis is performed.
- 2. The sound source accurate positioning method based on an acoustic sensor array according to claim 1, wherein, when the content characteristic information is a voice dialogue: the voice dialogue content is extracted; if a place name and/or person name and/or thing appears, the context containing it is determined, and it is determined whether the input audio fragment falls before or after the appearance of the place name and/or person name and/or thing, giving content conclusion one; the array sound source is processed by dividing it before or after the place name and/or person name and/or thing; and the input audio fragment is compared with the divided array sound source according to content conclusion one to determine the accurate position of the input audio fragment in the array sound source.
- 3. The sound source accurate positioning method based on an acoustic sensor array according to claim 1, wherein, when the content characteristic information contains no voice dialogue: the content characteristics in the cut waveform are extracted, and it is determined whether the extracted content is an animal sound, an object sound, or a natural sound, giving content conclusion two; the array sound source is processed by dividing it according to the time points at which the animal sound, object sound, or natural sound occurs; and the input audio fragment is compared with the divided array sound source according to content conclusion two to determine the accurate position of the input audio fragment in the array sound source.
- 4. The sound source accurate positioning method based on an acoustic sensor array according to claim 1, wherein, when the content characteristic information includes both voice dialogue information and animal sounds and/or object sounds and/or natural sounds: the content characteristics in the cut waveform are extracted, and the extracted content is identified as animal sound and/or object sound and/or natural sound and/or place name and/or person name and/or thing, giving content conclusion three; the array sound source is processed by dividing it according to the occurrence time points of the animal sound, object sound, or natural sound and/or of the place names and/or person names and/or things; and the input audio fragment is compared with the divided array sound source according to content conclusion three to determine the accurate position of the input audio fragment in the array sound source.
- 5. The sound source accurate positioning method based on an acoustic sensor array according to claim 2, 3, or 4, wherein, in the time characteristic information analysis, the input audio fragment is analyzed to determine whether it contains direct time information and/or indirect time information. When direct time information is contained, the array sound source is divided according to the direct time information and compared with the input audio fragment to determine the accurate position of the input audio fragment in the array sound source. When indirect time information is contained, it is first analyzed and converted into direct time information, and the input audio fragment is then compared with the array sound source according to that direct time information.
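Claims 2 to 4 share one mechanism: the occurrence time points of recognized cues (place names, person names, things, or animal, object, and natural sounds) divide the array sound source, and the content conclusion then restricts the comparison to the pieces on the relevant side of a cue. The following Python sketch illustrates only that narrowing step. It assumes the cue occurrences have already been detected by some recognizer, which the claims do not specify; the `Event` type, the `split_at_events` and `candidate_segments` helpers, and the "before"/"after" encoding of a content conclusion are hypothetical illustrations, not part of the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical cue detected in the array sound source (detector not specified by the patent)."""
    time_s: float  # occurrence time point within the array sound source, in seconds
    label: str     # e.g. a place name, a person name, or "dog_bark"


Segment = Tuple[float, float]  # (start_s, end_s) within the array sound source


def split_at_events(events: List[Event], duration_s: float) -> List[Segment]:
    """Divide [0, duration_s] at every cue occurrence time point."""
    cuts = sorted({e.time_s for e in events if 0.0 < e.time_s < duration_s})
    bounds = [0.0] + cuts + [duration_s]
    return list(zip(bounds[:-1], bounds[1:]))


def candidate_segments(events: List[Event], duration_s: float,
                       label: str, relation: str) -> List[Segment]:
    """Keep only the segments adjacent to occurrences of `label`.

    `relation` is a hypothetical encoding of a content conclusion:
    "before" keeps each segment that ends at an occurrence of the cue,
    "after" keeps each segment that starts at one.
    """
    hits = {e.time_s for e in events if e.label == label}
    segments = split_at_events(events, duration_s)
    if relation == "before":
        return [s for s in segments if s[1] in hits]
    if relation == "after":
        return [s for s in segments if s[0] in hits]
    raise ValueError("relation must be 'before' or 'after'")


# Hypothetical cues in a 60 s array sound source.
events = [Event(10.0, "Beijing"), Event(25.0, "dog_bark"), Event(40.0, "Beijing")]
print(candidate_segments(events, 60.0, "Beijing", "after"))  # [(10.0, 25.0), (40.0, 60.0)]
```

In this example, the conclusion "the fragment occurs after Beijing is mentioned" leaves only the spans from 10 to 25 s and from 40 to 60 s for the detailed comparison, which is the reduction in comparison range the abstract describes.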
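Claim 5 distinguishes direct time information from indirect time information that must first be converted. The claims do not say how either kind is represented, so the sketch below assumes (hypothetically) that direct information is an absolute timestamp, that indirect information is an offset from a known reference moment, and that the recording start time of the array sound source is known. Under those assumptions both kinds reduce to an offset into the array sound source, around which a search window with an assumed tolerance is cut for the comparison. The function names and the 30 s default tolerance are illustrative only.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def direct_offset_s(clock_time: datetime, recording_start: datetime) -> float:
    """Direct time information (an absolute timestamp) -> seconds into the array sound source."""
    return (clock_time - recording_start).total_seconds()


def indirect_to_direct(reference_time: datetime, delta: timedelta) -> datetime:
    """Indirect time information, e.g. 'ten minutes after <reference>', -> an absolute timestamp."""
    return reference_time + delta


def search_window(offset_s: float, duration_s: float,
                  tolerance_s: float = 30.0) -> Optional[Tuple[float, float]]:
    """Segment of the array sound source to compare against; None if it falls outside the recording."""
    start = max(0.0, offset_s - tolerance_s)
    end = min(duration_s, offset_s + tolerance_s)
    return (start, end) if start < end else None


rec_start = datetime(2024, 12, 17, 9, 0, 0)  # hypothetical recording start time

# Direct: the fragment itself mentions 09:05.
print(search_window(direct_offset_s(datetime(2024, 12, 17, 9, 5), rec_start), 3600.0))  # (270.0, 330.0)

# Indirect: "ten minutes after the meeting began", with the meeting assumed to begin at 09:10.
t = indirect_to_direct(datetime(2024, 12, 17, 9, 10), timedelta(minutes=10))
print(search_window(direct_offset_s(t, rec_start), 3600.0))  # (1170.0, 1230.0)
```

Converting the indirect form first, and then reusing the direct-time path, mirrors the order stated in claim 5.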
Description
Acoustic sensor array-based sound source accurate positioning method
Technical Field
The invention relates to the technical field of computers, and in particular to a sound source accurate positioning method based on an acoustic sensor array.
Background
When processing audio files, an audio file is sometimes very large. In some scenarios it is necessary to determine whether a certain sound is part of an array sound source (the whole audio file). The prior art generally does this manually: after listening to the audio clip, a person listens through the array audio to determine whether the clip belongs to the array audio file and, if so, where in the file it is located, which wastes a great deal of time. A method and related equipment for detecting audio clips are disclosed in, for example, application No. CN201911399043.0, but their accuracy is low in practice. Based on the above, the present application provides a sound source accurate positioning method based on an acoustic sensor array.
Disclosure of Invention
The invention therefore provides a sound source accurate positioning method based on an acoustic sensor array, to solve the problem of how to quickly and accurately determine the position of an audio fragment in the array sound source.
To achieve the above object, the invention provides the following technical solutions. According to a first aspect of the invention, a sound source accurate positioning method based on an acoustic sensor array comprises the following steps: Step one, inputting an audio fragment captured by an acoustic sensor; Step two, producing an audio image, namely a waveform image of the input audio; Step three, obtaining characteristic waveforms by cutting the waveform image to extract the changing sound waveforms; Step four, analyzing the waveform, namely analyzing the extracted waveform image and the input audio fragment to obtain the characteristic information contained in the audio; Step five, analyzing the characteristic information obtained in step four and comparing it with the array sound source; Step six, if the array sound source has no corresponding audio fragment, outputting the input audio fragment directly, and if the array sound source contains the input audio fragment, deriving the position of the audio fragment in the array sound source and outputting it together with the input audio fragment; Step seven, outputting the result. Preferably, the audio image is produced according to audio decibels, as follows: a two-dimensional coordinate system is established with time on the horizontal axis and decibels on the vertical axis; a waveform image is obtained from the variation of the decibel level over time in the audio fragment; and the several time periods with the highest decibel levels in the audio fragment are marked as audio segments. Preferably, in step three, each characteristic waveform is cut from its starting point to its ending point according to the marked audio segments.
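Steps two and three can be illustrated with a short Python/NumPy sketch. It assumes the audio fragment is a mono array of samples normalized to [-1, 1], measures level as the RMS of non-overlapping frames expressed in dB relative to full scale, and interprets "the several time periods with the highest decibel levels" as the loudest runs of frames at or above a threshold. The frame length, threshold, `top_k`, and all function names are illustrative assumptions rather than values given in the patent.

```python
import numpy as np


def frame_levels_db(samples, frame_len):
    """Step two (sketch): RMS level in dB re full scale for each non-overlapping frame.

    Plotting the result against frame start time gives the time/decibel
    'waveform image'. Trailing samples that do not fill a frame are dropped.
    """
    x = np.asarray(samples, dtype=float)
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Floor the RMS so that digital silence maps to a finite (-240 dB) level.
    return 20.0 * np.log10(np.maximum(rms, 1e-12))


def mark_loudest_segments(levels_db, threshold_db, top_k=3):
    """Mark up to top_k runs of consecutive frames at or above threshold_db.

    Runs are ranked by their peak level; the result is a list of
    (start_frame, end_frame) pairs (end exclusive) in time order.
    """
    runs, start = [], None
    for i, loud in enumerate(levels_db >= threshold_db):
        if loud and start is None:
            start = i
        elif not loud and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(levels_db)))
    runs.sort(key=lambda r: levels_db[r[0]:r[1]].max(), reverse=True)
    return sorted(runs[:top_k])


def slice_segments(samples, runs, frame_len):
    """Step three (sketch): cut each marked segment from its start point to its end point."""
    x = np.asarray(samples)
    return [x[s * frame_len:e * frame_len] for s, e in runs]


# Hypothetical example: two bursts separated by silence, 100-sample frames.
signal = np.concatenate([np.zeros(1000), np.full(500, 0.5), np.zeros(1000),
                         np.full(500, 0.9), np.zeros(500)])
levels = frame_levels_db(signal, frame_len=100)
marked = mark_loudest_segments(levels, threshold_db=-20.0, top_k=3)
clips = slice_segments(signal, marked, frame_len=100)
print(marked)  # [(10, 15), (25, 30)]
```

Multiplying a frame index by `frame_len / sample_rate` gives its time coordinate on the waveform image, so each marked run maps directly to the start and end times of a characteristic waveform.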
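The patent does not specify how the characteristic information is compared against the array sound source in steps five and six. As a plainly substituted technique, the sketch below uses normalized cross-correlation on raw samples to decide whether a fragment is contained in the array sound source and, if so, where, and then mirrors the two output branches of step six. The `min_score` threshold of 0.9 and the function names are assumptions. The brute-force scan costs O(N x M) operations and is for illustration only; in the described scheme the search would be restricted to the segments selected by the content and time analysis.

```python
import numpy as np


def locate_fragment(fragment, array_source, min_score=0.9):
    """Slide the fragment over the array sound source and score each window
    with normalized cross-correlation (Pearson correlation of the samples).

    Returns (start_index, score) for the best window, or None when no window
    reaches min_score, i.e. the fragment is treated as not contained.
    """
    frag = np.asarray(fragment, dtype=float)
    src = np.asarray(array_source, dtype=float)
    m = len(frag)
    if m == 0 or m > len(src):
        return None
    frag = frag - frag.mean()
    frag_norm = np.linalg.norm(frag)
    best_idx, best_score = None, -1.0
    for start in range(len(src) - m + 1):
        win = src[start:start + m]
        win = win - win.mean()
        denom = frag_norm * np.linalg.norm(win)
        if denom == 0.0:
            continue  # constant window or constant fragment: correlation undefined
        score = float(np.dot(frag, win) / denom)
        if score > best_score:
            best_idx, best_score = start, score
    if best_idx is None or best_score < min_score:
        return None
    return best_idx, best_score


def judge_and_output(fragment, array_source, sample_rate):
    """Step six (sketch): output the fragment alone, or together with its position in seconds."""
    match = locate_fragment(fragment, array_source)
    if match is None:
        return {"fragment": fragment, "position_s": None}
    start, _score = match
    return {"fragment": fragment, "position_s": start / sample_rate}
```

For example, a 200-sample excerpt taken from sample 700 of a 2000-sample recording at a hypothetical 100 Hz rate is reported at `position_s == 7.0`, while an unrelated excerpt yields `position_s is None`.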
Preferably, the waveform analysis is divided into content characteristic information analysis and time characteristic information analysis. The content characteristic information analysis analyzes the audio content of the cut waveform image, determines the type of the audio content, classifies the audio, and then analyzes and interprets the content of the classified audio to determine the content characteristic information in the cut waveform. The time characteristic information analysis analyzes the input audio fragment to determine whether it contains time characteristic information; if so, the position of the input audio fragment in the array sound source is determined according to that time characteristic information; if not, no time characteristic analysis is performed. Preferably, when the content characteristic information is a voice dialogue: the voice dialogue content is extracted; if a place name and/or person name and/or thing appears, the context containing it is determined, and it is determined whether the input audio fragment falls before or after the appearance of the place name and/or person name and/or thing, giving content conclusion one; the array sound source is processed by dividing it before or after the place name and/or person name and/or thing; and the input audio fragment is compared with the divided array sound source according to content conclusion one to determine the accurate position of the input audio fragment in the array sound so