
CN-121983085-A - Snore extracting method, storage medium and earphone

CN121983085A

Abstract

The invention discloses a snore extraction method, a storage medium, and an earphone. The method comprises: acquiring a stored environmental audio signal, posture signal, and heart rate signal; extracting noise features, correlation features, and heart rate features from the environmental audio signal, the posture signal, and the heart rate signal; fusing the noise features, the correlation features, and the heart rate features to obtain a fused feature sequence; obtaining a plurality of corresponding frame-level probabilities according to the fused feature sequence and a preset snore recognition model; and extracting the audio signal of the target snore according to the frame-level probabilities and the stored environmental audio signal. Compared with the prior art, the method uses multi-modal fusion and frame-by-frame analysis of the noise, correlation, and heart rate features to recognize and extract the target snore in complex snoring scenes, thereby improving the accuracy of target-snore audio extraction and, in turn, the accuracy of subsequent sleep quality analysis.

Inventors

  • LIU XIMIN
  • DU HAIQUAN
  • LI WEIXIONG
  • YU SHICHENG

Assignees

  • 江西瑞声电子有限公司

Dates

Publication Date
2026-05-05
Application Date
2025-12-31

Claims (10)

  1. A snore extraction method, comprising: acquiring a stored environmental audio signal, a stored posture signal, and a stored heart rate signal; extracting noise features, correlation features, and heart rate features from the environmental audio signal, the posture signal, and the heart rate signal; fusing the noise features, the correlation features, and the heart rate features to obtain a fused feature sequence; obtaining a plurality of corresponding frame-level probabilities according to the fused feature sequence and a preset snore recognition model; and extracting the audio signal of the target snore according to the frame-level probabilities and the stored environmental audio signal.
  2. The snore extraction method of claim 1, wherein the step of extracting noise features, correlation features, and heart rate features from the environmental audio signal, the posture signal, and the heart rate signal comprises: acquiring a vibration signal from the posture signal; synchronously aligning and framing the environmental audio signal, the vibration signal, and the heart rate signal based on their timestamps; extracting noise features and heart rate features from the environmental audio signal and the heart rate signal, respectively; and obtaining the correlation features according to the environmental audio signal and the vibration signal.
  3. The snore extraction method of claim 2, wherein the step of obtaining the correlation features according to the environmental audio signal and the vibration signal comprises: obtaining a corresponding audio auto-power spectrum, vibration auto-power spectrum, and cross-power spectrum based on the environmental audio signal and the vibration signal of each frame; obtaining a correlation average value as a frame-level sub-feature according to the audio auto-power spectrum, the vibration auto-power spectrum, and the cross-power spectrum of each frame; and arranging the frame-level sub-features in time order to obtain the correlation features.
  4. The snore extraction method of claim 1, wherein the step of fusing the noise features, the correlation features, and the heart rate features to obtain a fused feature sequence comprises: fusing the noise features, the correlation features, and the heart rate features of the same frame to obtain target features; and arranging the target features in time order to obtain the fused feature sequence.
  5. The snore extraction method of claim 1, wherein the step of obtaining a plurality of corresponding frame-level probabilities according to the fused feature sequence and a preset snore recognition model comprises: inputting the fused feature sequence into the preset snore recognition model; analyzing the fused feature sequence based on preset recognition logic and outputting a confidence value corresponding to each frame; and mapping the confidence values and outputting the corresponding frame-level probabilities.
  6. The snore extraction method of claim 1, wherein the step of extracting the audio signal of the target snore according to the frame-level probabilities and the stored environmental audio signal comprises: judging, based on a preset judgment threshold, whether a frame-level probability triggers audio extraction; if so, extracting the corresponding segments of the environmental audio signal to obtain marked segments; filtering the marked segments based on a preset duration threshold to obtain target segments; and splicing the target segments in time order to obtain the audio signal of the target snore.
  7. The snore extraction method of claim 6, wherein the judgment threshold comprises a first threshold and a second threshold, and the step of judging, based on a preset judgment threshold, whether a frame-level probability triggers audio extraction comprises: judging whether the frame-level probability is less than the first threshold; if not, starting audio extraction from the corresponding frame of the environmental audio signal; judging whether the frame-level probability is greater than the second threshold; and if not, stopping audio extraction at the corresponding frame of the environmental audio signal.
  8. The snore extraction method of claim 6, wherein the step of filtering the marked segments based on a preset duration threshold to obtain the target segments comprises: acquiring the duration of each marked segment; judging whether the duration is less than the preset duration threshold; and if not, defining the marked segment as a target segment.
  9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the snore extraction method of any one of claims 1 to 8.
  10. An earphone comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, carries out the steps of the snore extraction method of any one of claims 1 to 8.
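Claims 6 and 7 together describe a dual-threshold (hysteresis) trigger over the frame-level probabilities. The following is a minimal sketch of that logic; the threshold values and the function name are illustrative assumptions, since the patent does not specify them:

```python
def extract_segments(frame_probs, first_thresh=0.6, second_thresh=0.4):
    """Dual-threshold trigger of claims 6-7: extraction starts at the
    first frame whose probability is not less than first_thresh and
    stops at the first subsequent frame whose probability is not
    greater than second_thresh. Returns (start, end) frame-index
    pairs, end-exclusive; each pair is one marked segment."""
    segments, start = [], None
    for i, p in enumerate(frame_probs):
        if start is None:
            if p >= first_thresh:      # not less than the first threshold
                start = i
        elif p <= second_thresh:       # not greater than the second threshold
            segments.append((start, i))
            start = None
    if start is not None:              # extraction still active at the end
        segments.append((start, len(frame_probs)))
    return segments
```

Using two thresholds with first_thresh above second_thresh prevents the trigger from chattering on and off when a frame probability hovers near a single cut-off.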

Description

Snore extracting method, storage medium and earphone

Technical Field

The invention relates to the technical field of earphones, and in particular to a snore extraction method, a storage medium, and an earphone.

Background

With the continuous development of electronic technology, users demand ever more functions from earphones. Existing earphones offer health-detection functions such as heart rate monitoring and sleep quality analysis, and earphones configured for sleep quality analysis generally monitor the user's snoring during sleep. While wearing the earphone, the user may be in a scene containing multiple snore sound sources, and in such a scene existing earphones cannot extract the required target snore from the collected environmental audio signal with sufficient accuracy, which reduces the accuracy of subsequent sleep quality analysis. In view of this, it is necessary to provide a snore extraction method, a storage medium, and an earphone that solve the above problems.

Disclosure of Invention

In view of the defects of the prior art, the invention provides a snore extraction method, a storage medium, and an earphone, which effectively address the low accuracy of target-snore audio extraction by earphones in multi-snore-source scenes and the resulting impact on the accuracy of sleep quality analysis.
To achieve the above object, a first aspect of the invention provides a snore extraction method comprising the steps of: acquiring a stored environmental audio signal, a stored posture signal, and a stored heart rate signal; extracting noise features, correlation features, and heart rate features from the environmental audio signal, the posture signal, and the heart rate signal; fusing the noise features, the correlation features, and the heart rate features to obtain a fused feature sequence; obtaining a plurality of corresponding frame-level probabilities according to the fused feature sequence and a preset snore recognition model; and extracting the audio signal of the target snore according to the frame-level probabilities and the stored environmental audio signal. In one embodiment, the step of extracting noise features, correlation features, and heart rate features from the environmental audio signal, the posture signal, and the heart rate signal comprises: acquiring a vibration signal from the posture signal; synchronously aligning and framing the environmental audio signal, the vibration signal, and the heart rate signal based on their timestamps; extracting noise features and heart rate features from the environmental audio signal and the heart rate signal, respectively; and obtaining the correlation features according to the environmental audio signal and the vibration signal.
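The framing and per-frame fusion steps above can be sketched as follows. The frame length, hop size, and the use of concatenation as the fusion operation are illustrative assumptions; the patent only states that the per-frame features are fused and arranged in time order:

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split an aligned 1-D signal into overlapping frames of
    frame_len samples spaced hop samples apart (assumes
    len(x) >= frame_len); trailing samples that do not fill a
    complete frame are dropped."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def fuse_features(noise_feat, corr_feat, hr_feat):
    """Per-frame fusion (claim 4): combine the noise, correlation,
    and heart rate feature vectors of the same frame into one target
    feature; rows are already in time order, so the result is the
    fused feature sequence. Concatenation is one common choice."""
    return np.concatenate([noise_feat, corr_feat, hr_feat], axis=-1)
```

Each input to fuse_features is a (num_frames, feature_dim) array; the output row for frame t is the target feature of frame t.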
In one embodiment, the step of obtaining the correlation features according to the environmental audio signal and the vibration signal comprises: obtaining a corresponding audio auto-power spectrum, vibration auto-power spectrum, and cross-power spectrum based on the environmental audio signal and the vibration signal of each frame; obtaining a correlation average value as a frame-level sub-feature according to the audio auto-power spectrum, the vibration auto-power spectrum, and the cross-power spectrum of each frame; and arranging the frame-level sub-features in time order to obtain the correlation features. In one embodiment, the step of fusing the noise features, the correlation features, and the heart rate features to obtain a fused feature sequence comprises: fusing the noise features, the correlation features, and the heart rate features of the same frame to obtain target features; and arranging the target features in time order to obtain the fused feature sequence. In one embodiment, the step of obtaining a plurality of corresponding frame-level probabilities according to the fused feature sequence and the preset snore recognition model comprises: inputting the fused feature sequence into the preset snore recognition model; analyzing the fused feature sequence based on preset recognition logic and outputting a confidence value corresponding to each frame; and mapping the confidence values and outputting the corresponding frame-level probabilities.
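Combining the two auto-power spectra with the cross-power spectrum as the embodiment above describes suggests a magnitude-squared coherence as the per-frame correlation measure. The sketch below assumes that interpretation, with Welch-style block averaging inside each frame; the averaging is essential, since a coherence estimate from a single FFT is identically 1 at every frequency:

```python
import numpy as np

def msc(x, y, nblock=256):
    """Welch-averaged magnitude-squared coherence
    |Pxy|^2 / (Pxx * Pyy): the frame is cut into nblock-sample
    blocks, the audio auto-power spectrum Pxx, vibration auto-power
    spectrum Pyy, and cross-power spectrum Pxy are averaged over the
    blocks, then combined per frequency bin."""
    nseg = min(len(x), len(y)) // nblock
    X = np.fft.rfft(x[:nseg * nblock].reshape(nseg, nblock), axis=1)
    Y = np.fft.rfft(y[:nseg * nblock].reshape(nseg, nblock), axis=1)
    pxx = np.mean(np.abs(X) ** 2, axis=0)
    pyy = np.mean(np.abs(Y) ** 2, axis=0)
    pxy = np.mean(X * np.conj(Y), axis=0)
    return np.abs(pxy) ** 2 / (pxx * pyy + 1e-12)

def correlation_feature(audio_frames, vib_frames):
    """Claim 3: the frequency-averaged coherence of each frame is a
    frame-level sub-feature; arranging the sub-features in time
    order gives the correlation feature sequence."""
    return np.array([float(np.mean(msc(a, v)))
                     for a, v in zip(audio_frames, vib_frames)])
```

Snoring by the wearer excites both the microphone and the in-ear vibration pickup, so frames containing the target snore should show coherence near 1, while external snore sources reach only the microphone and score low.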
In one embodiment, the step of extracting the audio signal of the target snore according to the frame-level probabilities and the stored environmental audio signal comprises: judging, based on a preset judgment threshold, whether a frame-level probability triggers audio extraction; if so, extracting the corresponding segments of the environmental audio signal to obtain marked segments; filtering the marked segments based on a preset duration threshold to obtain target segments; and splicing the target segments in time order to obtain the audio signal of the target snore.
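The duration filtering and time-ordered splicing described above can be sketched as follows, with segment boundaries expressed as frame indices; the hop size, frame length, duration threshold, and function name are illustrative assumptions:

```python
import numpy as np

def splice_target_snore(audio, segments, hop, frame_len, min_dur_frames=3):
    """Claims 6 and 8: discard marked segments shorter than the
    duration threshold, then splice the surviving target segments in
    time order into one audio signal. segments holds (start, end)
    frame indices, end-exclusive; hop and frame_len map frame
    indices back to sample positions in the stored audio."""
    target = [(s, e) for s, e in segments if (e - s) >= min_dur_frames]
    pieces = [audio[s * hop : (e - 1) * hop + frame_len]
              for s, e in sorted(target)]
    return np.concatenate(pieces) if pieces else np.array([], dtype=audio.dtype)
```

Dropping short segments removes isolated false triggers (coughs, rustling) before the surviving target segments are concatenated for sleep quality analysis.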