CN-122004771-A - Health monitoring device, equipment, medium and product based on voice baseline model


Abstract

The invention discloses a health monitoring device, equipment, medium and product based on a voice baseline model. An individualized voice baseline model is used to dynamically compare the voice features of a user, which overcomes the limited cross-individual generalization of a general voice model and improves the characterization precision of health-related voice dimensions such as voice stability and breathing rhythm. A first threshold is set so that a rapid, independent single-modality early warning is issued when the voice features deviate significantly, which ensures a timely response. A second threshold is set so that, when the deviation degree falls in the intermediate interval, multiple electrophysiological signals are introduced for cross verification, which suppresses voice pseudo-abnormalities caused by environmental noise, emotional fluctuation or short-term fatigue and reduces the false alarm rate. The outputs of all modules are aligned to the same timestamp, which ensures the causality and consistency of multi-modal data fusion analysis and enhances the robustness and reliability of health state discrimination.

Inventors

SHAN XIAOMING
QIU YINGWEI
RUAN QUNZHI
LI RONGFENG
Yang Biwan
TANG SHIYU

Assignees

广州易而达科技股份有限公司

Dates

Publication Date: 20260512
Application Date: 20260225

Claims (10)

1. A voice baseline model-based health monitoring device, comprising: a voice acquisition module, configured to acquire a voice signal of a user; a physiological signal acquisition module, configured to acquire a plurality of physiological signals of the user; a voice feature extraction module, configured to process the voice signal and extract voice features from it, wherein the voice features characterize the user's current voice stability, breathing rhythm, articulation clarity, speech rate and intonation; a feature comparison module, configured to compare the voice features with a voice baseline model of the user and determine the deviation degree of the voice features relative to the voice baseline model; a first result output module, configured to output a monitoring result indicating that the user is currently in an unhealthy state when the deviation degree is greater than or equal to a first threshold; and a second result output module, configured to output a monitoring result in combination with the plurality of physiological signals when the deviation degree is greater than or equal to a second threshold and less than the first threshold.
2. The voice baseline model-based health monitoring device of claim 1, wherein the voice feature extraction module comprises: a framing sub-module, configured to divide the voice signal into a plurality of consecutive voice frames; a normalization sub-module, configured to normalize each voice frame; a feature coding sub-module, configured to perform feature coding on each normalized voice frame to obtain the coding feature of that voice frame; and a context feature extraction sub-module, configured to extract context information characterizing the coding features of all the voice frames as the voice features.
3. The voice baseline model-based health monitoring device of claim 2, wherein the feature coding sub-module comprises: a convolution unit, configured to perform convolution on input data to obtain convolution features; a layer normalization unit, configured to perform layer normalization on the convolution features to obtain layer-normalized features; a nonlinear activation unit, configured to apply a nonlinear activation to the layer-normalized features; a return execution unit, configured to take the output data of the nonlinear activation unit as the input data of the convolution unit and to return to the processing of the convolution unit, the layer normalization unit and the nonlinear activation unit until a preset number of rounds has been executed, wherein the number of channels of the convolution unit is the same in every round, and the convolution kernel size and the convolution stride decrease as the round number increases; and a round fusion unit, configured to fuse the output results of all rounds into the coding feature of the voice frame.
4. The voice baseline model-based health monitoring device of claim 2, wherein the context feature extraction sub-module comprises: a sampling unit, configured to perform sliding sampling on the coding features of all the voice frames using a sliding window, wherein the window size of the sliding window is greater than 2 and the sliding step of the sliding window is 1; a position embedding unit, configured to input all samples of the sliding window into a grouped convolution layer to extract relative position embeddings of the coding features; a feature fusion unit, configured to fuse each sample of the sliding window with its corresponding relative position embedding to obtain fusion features; a feature splicing unit, configured to concatenate the fusion features corresponding to all the samples to obtain a concatenated feature; and a voice feature extraction unit, configured to input the concatenated feature into a Transformer encoder for processing to obtain context information characterizing the coding features of all the voice frames as the voice features.
5. The voice baseline model-based health monitoring device according to any one of claims 1-4, further comprising: a healthy voice acquisition module, configured to acquire historical voice signals of the user in a healthy state; a historical voice feature extraction module, configured to process each historical voice signal and extract historical voice features from it, wherein the historical voice features characterize the voice stability, breathing rhythm, articulation clarity, speech rate and intonation of the historical voice signal; and a voice baseline model construction module, configured to calculate the mean vector and the covariance matrix of all the historical voice features as the voice baseline model.
6. The voice baseline model-based health monitoring device of claim 5, wherein the feature comparison module comprises: a distance calculation sub-module, configured to calculate the Mahalanobis distance between the voice features and the voice baseline model as the deviation degree of the voice features relative to the voice baseline model.
7. The voice baseline model-based health monitoring device of any one of claims 1-4, wherein the second result output module comprises: an abnormal signal judging sub-module, configured to judge whether any abnormal physiological signal exists when the deviation degree is greater than or equal to the second threshold and less than the first threshold; a first result output sub-module, configured to output a monitoring result indicating that the user is currently in a healthy state when no abnormal physiological signal exists; and a second result output sub-module, configured to output a monitoring result indicating that the user is currently in an unhealthy state when an abnormal physiological signal exists.
8. An electronic device, comprising: one or more processors; and a storage means for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method corresponding to the voice baseline model-based health monitoring device of any one of claims 1-7.
9. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method corresponding to the voice baseline model-based health monitoring device of any one of claims 1-7.
10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method corresponding to the voice baseline model-based health monitoring device of any one of claims 1-7.
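
The baseline construction of claim 5, the Mahalanobis deviation of claim 6 and the two-threshold routing of claims 1 and 7 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the applicant's implementation: the function names, the use of NumPy, the pseudo-inverse for the covariance matrix, the string labels and the `"no_alert"` outcome below the second threshold are all hypothetical, since the claims do not specify them.

```python
import numpy as np


def build_baseline(historical_features):
    """Mean vector and covariance matrix of healthy-state feature vectors (claim 5)."""
    x = np.asarray(historical_features, dtype=float)  # shape: (n_samples, n_dims)
    mean = x.mean(axis=0)
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    return mean, cov


def deviation_degree(feature, mean, cov):
    """Mahalanobis distance between one feature vector and the baseline (claim 6)."""
    diff = np.asarray(feature, dtype=float) - mean
    # Assumption: a pseudo-inverse is used so a singular covariance does not fail.
    squared = float(diff @ np.linalg.pinv(cov) @ diff)
    return max(squared, 0.0) ** 0.5


def monitor(deviation, first_threshold, second_threshold, physiological_abnormal):
    """Route the decision by deviation degree (claims 1 and 7)."""
    if deviation >= first_threshold:
        # Large deviation: voice-only warning, no cross verification.
        return "unhealthy"
    if deviation >= second_threshold:
        # Intermediate deviation: cross-verify with the physiological signals.
        return "unhealthy" if physiological_abnormal else "healthy"
    # Assumption: the claims do not state the output below the second threshold.
    return "no_alert"
```

For example, with `mean, cov = build_baseline(history)`, a call such as `monitor(deviation_degree(current, mean, cov), first_threshold, second_threshold, physiological_abnormal)` yields one of the three labels; the threshold values themselves would have to be chosen for the feature space in use.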

Description

Health monitoring device, equipment, medium and product based on voice baseline model

Technical Field

The present invention relates to health monitoring technologies, and in particular to a health monitoring device, equipment, medium and product based on a voice baseline model.

Background

Health monitoring primarily involves the use of various devices and systems to track and record an individual's physiological indicators and biomarkers in real time. This includes the use of wearable devices, mobile applications and remote monitoring systems to collect health data such as heart rate, blood pressure, blood glucose level, sleep quality and other vital signs. The aim is to improve the convenience and efficiency of health management so that individuals and healthcare providers can better understand health, prevent disease and adjust treatment regimens in time.
In existing health monitoring schemes, each sensor usually works independently. For example, a heart rate sensor monitors heart rate data and judges whether the heart rate is abnormal, and a blood pressure sensor monitors blood pressure data and judges whether the blood pressure is normal. The sensors do not cooperate, so the monitoring result of a single sensor is often inaccurate.

Disclosure of Invention

The invention provides a health monitoring device, equipment, medium and product based on a voice baseline model, which are used to enhance the robustness and reliability of health state discrimination.
In a first aspect, the present invention provides a health monitoring device based on a voice baseline model, comprising: a voice acquisition module, configured to acquire a voice signal of a user; a physiological signal acquisition module, configured to acquire a plurality of physiological signals of the user; a voice feature extraction module, configured to process the voice signal and extract voice features from it, wherein the voice features characterize the user's current voice stability, breathing rhythm, articulation clarity, speech rate and intonation; a feature comparison module, configured to compare the voice features with a voice baseline model of the user and determine the deviation degree of the voice features relative to the voice baseline model; a first result output module, configured to output a monitoring result indicating that the user is currently in an unhealthy state when the deviation degree is greater than or equal to a first threshold; and a second result output module, configured to output a monitoring result in combination with the plurality of physiological signals when the deviation degree is greater than or equal to a second threshold and less than the first threshold.
Optionally, the voice feature extraction module includes: a framing sub-module, configured to divide the voice signal into a plurality of consecutive voice frames; a normalization sub-module, configured to normalize each voice frame; a feature coding sub-module, configured to perform feature coding on each normalized voice frame to obtain the coding feature of that voice frame; and a context feature extraction sub-module, configured to extract context information characterizing the coding features of all the voice frames as the voice features.
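The framing and normalization sub-modules can be sketched in Python as follows. This is a minimal sketch under assumptions: the text does not state the frame length, the hop between frames or the normalization method, so `frame_length`, `hop_length` and the zero-mean, unit-variance normalization used here are hypothetical choices, as are the function names.

```python
import numpy as np


def split_into_frames(signal, frame_length, hop_length):
    """Divide a 1-D signal into consecutive frames; trailing samples that do not fill a frame are dropped."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_length:
        return np.empty((0, frame_length))
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    return np.stack([
        signal[i * hop_length: i * hop_length + frame_length]
        for i in range(n_frames)
    ])


def normalize_frames(frames, eps=1e-8):
    """Assumed normalization: zero mean and unit variance within each frame."""
    mean = frames.mean(axis=1, keepdims=True)
    std = frames.std(axis=1, keepdims=True)
    # eps avoids division by zero for silent (constant) frames.
    return (frames - mean) / (std + eps)
```

Each row of the normalized array would then be passed to the feature coding sub-module.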
Optionally, the feature coding sub-module includes: a convolution unit, configured to perform convolution on input data to obtain convolution features; a layer normalization unit, configured to perform layer normalization on the convolution features to obtain layer-normalized features; a nonlinear activation unit, configured to apply a nonlinear activation to the layer-normalized features; a return execution unit, configured to take the output data of the nonlinear activation unit as the input data of the convolution unit and to return to the processing of the convolution unit, the layer normalization unit and the nonlinear activation unit until a preset number of rounds has been executed, wherein the number of channels of the convolution unit is the same in every round, and the convolution kernel size and the convolution stride decrease as the round number increases; and a round fusion unit, configured to fuse the output results of all rounds into the coding feature of the voice frame.
Optionally, the context feature extraction sub-module includes: a sampling unit, configured to perform sliding sampling on the coding features of all the voice frames using a sliding window, wherein the window size of the sliding window is greater than 2 and the sliding step of the sliding window is 1; a position embedding unit, configured to input all samples of the sliding window into a grouped convolution layer to extract relative position embeddings of the coding features; t