CN-121983093-A - TMD detection model based on voice time-frequency feature fusion, construction method and system

Abstract

The invention relates to a TMD detection model based on voice time-frequency feature fusion, and to a construction method and a system thereof. The model comprises a time sequence feature extraction module, a frequency domain global feature embedding module and a classification module. The time sequence feature extraction module extracts time sequence features of a voice signal from its MFCC features; the frequency domain global feature embedding module embeds TMD-related acoustic features in the frequency domain of the voice signal into the time sequence features; and the classification module judges whether the patient corresponding to the voice signal is a TMD patient. The MFCC features and the acoustic features of the voice signals of TMD patients and non-TMD patients are input into the time sequence feature extraction module and the frequency domain global feature embedding module respectively, and the initial TMD detection model is trained to obtain the TMD detection model used for detection.

Inventors

LI CHENG
LI BINGJIE
CHENG BO
WU LAN
WANG SHUANGYING
YUE YUAN
ZHAO SIYU
YANG FUHUA
DENG TONG
ZHANG JIEYI

Assignees

Zhongnan Hospital of Wuhan University (武汉大学中南医院)

Dates

Publication Date: 20260505
Application Date: 20260408

Claims (7)

1. A construction method of a TMD detection model based on voice time-frequency feature fusion, characterized by comprising the following steps: Step S01, constructing an initial TMD detection model; the TMD detection model comprises a time sequence feature extraction module, a frequency domain global feature embedding module and a classification module; the time sequence feature extraction module is used for extracting time sequence features of a voice signal according to MFCC features of the voice signal; the frequency domain global feature embedding module is used for embedding TMD-related acoustic features in the frequency domain of the voice signal into the time sequence features to generate a time-frequency fusion feature vector; the classification module is used for judging, according to the time-frequency fusion feature vector, whether the patient corresponding to the voice signal is a TMD patient; Step S02, extracting voice signals of TMD patients and non-TMD patients from a voice database, inputting the MFCC features and the acoustic features of these voice signals into the time sequence feature extraction module and the frequency domain global feature embedding module respectively, using whether the corresponding patient is a TMD patient as the label, and training the initial TMD detection model to obtain a TMD detection model for detection; wherein the acoustic features include fundamental frequency, sound, short-time zero-crossing rate, fundamental frequency disturbance and amplitude disturbance.
2. The method of claim 1, wherein the time sequence feature extraction module is a time sequence feature extraction module based on an X-vector model.
3. The method of claim 2, wherein the time sequence feature extraction module based on the X-vector model comprises a voice time sequence feature extraction network and an utterance-level feature extraction module; the voice time sequence feature extraction network adopts a multi-layer time delay neural network; the utterance-level feature extraction module comprises a global statistics pooling layer and a multi-layer feedforward neural network; the global statistics pooling layer is used for aggregating all time frame features output by the time delay neural network, generating a statistics feature vector representing the whole utterance, and inputting the statistics feature vector into the multi-layer feedforward neural network; and the feedforward neural network is used for processing the statistics feature vector and outputting a voice feature vector that contains the time sequence information of the voice and incorporates TMD-related pathological features.
4. The method according to claim 3, wherein the frequency domain global feature embedding module is specifically configured to calculate statistical description indexes of each of the acoustic features over the whole voice segment, construct a 103-dimensional frequency domain global feature vector, normalize the frequency domain global feature vector, reduce the 103-dimensional frequency domain global feature to a 64-dimensional frequency domain feature through a fully connected layer, and embed the 64-dimensional frequency domain feature into the 256-dimensional time domain feature representation generated by the time sequence feature extraction module based on the X-vector model, obtaining a 320-dimensional time-frequency fusion feature vector after fusion.
5. The method according to claim 4, wherein the classification module is specifically configured to use multiple fully connected layers to perform nonlinear integration on the time-frequency fusion feature vector.
6. A TMD detection model based on voice time-frequency feature fusion, which is the TMD detection model for detection obtained by the construction method of a TMD detection model based on voice time-frequency feature fusion according to any one of claims 1 to 5.
7. A TMD detection system based on voice time-frequency feature fusion, characterized by comprising a voice acquisition module and the TMD detection model for detection according to claim 6; the voice acquisition module is used for acquiring a voice signal of a patient to be diagnosed, and for obtaining the MFCC features of the voice signal and the TMD-related acoustic features in the frequency domain, wherein the acoustic features comprise fundamental frequency, sound, short-time zero-crossing rate, fundamental frequency disturbance and amplitude disturbance; the time sequence feature extraction module of the TMD detection model for detection is used for extracting time sequence features of the voice signal according to the MFCC features; the frequency domain global feature embedding module of the TMD detection model for detection is used for embedding the acoustic features into the time sequence features to generate a time-frequency fusion feature vector; and the classification module of the TMD detection model for detection is used for judging whether the patient to be diagnosed is a TMD patient according to the time-frequency fusion feature vector.
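
The dimension bookkeeping in claims 3 to 5 can be illustrated with a minimal NumPy sketch. Only the sizes 103, 64, 256 and 320 come from the claims. Everything else is an assumption made for illustration: the frame count, the 512-dimensional frame features standing in for the time delay neural network output, mean-plus-standard-deviation pooling (the claims say only "statistics"), ReLU activations, z-score normalization, the 64-unit hidden layer of the classifier, and the random placeholder weights. It is not the patented implementation and performs no training.

```python
import numpy as np

def statistics_pooling(frames):
    """Aggregate (T, D) frame features into one (2*D,) utterance vector.
    Assumption: mean and standard deviation over time."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

def dense(x, weight, bias):
    """One fully connected layer with ReLU (activation is an assumption)."""
    return np.maximum(x @ weight + bias, 0.0)

def fuse(time_vec, freq_vec, mu, sigma, weight, bias):
    """Normalize the 103-dim vector, project it to 64 dims, append to 256 dims."""
    normalized = (freq_vec - mu) / (sigma + 1e-8)   # z-score, assumed
    freq_embed = dense(normalized, weight, bias)    # 103 -> 64
    return np.concatenate([time_vec, freq_embed])   # 256 + 64 = 320

def classify(fused, w1, b1, w2, b2):
    """Two fully connected layers ending in a sigmoid probability."""
    hidden = dense(fused, w1, b1)
    return 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))

rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 512))            # placeholder TDNN output
pooled = statistics_pooling(frames)                 # shape (1024,)
time_vec = dense(pooled, rng.standard_normal((1024, 256)) * 0.01, np.zeros(256))
freq_vec = rng.standard_normal(103)                 # placeholder global features
fused = fuse(time_vec, freq_vec, np.zeros(103), np.ones(103),
             rng.standard_normal((103, 64)) * 0.1, np.zeros(64))
prob = classify(fused, rng.standard_normal((320, 64)) * 0.05, np.zeros(64),
                rng.standard_normal(64) * 0.05, 0.0)
print(pooled.shape, time_vec.shape, fused.shape)    # (1024,) (256,) (320,)
```

In a trained model, `mu` and `sigma` would presumably be estimated on the training set and the weights learned in step S02; here they are placeholders chosen only to make the shapes visible.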

Description

TMD detection model based on voice time-frequency feature fusion, construction method and system

Technical Field

The invention relates to the technical field of risk prediction, in particular to a TMD detection model based on voice time-frequency feature fusion, a construction method and a system.

Background

With the accelerating pace of modern life and increasing work pressure, the incidence of TMD (temporomandibular joint disorder) is on the rise, affecting people of all ages. As a common oral and maxillofacial disease, TMD may not only cause pain and mandibular movement disorder, but may also affect the patient's pronunciation function. Traditional TMD diagnosis relies mainly on clinical examination by specialists, imaging examination and the patient's subjective description of symptoms. The process is complicated, lacks objective and quantitative means of assessment, and is easily influenced by the doctor's experience and the patient's ability to express symptoms. A diagnostic mode that depends so heavily on specialist experience is often difficult to implement, especially in primary-care settings or smaller cities that lack temporomandibular joint specialists, so that many patients cannot obtain timely and accurate diagnosis and intervention.

In recent years, the rapid development of artificial intelligence technology has provided a new approach to intelligent auxiliary diagnosis of TMD. A growing body of research attempts to use multi-modal data such as images, sensor data and voice for TMD recognition and disease analysis.

Disclosure of Invention

The invention aims to provide a TMD detection model based on voice time-frequency feature fusion, a construction method and a system, so as to solve the above problems in the prior art.
The technical scheme for solving the technical problems is as follows: a construction method of a TMD detection model based on voice time-frequency feature fusion comprises the following steps: Step S01, constructing an initial TMD detection model; the TMD detection model comprises a time sequence feature extraction module, a frequency domain global feature embedding module and a classification module; the time sequence feature extraction module is used for extracting time sequence features of a voice signal according to MFCC features of the voice signal; the frequency domain global feature embedding module is used for embedding TMD-related acoustic features in the frequency domain of the voice signal into the time sequence features to generate a time-frequency fusion feature vector; the classification module is used for judging, according to the time-frequency fusion feature vector, whether the patient corresponding to the voice signal is a TMD patient; Step S02, extracting voice signals of TMD patients and non-TMD patients from a voice database, inputting the MFCC features and the acoustic features of these voice signals into the time sequence feature extraction module and the frequency domain global feature embedding module respectively, using whether the corresponding patient is a TMD patient as the label, and training the initial TMD detection model to obtain a TMD detection model for detection; the acoustic features include fundamental frequency, sound, short-time zero-crossing rate, fundamental frequency disturbance and amplitude disturbance, among others.
The beneficial effects of the invention are as follows. The time sequence feature extraction module takes the MFCC features extracted from the original voice signal as its input. Considering that MFCC features have certain limitations in representing the voice characteristics of TMD patients, a frequency domain global feature embedding module is introduced into the TMD detection model. This module embeds the TMD-related acoustic features in the frequency domain (fundamental frequency, sound, short-time zero-crossing rate, fundamental frequency disturbance and amplitude disturbance) into the time sequence feature representation, which enhances the overall feature expression capability and, by describing TMD-related voice anomalies more comprehensively, remarkably improves the accuracy and robustness of the TMD detection model. The design helps to mine hidden pathological information in the voice signal more comprehensively, and provides a voice-signal-based TMD auxiliary diagnosis method that is non-invasive, convenient and of wide application value. The invention also discloses a TMD detection model based on the voice time-frequency feature fusion, which is the TMD detection model for detection obtained by the construction method of the TMD detection model based on the voice time-frequency feature fusion. The invention also discloses a TMD detection system based on the voice time-frequency feature fusion, which comprises a voice acquisition mod