CN-121980362-A - Vocal cord tumor multi-classification analysis model training method and system based on voice
Abstract
The invention pertains to the technical field of machine learning classification and provides a voice-based vocal cord tumor multi-classification analysis model training method and system. The method comprises the steps of voice data collection; data processing; model construction; and model training and optimization, namely inputting an initial tensor into a convolution module to extract an initial feature map, sequentially feeding the initial feature map into multi-stage LHG modules for processing, outputting an updated feature map, and outputting, according to the updated feature map, a probability distribution over the three categories of vocal cord polyps, precancerous lesions, and malignant tumors. The invention can noninvasively, accurately, and automatically analyze the pathological type of vocal cord tumors, thereby addressing problems such as the dependence of existing methods on invasive biopsy and the insufficient performance of existing intelligent algorithms on multi-classification analysis tasks.
Inventors
- YU DAN
- SUN WEIJIA
- WANG TUANJIE
- LI SHUANG
- ZHENG LINLIN
- ZHAO XUE
- GUO HAIXIAN
Assignees
- Jilin University (吉林大学)
- Changchun Tongda Data Technology Co., Ltd. (长春通达数据技术有限责任公司)
Dates
- Publication Date
- 20260505
- Application Date
- 20260407
Claims (8)
- 1. A voice-based vocal cord tumor multi-classification analysis model training method, characterized by comprising the following steps: voice data collection, namely collecting voice audio data of patients with pathologically diagnosed vocal cord lesions; data processing, namely preprocessing the voice audio data to generate a Mel spectrogram, performing data enhancement processing and segmentation processing on the Mel spectrogram, and mapping it to form an initial tensor; model construction, namely constructing a local high-order graph neural network model, wherein the local high-order graph neural network model comprises a convolution module and multi-stage stacked LHG modules; and model training and optimization, namely inputting the initial tensor into the convolution module to extract an initial feature map, sequentially feeding the initial feature map into the multi-stage LHG modules for processing, outputting an updated feature map, outputting, according to the updated feature map, a probability distribution over the three categories of vocal cord polyps, vocal cord precancerous lesions, and vocal cord malignant tumors, calculating a loss function based on the probability distribution and the real pathological labels as the optimization target, and iteratively updating the parameters of the local high-order graph neural network model through a back propagation algorithm to complete model training.
- 2. The voice-based vocal cord tumor multi-classification analysis model training method according to claim 1, wherein the step of preprocessing the voice audio data to generate a Mel spectrogram specifically comprises: subjecting the voice audio data to framing, windowing, and discrete Fourier transformation to convert it from a time-domain signal into a frequency-domain signal, and dividing the spectrum into a plurality of frequency bands to obtain the Mel spectrogram.
- 3. The voice-based vocal cord tumor multi-classification analysis model training method according to claim 1, wherein the data enhancement processing comprises one or more of random occlusion in the time-domain and frequency-domain dimensions, random Gaussian noise addition, and random rolling (circular shifting) along the time axis; and the segmentation processing comprises segmenting the data-enhanced Mel spectrogram into image blocks of a preset fixed size and mapping them into feature embedding vectors through a linear projection to form the initial tensor.
- 4. The voice-based vocal cord tumor multi-classification analysis model training method according to claim 1, wherein in each LHG module, the following two graph structures are constructed simultaneously based on the current feature map: a local k-nearest-neighbor graph, wherein the spatial size of the input feature map is denoted H×W, each spatial position is regarded as a node so that the number of nodes is N=H×W, and for each node the k spatially nearest nodes under the Euclidean distance in the feature map are selected to establish adjacency, yielding a local adjacency matrix; and a high-order fuzzy C-means clustering graph, wherein all node features are clustered into m categories (m ≪ N) by a fuzzy C-means clustering algorithm to obtain cluster centers and a membership matrix, and a bipartite graph is constructed between the nodes and the cluster centers, with edge weights determined by the membership values; the LHG module performs graph convolution on the local k-nearest-neighbor graph and on the high-order fuzzy C-means clustering graph respectively through the message passing mechanism of the graph neural network, aggregates information from local neighbor nodes and from the high-order cluster centers, fuses the two updated features through learnable weights to efficiently combine local fine-grained abnormalities with global structural context, and outputs the updated feature map.
- 5. The voice-based vocal cord tumor multi-classification analysis model training method according to claim 4, wherein the local high-order graph neural network model further comprises: a downsampling layer for reducing the spatial resolution of the feature map between different stages by a convolution operation with a stride greater than 1; and a classification head for performing global average pooling and a fully connected layer operation on the updated feature map and outputting a probability distribution over the three categories of vocal cord polyps, vocal cord precancerous lesions, and vocal cord malignant tumors.
- 6. The method of claim 1, wherein the loss function is a cross entropy loss function.
- 7. A vocal cord tumor multi-classification analysis model trained by the voice-based vocal cord tumor multi-classification analysis model training method according to any one of claims 1 to 6.
- 8. A training system for a voice-based vocal cord tumor multi-classification analysis model, for implementing the voice-based vocal cord tumor multi-classification analysis model training method according to any one of claims 1 to 6, comprising: a data acquisition module for collecting voice audio data of patients with pathologically diagnosed vocal cord lesions; a data processing module for preprocessing the voice audio data to generate a Mel spectrogram, performing data enhancement processing and segmentation processing on the Mel spectrogram, and mapping it to form an initial tensor; a model construction module for constructing a local high-order graph neural network model, wherein the local high-order graph neural network model comprises a convolution module and multi-stage stacked LHG modules; and a model training and optimization module for inputting the initial tensor into the convolution module to extract an initial feature map, sequentially feeding the initial feature map into the multi-stage LHG modules for processing, outputting an updated feature map, outputting, according to the updated feature map, a probability distribution over the three categories of vocal cord polyps, vocal cord precancerous lesions, and vocal cord malignant tumors, calculating a loss function based on the probability distribution and the real pathological labels as the optimization target, and iteratively updating the parameters of the local high-order graph neural network model through a back propagation algorithm to complete model training.
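Claim 2's preprocessing pipeline (framing, windowing, discrete Fourier transform, division into mel frequency bands) can be sketched as follows. This is an illustrative numpy reading of the claim, not the patented implementation; the sampling rate, frame length, hop size, and number of mel bands are assumed values.

```python
import numpy as np

def mel_spectrogram(signal, sr=16000, frame_len=400, hop=160,
                    n_mels=64, fmin=0.0, fmax=8000.0):
    """Framing, Hann windowing, DFT, then mel filter-bank aggregation.
    All parameter values here are illustrative assumptions."""
    # 1) Framing: slice the time-domain signal into overlapping frames.
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    # 2) Windowing: apply a Hann window to each frame.
    frames = frames * np.hanning(frame_len)
    # 3) DFT: convert each frame to a frequency-domain power spectrum.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # 4) Mel filter bank: divide the spectrum into n_mels bands.
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    # FFT size equals frame_len in this sketch (no zero padding).
    bins = np.floor((frame_len + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, power.shape[1]))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)
    # Log-compressed mel spectrogram, shape (n_mels, n_frames).
    return np.log(power @ fbank.T + 1e-10).T
```

A one-second 16 kHz recording yields a 64-band spectrogram with 98 frames under these settings.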
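The augmentations and patch embedding of claim 3 might look like the following numpy sketch. Mask widths, noise scale, roll range, patch size, and embedding dimension are illustrative assumptions, and the random projection matrix stands in for a learned one.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(mel):
    """Claim 3's enhancement ops; magnitudes are assumptions."""
    mel = mel.copy()
    n_mels, n_frames = mel.shape
    # Random occlusion (SpecAugment-style) in the time dimension...
    t0 = rng.integers(0, n_frames - 8)
    mel[:, t0:t0 + 8] = 0.0
    # ...and in the frequency dimension.
    f0 = rng.integers(0, n_mels - 4)
    mel[f0:f0 + 4, :] = 0.0
    # Random Gaussian noise addition.
    mel += rng.normal(0.0, 0.1, mel.shape)
    # Random rolling (circular shift) along the time axis.
    return np.roll(mel, rng.integers(-10, 10), axis=1)

def patchify_and_embed(mel, patch=8, dim=32):
    """Segment the mel spectrogram into fixed-size image blocks and map
    each block to a feature embedding vector via a linear projection
    (a random matrix here, learnable in practice)."""
    n_mels, n_frames = mel.shape
    H, W = n_mels // patch, n_frames // patch
    blocks = (mel[:H * patch, :W * patch]
              .reshape(H, patch, W, patch)
              .transpose(0, 2, 1, 3)
              .reshape(H * W, patch * patch))
    W_proj = rng.normal(0.0, 0.02, (patch * patch, dim))
    return blocks @ W_proj  # initial tensor, shape (H*W, dim)
```

For a 64x96 spectrogram with 8x8 patches this produces a 96x32 initial tensor, i.e. one 32-dimensional embedding per patch.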
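Claim 4's LHG module can be sketched as below. This is an illustrative numpy reading, not the patented implementation: the neighbor count k, cluster count m, fusion coefficient alpha, and the random weight matrices W1/W2 stand in for learned parameters, and the fuzzy C-means loop uses the standard fuzzifier-2 update.

```python
import numpy as np

def lhg_module(X, H, W, k=4, m=3, alpha=0.5, seed=0):
    """One LHG step on node features X of shape (N, C), N = H*W."""
    rng = np.random.default_rng(seed)
    N, C = X.shape
    assert N == H * W
    # --- Local k-NN graph over spatial positions (Euclidean distance) ---
    coords = np.stack(np.meshgrid(np.arange(H), np.arange(W),
                                  indexing="ij"), -1).reshape(N, 2)
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    A = np.zeros((N, N))
    for i in range(N):
        for j in np.argsort(d[i])[1:k + 1]:  # k nearest, excluding self
            A[i, j] = 1.0
    A += np.eye(N)                 # self-loops
    A /= A.sum(1, keepdims=True)   # row-normalized local adjacency matrix
    # --- High-order fuzzy C-means clustering graph (bipartite) ---
    centers = X[rng.choice(N, m, replace=False)].copy()
    for _ in range(10):            # a few FCM iterations, fuzzifier = 2
        dist = np.linalg.norm(X[:, None] - centers[None], axis=-1) + 1e-8
        U = 1.0 / dist ** 2
        U /= U.sum(1, keepdims=True)               # membership matrix (N, m)
        centers = (U.T ** 2 @ X) / (U.T ** 2).sum(1, keepdims=True)
    # --- Message passing on both graphs, then learnable-weight fusion ---
    W1 = rng.normal(0, 0.1, (C, C))
    W2 = rng.normal(0, 0.1, (C, C))
    local_msg = A @ X @ W1         # aggregate from local neighbor nodes
    high_msg = U @ (centers @ W2)  # aggregate from cluster centers via memberships
    return alpha * local_msg + (1 - alpha) * high_msg  # updated features (N, C)
```

The membership matrix U plays the role of the claim's bipartite edge weights: each node receives cluster-center information in proportion to its fuzzy membership.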
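Claim 5's downsampling layer and classification head can be sketched as follows; strided average pooling stands in for the stride-greater-than-1 convolution, and the fully connected weights are random stand-ins for learned parameters.

```python
import numpy as np

def downsample(F, H, W, stride=2):
    """Reduce spatial resolution between stages (strided 2x2 average
    pooling as a stand-in for a strided convolution)."""
    C = F.shape[1]
    grid = F.reshape(H, W, C)
    Ho, Wo = H // stride, W // stride
    out = grid[:Ho * stride, :Wo * stride].reshape(Ho, stride, Wo, stride, C)
    return out.mean(axis=(1, 3)).reshape(Ho * Wo, C), Ho, Wo

def classification_head(F, n_classes=3, seed=0):
    """Global average pooling, a fully connected layer, and softmax over
    the three categories (polyp, precancerous lesion, malignant tumor)."""
    rng = np.random.default_rng(seed)
    C = F.shape[1]
    pooled = F.mean(axis=0)                  # global average pooling
    Wfc = rng.normal(0, 0.1, (C, n_classes))
    logits = pooled @ Wfc
    e = np.exp(logits - logits.max())
    return e / e.sum()                       # probability distribution
```

The head's output is a valid probability distribution over the three pathological categories, suitable as input to the loss of claim 6.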
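For a single sample with a hard pathological label, the cross-entropy loss of claim 6 reduces to the negative log probability assigned to the true class:

```python
import numpy as np

def cross_entropy(probs, label):
    """Cross-entropy loss for one sample: probs is the model's
    3-class probability distribution, label the true class index."""
    return -np.log(probs[label] + 1e-12)  # epsilon guards against log(0)
```

Minimizing this loss via back propagation, as claim 1 describes, pushes the predicted probability of the true pathological class toward 1.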
Description
Vocal cord tumor multi-classification analysis model training method and system based on voice

Technical Field

The invention belongs to the technical field of machine learning classification, and particularly relates to a voice-based vocal cord tumor multi-classification analysis model training method.

Background

Voice is a core medium of human communication, emotional expression, and social participation, and changes in voice quality can significantly affect patients' quality of life. Vocal cord tumors, mainly including benign lesions (e.g., vocal cord polyps), precancerous lesions (e.g., hyperkeratosis, atypical hyperplasia), and malignant tumors (e.g., squamous cell carcinoma), are common organic etiologies of persistent, progressive hoarseness. Clinically, accurately distinguishing the pathological nature of these lesions is critical to formulating proper patient management and treatment strategies.

Currently, traditional analysis of vocal cord tumors relies primarily on laryngoscopy (e.g., electronic laryngoscopy) and pathological biopsy. These methods have significant limitations: first, laryngoscopy interpretation is highly subjective and depends heavily on the physician's expert experience; second, pathological biopsy is an invasive procedure that causes discomfort and potential risk to the patient and is unsuitable for large-scale primary screening or dynamic monitoring.

In recent years, with the development of artificial intelligence technology, noninvasive auxiliary analysis based on voice audio has become an important research direction. Voice is not only a clinical manifestation of vocal cord lesions; its acoustic characteristics also contain abundant pathological information. Researchers have tried inputting various features (e.g., mel-frequency cepstral coefficients (MFCC), mel spectrograms) extracted from patient audio signals into machine learning or deep learning models (e.g., support vector machines (SVM), convolutional neural networks (CNN)) to achieve automatic classification of pathological voices. Although existing studies have achieved high accuracy on binary classification tasks (e.g., distinguishing normal from pathological voice), model performance still needs improvement on the more clinically challenging multi-classification task of accurately identifying vocal cord polyps, precancerous lesions, and malignant tumors simultaneously. This is mainly because the acoustic features of precancerous lesions are similar to those of early malignant tumors, making discrimination difficult. In addition, many existing models struggle to effectively and efficiently fuse local fine-grained abnormal features with global acoustic structural context in audio spectrograms, so model discrimination is insufficient and computational efficiency is low, affecting the final classification accuracy and the generalization capability of the models. Therefore, there is a need in the art for an intelligent analysis model training method that can fully utilize voice audio information, efficiently fuse local and global features of spectrograms, and exhibit high accuracy and robustness on vocal cord tumor multi-classification tasks, so as to provide a reliable noninvasive primary screening tool for clinical use and assist doctors in analysis and identification.

Disclosure of Invention

The invention aims to provide a voice-based vocal cord tumor multi-classification analysis model training method and system, so as to solve the above technical problems.
The invention is realized as follows: the voice-based vocal cord tumor multi-classification analysis model training method comprises the following steps: voice data collection, namely collecting voice audio data of patients with pathologically diagnosed vocal cord lesions; data processing, namely preprocessing the voice audio data to generate a Mel spectrogram, performing data enhancement processing and segmentation processing on the Mel spectrogram, and mapping it to form an initial tensor; model construction, namely constructing a local high-order graph neural network model, wherein the local high-order graph neural network model comprises a convolution module and multi-stage stacked LHG modules; and model training and optimization, namely inputting the initial tensor into the convolution module to extract an initial feature map, sequentially feeding the initial feature map into the multi-stage LHG modules for processing, outputting an updated feature map, outputting, according to the updated feature map, a probability distribution over the three categories of vocal cord polyps, vocal cord precancerous lesions, and vocal cord malignant tumors, calculating a loss function based on the probability distribution and the real pathological labels as the optimization target, and iteratively updating the parameters of the local high-order graph neural network model through a back propagation algorithm to complete model training.