CN-121528505-B - Auxiliary diagnosis and treatment system and method based on multi-mode fusion large model


Abstract

The invention discloses an auxiliary diagnosis and treatment system and method based on a multi-modal fusion large model, relating to the technical field of medical artificial intelligence. To address the subjectivity of traditional diagnosis and treatment and the fragmentation of multi-modal data, the method aligns a voice feature sequence with a physiological feature sequence using a dynamic time warping algorithm, maps the semantic features of the non-text modalities into a unified text semantic space, combines them with text semantic vectors to construct a normalized three-dimensional feature matrix, and finally extracts keywords and matches them against a knowledge graph through vector search and keyword search. This makes auxiliary diagnosis and treatment of mental disorders more objective and precise and yields an authoritative auxiliary diagnosis and treatment plan.

Inventors

ZHANG HAISHENG
HU MAOQIANG
QIAO JINYU
XIA YIJING
FU YAOYANG
LI CHUANBIN

Assignees

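Hangzhou First People's Hospital (Westlake University Affiliated Hangzhou First People's Hospital) [杭州市第一人民医院(西湖大学附属杭州市第一人民医院)]
Hangzhou Jingwei Intelligent Technology Co., Ltd. [杭州精卫智能科技有限公司]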

Dates

Publication Date: 20260508
Application Date: 20260115

Claims (10)

1. An auxiliary diagnosis and treatment method based on a multi-modal fusion large model, characterized by comprising the following steps: Step 1, acquiring voice data, physiological signal data, behavior data and text data; Step 2, extracting peaks in the voice data as a voice feature sequence, taking abnormal points of the physiological signals in the physiological signal data as a physiological feature sequence, and aligning the voice feature sequence with the physiological feature sequence by dynamically warping their time axes with a dynamic time warping algorithm; Step 3, using a Q-Former model to map the aligned features of the three non-text modalities (the voice feature sequence, the physiological feature sequence and the behavior data) into a unified text semantic space, while extracting text semantic vectors from the text data, wherein the voice semantic vectors and the text semantic vectors are fused to form a language modality; and Step 4, extracting search keywords based on the weighted standardized three-dimensional feature matrix, and matching knowledge graph semantic vectors from the knowledge graph through vector search and keyword search to generate a corresponding auxiliary diagnosis and treatment plan.
2. The auxiliary diagnosis and treatment method based on the multi-modal fusion large model according to claim 1, wherein Step 2 specifically comprises the following sub-steps: Step 2.1, converting the voice data timestamps and the physiological signal timestamps into the same time coordinate system, extracting peak values in the voice data as the voice feature sequence, and taking abnormal points of the physiological signals in the physiological signal data as the physiological feature sequence; Step 2.2, identifying peak points in the voice feature sequence and abnormal points in the physiological feature sequence by a threshold method; Step 2.3, constructing a distance matrix by calculating the Euclidean distance between each peak point and each abnormal point, using their time difference and feature correlation as the metrics; Step 2.4, traversing the distance matrix with a dynamic time warping algorithm and taking the path with the smallest accumulated distance between peak points and abnormal points as the optimal matching relation; and Step 2.5, calculating a global offset compensation value based on the optimal matching relation, and correcting the timestamps of the physiological feature sequence according to the global offset compensation value.
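The alignment in sub-steps 2.2 to 2.5 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patentee's implementation: timestamps are assumed to be already converted to a shared clock (sub-step 2.1), the "feature correlation" term is approximated by the absolute difference of feature values, the weights `w_time`/`w_feat` and the optional Sakoe-Chiba `band` are hypothetical parameters, and the global offset is taken as the mean time shift over the matched pairs.

```python
import math
from typing import List, Optional, Tuple

# (timestamp in seconds on the shared clock from Step 2.1, feature value)
Point = Tuple[float, float]


def threshold_points(series: List[Point], threshold: float) -> List[Point]:
    """Step 2.2 (sketch): keep samples whose feature value exceeds a threshold."""
    return [p for p in series if p[1] > threshold]


def distance_matrix(a: List[Point], b: List[Point],
                    w_time: float = 1.0, w_feat: float = 1.0) -> List[List[float]]:
    """Step 2.3 (sketch): weighted Euclidean distance over (time gap, feature gap).
    The feature gap is an assumed stand-in for the claim's 'feature correlation'."""
    return [[math.hypot(w_time * (ta - tb), w_feat * (fa - fb)) for tb, fb in b]
            for ta, fa in a]


def dtw_path(dist: List[List[float]], band: Optional[int] = None) -> List[Tuple[int, int]]:
    """Step 2.4 (sketch): classic DTW with an optional Sakoe-Chiba band."""
    n, m = len(dist), len(dist[0])
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if band is not None and abs(i - j) > band:
                continue  # cell lies outside the warping window
            acc[i][j] = dist[i - 1][j - 1] + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    if acc[n][m] == inf:
        raise ValueError("band too narrow for the sequence lengths")
    # Backtrack from (n, m) along the cheapest predecessor to recover the path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        preds = {(i - 1, j - 1): acc[i - 1][j - 1],
                 (i - 1, j): acc[i - 1][j],
                 (i, j - 1): acc[i][j - 1]}
        i, j = min(preds, key=preds.__getitem__)
    return path[::-1]


def global_offset(a: List[Point], b: List[Point], path: List[Tuple[int, int]]) -> float:
    """Step 2.5 (sketch): mean time shift from matched anomalies to matched peaks."""
    return sum(a[i][0] - b[j][0] for i, j in path) / len(path)


def correct_timestamps(b: List[Point], offset: float) -> List[Point]:
    """Shift every physiological timestamp by the global offset."""
    return [(t + offset, f) for t, f in b]
```

With a constant 0.5 s lag between speech peaks at 1, 3 and 5 s and physiological anomalies at 1.5, 3.5 and 5.5 s, the warping path is the diagonal and the computed offset is -0.5 s, which moves the anomalies onto the peaks. A median instead of a mean would make the offset more robust to a few spurious matches.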
3. The auxiliary diagnosis and treatment method based on the multi-modal fusion large model according to claim 2, wherein the aligned voice feature sequence and physiological feature sequence are further verified: if the mean time difference between the peak points in the voice feature sequence and the abnormal points in the physiological feature sequence exceeds a set threshold, the distance weights and the path constraint parameters of the dynamic time warping algorithm are adjusted and sub-steps 2.4 and 2.5 are repeated until the mean time difference no longer exceeds the set threshold.
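The verification loop of claim 3 can be expressed as a small driver around any alignment routine. In this hedged sketch, `align` is a hypothetical callable that runs sub-steps 2.4 and 2.5 with the given parameters and returns matched (speech time, corrected physiological time) pairs; the adjustment rule (doubling the time weight and narrowing the band by one) and `max_rounds` are illustrative choices, since the claim does not state how the parameters are changed.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[float, float]   # (speech peak time, corrected physiological anomaly time)
Params = Dict[str, float]    # hypothetical keys: "w_time" (distance weight), "band" (path constraint)


def mean_time_gap(pairs: List[Pair]) -> float:
    """Mean absolute time difference between matched peaks and anomalies."""
    return sum(abs(ts - tp) for ts, tp in pairs) / len(pairs)


def verify_alignment(align: Callable[[Params], List[Pair]],
                     initial: Params,
                     threshold: float,
                     max_rounds: int = 10) -> Tuple[Params, List[Pair], bool]:
    """Re-run the alignment with adjusted parameters until the mean time gap
    is within `threshold` or `max_rounds` adjustments have been tried."""
    params = dict(initial)
    pairs = align(params)
    for _ in range(max_rounds):
        if mean_time_gap(pairs) <= threshold:
            return params, pairs, True
        # Hypothetical adjustment: weight time gaps more heavily and narrow
        # the warping window so that distant matches are penalised.
        params["w_time"] *= 2.0
        params["band"] = max(1, int(params["band"]) - 1)
        pairs = align(params)
    return params, pairs, mean_time_gap(pairs) <= threshold
```

The round cap is an added safeguard: the claim repeats until the threshold is met, but a deployed system needs a stopping condition for recordings that cannot be aligned.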
4. The auxiliary diagnosis and treatment method based on the multi-modal fusion large model according to claim 1, wherein in Step 3, mapping the aligned semantic features of the three non-text modalities (the voice feature sequence, the physiological feature sequence and the behavior data) into the unified text semantic space specifically comprises: inputting the voice feature sequence into the voice branch of the Q-Former, performing attention matching against the voice features with a pre-trained query vector group, and outputting 100 voice semantic vectors of 768 dimensions each; inputting the physiological signal sequence into the physiological branch of the Q-Former, matching it with a dedicated query vector group, and outputting 100 physiological semantic vectors of 768 dimensions each; and inputting the behavior data into the behavior branch of the Q-Former and, after matching with the behavior query vectors, outputting 100 behavior semantic vectors of 768 dimensions each.
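Each Q-Former branch turns a variable-length feature sequence into a fixed number of query outputs. The sketch below reduces a branch to a single scaled dot-product cross-attention layer between a learnable query set and projected input features; the actual Q-Former (as used in BLIP-2) also stacks self-attention and feed-forward layers, and the class name, random initialization and small test dimensions here are assumptions. With `n_queries=100` and `d_model=768` the output shape matches the 100 × 768 vectors in the claim.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.02) -> Matrix:
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class QueryCrossAttentionBranch:
    """One modality branch (voice, physiological or behavior), simplified to a
    single cross-attention layer. Weights are random placeholders that would
    be learned during training."""

    def __init__(self, n_queries: int, d_in: int, d_model: int, seed: int = 0):
        rng = random.Random(seed)
        self.queries = random_matrix(n_queries, d_model, rng)  # e.g. 100 x 768
        self.w_k = random_matrix(d_in, d_model, rng)           # input -> keys
        self.w_v = random_matrix(d_in, d_model, rng)           # input -> values
        self.d_model = d_model

    def __call__(self, features: Matrix) -> Matrix:
        keys = matmul(features, self.w_k)                      # T x d_model
        values = matmul(features, self.w_v)                    # T x d_model
        keys_t = [list(c) for c in zip(*keys)]                 # d_model x T
        scores = matmul(self.queries, keys_t)                  # n_queries x T
        scale = math.sqrt(self.d_model)
        attn = [softmax([s / scale for s in row]) for row in scores]
        return matmul(attn, values)                            # n_queries x d_model
```

Because the number of output rows equals the number of queries, sequences of different lengths (for example, 5 frames versus 2 frames) yield the same output shape, which is what lets every branch emit the same 100-vector layout.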
5. The auxiliary diagnosis and treatment method based on the multi-modal fusion large model according to claim 4, wherein in Step 3, extracting the text semantic vectors of the text data specifically comprises: processing the text data with a BERT model and outputting 100 text semantic vectors of 768 dimensions each, the text vectors corresponding one-to-one with the time frames of the non-text modalities.
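The claims do not specify how the voice and text semantic vectors are fused into the language modality, nor the exact layout of the three-dimensional feature matrix. One plausible reading, sketched here purely as an assumption, is a frame-wise mean of the voice and BERT text vectors, followed by stacking the language, physiological and behavior modalities into a (3 × frames × dimension) tensor whose vectors are L2-normalized and scaled by per-modality weights. The function names and weight keys are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse_language(voice: Matrix, text: Matrix) -> Matrix:
    """Hypothetical fusion: frame-wise mean of voice and text semantic vectors."""
    if len(voice) != len(text):
        raise ValueError("voice and text must share the same time frames")
    return [[(s + t) / 2.0 for s, t in zip(sv, tv)] for sv, tv in zip(voice, text)]


def build_feature_tensor(language: Matrix, physio: Matrix, behavior: Matrix,
                         weights: Dict[str, float]) -> List[Matrix]:
    """Stack modalities into a (3 x frames x dim) tensor of unit vectors,
    each scaled by its modality weight."""
    tensor = []
    for name, mat in (("language", language), ("physio", physio), ("behavior", behavior)):
        w = weights[name]
        tensor.append([[w * v for v in l2_normalize(row)] for row in mat])
    return tensor
```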
6. The auxiliary diagnosis and treatment method based on the multi-modal fusion large model according to claim 4, wherein Step 3 further comprises checking the semantic relevance among the modalities, specifically: calculating the cosine similarity among the semantic vectors of the three modalities (the voice feature sequence, the physiological feature sequence and the behavior data) within the same time frame and, if the similarity in any time frame is below a set threshold, re-executing the semantic feature mapping until the similarity meets the threshold requirement.
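The cross-modal consistency check of claim 6 can be sketched as follows. The claim does not say how the three pairwise similarities in a frame are combined; this sketch assumes a frame is flagged for re-mapping when the smallest pairwise cosine similarity falls below the threshold. The function names are illustrative.

```python
import math
from itertools import combinations
from typing import List, Sequence

Matrix = List[List[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def frames_needing_remap(modalities: List[Matrix], threshold: float) -> List[int]:
    """Return indices of time frames in which any pair of modality vectors has
    cosine similarity below `threshold` (minimum-pairwise rule is an assumption)."""
    n_frames = len(modalities[0])
    flagged = []
    for f in range(n_frames):
        vecs = [m[f] for m in modalities]
        if min(cosine(a, b) for a, b in combinations(vecs, 2)) < threshold:
            flagged.append(f)
    return flagged
```

Only the flagged frames need to be sent back through the Q-Former branches, rather than re-mapping the whole sequence.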
7. The auxiliary diagnosis and treatment method based on the multi-modal fusion large model according to claim 1, wherein the standardized three-dimensional feature matrix is a weighted standardized three-dimensional feature matrix whose modality weight factors are dynamically adjusted based on historical data by constructing a modality reliability evaluation model, specifically: acquiring multi-modal historical data covering different populations and extracting modality reliability features; constructing an evaluation model that takes the modality reliability features as input and the actual contribution of each modality to the diagnosis and treatment results as the output label, and training it to quantitatively score the reliability of different modalities for different populations; establishing a weight factor mapping rule that maps the reliability scores output by the evaluation model to modality weight factors by normalized proportion; calculating a real-time reliability index of the modal data and inputting it into the evaluation model to obtain a real-time reliability score; and dynamically adjusting the weight factors of the modalities based on the real-time reliability scores.
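The weight factor mapping rule and the dynamic adjustment of claim 7 can be sketched as below. The trained reliability evaluation model is replaced by a placeholder callable `evaluate`; the exponential blending with factor `alpha` is an assumption, since the claim only states that weights are adjusted dynamically from real-time scores.

```python
from typing import Callable, Dict

Scores = Dict[str, float]


def scores_to_weights(scores: Scores) -> Scores:
    """Map reliability scores to modality weights by normalized proportion."""
    total = sum(scores.values())
    if total <= 0:
        return {k: 1.0 / len(scores) for k in scores}  # fall back to equal weights
    return {k: v / total for k, v in scores.items()}


def update_weights(prev: Scores,
                   realtime_index: Scores,
                   evaluate: Callable[[str, float], float],
                   alpha: float = 0.3) -> Scores:
    """Blend previous weights with weights derived from real-time scores.
    `evaluate` stands in for the trained reliability evaluation model;
    `alpha` (hypothetical) sets how quickly real-time scores take over."""
    live = scores_to_weights({m: evaluate(m, idx) for m, idx in realtime_index.items()})
    blended = {m: (1 - alpha) * prev[m] + alpha * live[m] for m in prev}
    return scores_to_weights(blended)  # renormalize so the weights sum to 1
```

For example, if a wearable sensor detaches and its real-time reliability index drops, the physiological weight shrinks over the next few updates instead of collapsing at once; a larger `alpha` makes the reaction faster.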
8. The auxiliary diagnosis and treatment method based on the multi-modal fusion large model according to claim 1, wherein in Step 4, matching the knowledge graph semantic vectors from the knowledge graph through vector search and keyword search specifically comprises: converting the search keywords into 768-dimensional semantic vectors, calculating their cosine similarity with the knowledge graph semantic vectors stored in a vector database, and selecting candidate knowledge whose similarity exceeds a set threshold; filtering, by keyword retrieval, for knowledge items that contain the core words and eliminating irrelevant knowledge; and ranking the candidate knowledge by preset priority levels, integrating it into a structured form, generating an association table, and determining the correspondence between the knowledge graph semantic vectors and the candidate knowledge.
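A minimal sketch of the hybrid retrieval in claim 8, assuming the keyword's 768-dimensional query vector has already been produced by the same embedding model used to index the knowledge graph. The `KnowledgeItem` fields, the integer priority levels (lower means higher priority), and the rule that an item passes the keyword filter if it contains any core word are all hypothetical; a production system would run the vector step against a vector database rather than a linear scan.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class KnowledgeItem:
    item_id: str          # identifier of the knowledge graph node or relation
    text: str             # textual content of the knowledge item
    vector: List[float]   # precomputed semantic vector (768-d in the claim)
    priority: int         # hypothetical: 0 = clinical guideline, 1 = expert consensus, ...


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def hybrid_search(query_vec: Sequence[float], core_words: List[str],
                  items: List[KnowledgeItem], threshold: float) -> List[Dict[str, object]]:
    # 1) Vector recall: keep items whose similarity exceeds the threshold.
    candidates = [(it, cosine(query_vec, it.vector)) for it in items]
    candidates = [(it, s) for it, s in candidates if s > threshold]
    # 2) Keyword filter: drop candidates that contain none of the core words.
    candidates = [(it, s) for it, s in candidates if any(w in it.text for w in core_words)]
    # 3) Rank by priority level, then by similarity, and emit an association table.
    candidates.sort(key=lambda pair: (pair[0].priority, -pair[1]))
    return [{"item_id": it.item_id, "similarity": round(s, 4), "text": it.text}
            for it, s in candidates]
```

Running the keyword filter after vector recall removes items that are semantically close but off-topic (for example, an unrelated entry whose vector happens to lie near the query).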
9. An auxiliary diagnosis and treatment system based on a multi-modal fusion large model, applicable to the auxiliary diagnosis and treatment method based on the multi-modal fusion large model according to any one of claims 1 to 8, characterized by comprising: a data acquisition layer, which ingests voice, text, physiological signal and behavior data; a model calculation layer, which carries the core algorithm operations and generates an auxiliary diagnosis and treatment plan based on the data from the data acquisition layer; an application service layer, which outputs the auxiliary diagnosis and treatment plan generated by the model calculation layer to the user terminal; and a security layer, which uses federated learning, differential privacy and tiered access permissions to improve data security.
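Claim 9 names federated learning and differential privacy without fixing the algorithms. Two common choices are sketched below as assumptions: federated averaging, where each site shares only locally trained parameters weighted by its sample count, and the Laplace mechanism, which adds noise of scale sensitivity/ε to a released statistic. Function names and parameters are illustrative.

```python
import random
from typing import List, Sequence


def fed_avg(client_weights: List[Sequence[float]], client_sizes: List[int]) -> List[float]:
    """Federated averaging: each site trains locally and shares only its parameters;
    the server averages them weighted by the sites' sample counts."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]


def laplace_mechanism(value: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Release `value` with epsilon-differential privacy by adding Laplace noise
    of scale b = sensitivity / epsilon. The difference of two i.i.d. exponential
    variables with mean b follows Laplace(0, b)."""
    b = sensitivity / epsilon
    return value + rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
```

A smaller `epsilon` gives stronger privacy at the cost of noisier outputs; the tiered access permission part of the security layer is not shown.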
10. The auxiliary diagnosis and treatment system based on the multi-modal fusion large model according to claim 9, wherein the model calculation layer further comprises a lightweight deployment module for compressing the models in the full-modal data fusion module, the knowledge-enhanced large model module and the personalized plan generation module.
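Claim 10 does not say how the lightweight deployment module compresses the models. Post-training quantization is one common option (pruning and knowledge distillation are others); the sketch below shows symmetric per-tensor int8 quantization of a weight list, as an illustrative assumption rather than the module's actual method.

```python
from typing import List, Sequence, Tuple


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 codes."""
    return [v * scale for v in q]
```

Storing int8 values instead of float32 cuts weight memory by roughly four times, with a per-weight rounding error of at most half the scale.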

Description

Auxiliary diagnosis and treatment system and method based on multi-mode fusion large model

Technical Field

The invention relates to the technical field of medical artificial intelligence, in particular to an auxiliary diagnosis and treatment system and method based on a multi-modal fusion large model.

Background

Mental disorders have become a global public health problem. Current diagnosis and treatment in mental health departments relies mainly on three technical paths. Traditional diagnosis and treatment centers on physicians' clinical experience, combined with standardized scales such as PHQ-9 and HAM-D for symptom assessment, and depends on patient self-report and manual interpretation. Traditional digital therapy mostly stops at digitizing these scales, converting paper questionnaires into mobile tools, with some products adding simple NLP techniques to analyze the sentiment of text. The prior art therefore has several key defects that make it difficult to support objective diagnosis, early identification and precise intervention for mental disorders.
Traditional diagnosis and treatment relies only on text-based scales and subjective experience and does not integrate multi-modal information such as vocal emotion, physiological signals and behavior data, so diagnostic results are strongly affected by physicians' subjective interpretation and by bias in how patients express themselves. Existing digital therapies can digitize scales but do not solve the spatio-temporal alignment and semantic unification of multi-modal data: early hidden symptoms cannot be captured, the early recognition rate is low, the links between data sources are broken, and a comprehensive analysis is hard to form. General-purpose large models, lacking the support of a mental-health knowledge graph, are prone to hallucinations such as incorrect medication suggestions, so the safety and accuracy of the plans they produce are insufficient. Together, these defects cause the long-standing problems of subjectivity, one-sided data and delayed intervention in the diagnosis and treatment of mental disorders, and make it difficult to meet the needs of primary-care settings and the requirements of data privacy protection.
Disclosure of Invention

The invention aims to overcome the defects of the prior art, in which the diagnosis and treatment of mental disorders has long been limited by subjectivity, one-sided data and delayed intervention and is difficult to adapt to primary-care settings and data privacy protection requirements. It provides an auxiliary diagnosis and treatment system and method based on a multi-modal fusion large model that aligns a voice feature sequence with a physiological feature sequence using a dynamic time warping algorithm, maps the non-text modal semantic features into a unified text semantic space, combines them with text semantic vectors to construct a normalized three-dimensional feature matrix, and finally extracts keywords and matches them against a knowledge graph through vector search and keyword search. This solves the subjectivity and multi-modal data fragmentation problems of traditional diagnosis and treatment, makes auxiliary diagnosis and treatment of mental disorders objective and precise, and generates an authoritative auxiliary diagnosis and treatment plan.
The object of the invention is achieved by the following technical solution: an auxiliary diagnosis and treatment method based on a multi-modal fusion large model, comprising the following steps: Step 1, acquiring voice data, physiological signal data, behavior data and text data; Step 2, extracting peaks in the voice data as a voice feature sequence, taking abnormal points of the physiological signals in the physiological signal data as a physiological feature sequence, and aligning the voice feature sequence with the physiological feature sequence by dynamically warping their time axes with a dynamic time warping algorithm; Step 3, mapping the aligned features of the three non-text modalities (the voice feature sequence, the physiological feature sequence and the behavior data) into a unified text semantic space, extracting text semantic vectors from the text data, and fusing the voice semantic vectors and the text semantic vectors to form a language modality; and Step 4, extracting search keywords based on the standardized three-dimensional feature matrix, and matching knowledge graph semantic vectors from the knowledge graph through vector search and keyword search to generate a corresponding auxiliary diagnosis and treatment plan.
Preferably, Step 2 specifically includes the following sub-steps: Step 2.1, converting the voice data timestamps and the physiological signal timestamps into the same time coordinate system, extracting a peak value in the voice data as a voice characteristic sequence, and taking a physiological signal abnormal point in the physiological signal data as a physiological ch