
CN-121999992-A - Medical scene dynamic interaction decision-making system and method based on multi-mode perception

CN 121999992 A

Abstract

The invention relates to the technical field of multi-modal sensing, in particular to a medical scene dynamic interaction decision system and method based on multi-modal sensing. The invention screens structurally significant frames by measuring edge and texture changes between images, and compares symptom keywords in case texts against focus labels to identify semantic omissions. Query content is then completed in a targeted manner; the completed content comprises missing symptom features, uncovered body-part information, and potential focus descriptions. An abnormal factor set is constructed from speech-rate changes, physiological waveform jumps, and changes in expression muscle tension; its factors comprise abnormal voice features, key physiological signal segments, and the direction of expression tension change. On this basis, semantic coverage completeness, interactive response accuracy, and multi-modal fusion efficiency are improved, and medical perception and reasoning are enhanced.

Inventors

  • HU ZHUOCHAO
  • YANG YIYONG
  • ZHENG YUNLONG
  • XU SIYI

Assignees

  • 广州上诺生物技术有限公司

Dates

Publication Date
2026-05-08
Application Date
2026-01-06

Claims (10)

  1. A medical scene dynamic interactive decision system based on multi-modal awareness, the system comprising: a main node identification module, which screens focus candidate areas by extracting gray-scale, edge and texture characteristics from medical images, calculates inter-frame edge changes and texture differences, analyzes image structure saliency, screens image frames as image dominant nodes, and transmits the image frames to the coverage judgment module and the reconstruction guide module; a coverage judgment module, which extracts patient medical record text according to the image dominant nodes, recognizes symptom keywords, compares the symptom keywords with the focus labels corresponding to the image dominant nodes, generates semantic uncovered identification data, and transmits the semantic uncovered identification data to the reconstruction guide module; a reconstruction guide module, which extracts uncovered keywords and locates the missing semantic directions through the image dominant nodes and the semantic uncovered identification data, reconstructs physician query sentences, outputs a physician query sentence set, and transmits it to the factor construction module; and a factor construction module, which monitors the voice, physiological signals and facial expressions of the patient in real time through the physician query sentence set, extracts keyword intervals and speech-rate changes from the voice, locates signal segments via physiological indexes to identify waveform inversion and amplitude jump characteristics, combines facial expression muscle tension changes, constructs an abnormal co-occurrence factor set, and transmits the abnormal co-occurrence factor set to the path rearrangement module.
  2. The multi-modal awareness based medical scenario dynamic interaction decision system of claim 1, wherein the image dominant nodes comprise image frame numbers, dominant region locations and structural saliency scores; the semantic uncovered identification data comprises a missing tag list, a keyword matching degree and a semantic coverage rate; the physician query sentence set comprises targeted questions, reconstruction sentence templates and candidate guide words; and the abnormal co-occurrence factor set comprises speech emotion mutation points, facial expression abnormality patterns and physiological signal abnormality features.
  3. The multi-modal awareness based medical scenario dynamic interactive decision-making system of claim 1, wherein the master node identification module comprises: a candidate screening submodule, which extracts gray-scale, edge and texture features from the medical image, extracts regional contours in combination with boundary information, screens regions with abnormal gray-scale distribution and builds an edge contour set, compares the consistency between contours and gradients, identifies potential focus regions, and generates candidate region consistency coefficient values; an edge texture analysis submodule, which acquires an edge distribution diagram of the corresponding region in continuous image frames based on the candidate region consistency coefficient value, calculates the inter-frame edge intensity difference and texture offset rate, screens regions whose edge offset amplitude and texture distribution change rate exceed the edge offset reference range and the texture fluctuation reference range, and obtains a structure change joint offset value; the edge offset reference range is set as the upper and lower boundaries of normal edge change by extracting the edge intensity differences of non-focus regions in continuous image frames and counting their average level and fluctuation amplitude; the texture fluctuation reference range sets upper and lower thresholds according to the common fluctuation range by extracting the inter-frame gray-level co-occurrence matrix energy, contrast, and local binary pattern differences of non-focus regions; and a dominant frame identification submodule, which extracts the structural outline and internal texture aggregation degree of the corresponding region according to the structure change joint offset value, calculates a structural density value, compares it with a significance density reference value, identifies image frames reaching the deviation standard as dominant nodes, and establishes the image dominant nodes. The specific formula for extracting the corresponding region structure outline and internal texture aggregation degree calculates the structural density value from the following quantities (the formula itself is omitted in the source): the structural density value; the total number of pixels in the extraction area; the gray gradient value of the i-th pixel point in the transverse structural gradient direction; the gray gradient value of the i-th pixel point in the longitudinal structural gradient direction; the gradient weight value of the texture aggregation degree of the i-th pixel point; a structural variance adjustment factor for dynamically adjusting texture response weights; the absolute deviation of the gray value of the i-th pixel point within its local window; the local structure average density value calculated from the texture features of all pixel points in the extracted region; and the pixel-point index i, running over the total number of pixels in the extraction area. The significance density reference value is a standard set by counting the distribution range of high-density regions according to the aggregation degree of structure and texture features in known focus regions; the deviation standard is a minimum identifiable boundary set by comparing the differences in structure and texture aggregation scores between dominant regions and common regions.
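As a rough illustration of the dominant-frame screening in claim 3, the sketch below scores a region by combining per-pixel gradient magnitude with a variance-adjusted local gray-level deviation. Since the patent's formula is omitted in the source, every concrete choice here (the normalized weight `w`, the factor `alpha`, the window size) is an assumed reading of the variable definitions above, not the actual formula.

```python
import numpy as np

def local_mean(img, k=3):
    """Mean gray value over a k x k window around each pixel (edge-padded)."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def structural_density(region, alpha=0.5):
    """Assumed reading of claim 3's structural density value: the average
    over all pixels of (gradient weight * gradient magnitude
    + alpha * |local gray deviation - its regional mean|)."""
    region = np.asarray(region, dtype=float)
    gy, gx = np.gradient(region)                   # longitudinal / transverse gray gradients
    grad_mag = np.sqrt(gx ** 2 + gy ** 2)          # structural outline strength per pixel
    abs_dev = np.abs(region - local_mean(region))  # gray deviation in the local window
    w = grad_mag / (grad_mag.max() + 1e-9)         # gradient weight, normalized to [0, 1]
    mu = abs_dev.mean()                            # local structure average density value
    return float(np.mean(w * grad_mag + alpha * np.abs(abs_dev - mu)))
```

A frame would then be kept as a dominant node when this score exceeds the significance density reference value by the deviation standard.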
  4. The multi-modal awareness based medical scenario dynamic interactive decision system of claim 3, wherein the coverage judgment module comprises: a dominant node extraction submodule, which identifies the structural characteristics of the corresponding region according to the image dominant nodes, performs combination classification and type grouping on each characteristic by analyzing the gray-scale, edge and texture characteristics of the focus candidate regions, outputs focus type labels for each region according to the grouping result, establishes a correspondence set between nodes and focuses, and generates an image focus label comparison quantity; a symptom keyword recognition submodule, which invokes the image focus label comparison quantity, acquires patient medical record text data, segments words to extract symptom descriptors from the text, screens keywords corresponding to disease symptoms, constructs a term set according to keyword occurrence frequency and part-of-speech structure, and generates a symptom keyword extraction quantity; and a semantic comparison judging submodule, which compares the label content in the image focus label comparison quantity against the symptom keyword extraction quantity, constructs a term missing index according to the semantic matching degree of the sets, calculates a matching gap ratio from the number of missing terms and the total number of labels, establishes the numbers of uncovered terms and their corresponding node identifiers, and generates the semantic uncovered identification data.
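The matching-gap computation in claim 4 can be sketched as follows. Simple substring overlap stands in for the patent's semantic matching degree, and all names are illustrative:

```python
def semantic_coverage(symptom_keywords, focus_labels):
    """Compare extracted symptom keywords with image focus labels.

    Returns the uncovered keywords (those with no matching label) and a
    matching gap ratio: missing terms over the total number of labels.
    Matching here is substring overlap -- a stand-in for the patent's
    semantic matching degree.
    """
    labels = [l.lower() for l in focus_labels]
    uncovered = [k for k in symptom_keywords
                 if not any(k.lower() in l or l in k.lower() for l in labels)]
    gap_ratio = len(uncovered) / max(len(focus_labels), 1)
    return uncovered, gap_ratio
```

The uncovered list, numbered and paired with node identifiers, would form the semantic uncovered identification data.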
  5. The multi-modal awareness based medical scenario dynamic interactive decision-making system of claim 4, wherein the reconstruction guide module comprises: a keyword extraction submodule, which extracts the focus content of associated nodes in the missing identifications through the image dominant nodes and the semantic uncovered identification data, locates the missing term set corresponding to the identification numbers, extracts keywords, screens out non-symptom terms, establishes a keyword list, and generates uncovered keyword quantity values; a semantic direction judging submodule, which invokes the uncovered keyword quantity value, acquires the corresponding anatomical structure labels and spatial orientations in the image dominant nodes, screens the unaligned content of missing terms in the label set, establishes a direction pairing relation between the missing terms and the structure labels, computes an occurrence-frequency ordering, and generates a missing semantic direction trend value; and a query reconstruction submodule, which matches label templates in a semantic library according to the missing semantic direction trend value, extracts high-frequency structural keywords and combines them with corresponding query sentence patterns, constructs a query sentence rearrangement sequence driven by the missing keywords, sequentially generates sentence pattern groups, and generates the physician query sentence set.
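The query reconstruction of claim 5 amounts to pairing each missing term with its semantic direction and filling a sentence template. A minimal sketch, assuming a single hypothetical template (the real semantic-library templates are not given in the patent):

```python
def reconstruct_queries(missing_terms, direction_labels, templates=None):
    """Build physician query sentences from uncovered terms.

    Each missing term is paired with its anatomical direction label and
    substituted into a question template. The default template text is
    illustrative only, not from the patent's semantic library.
    """
    if templates is None:
        templates = ["Do you experience {term} in the {site} area?"]
    queries = []
    for term, site in zip(missing_terms, direction_labels):
        for t in templates:
            queries.append(t.format(term=term, site=site))
    return queries
```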
  6. The multi-modal awareness based medical scenario dynamic interactive decision system of claim 5, wherein the factor construction module comprises: a semantic rhythm extraction submodule, which acquires patient voice signals in real time through the physician query sentence set, segments the speech-rate changes and interval time sequence by extracting the time intervals and pronunciation durations between keywords in the voice, calculates the speech-rate slope and rhythm fluctuation degree of each segment, normalizes the fluctuation range of the voice syllables on the time axis, and establishes a voice rhythm fluctuation value; a physiological signal identification submodule, which extracts the electrocardiogram and galvanic skin signals of the corresponding period according to the time range covered by the voice rhythm fluctuation value, performs amplitude difference and polarity direction detection on the signal waveforms, calculates the number of jumps and the reversal frequency per unit time, and groups them by jump amplitude and reversal frequency to obtain a fluctuation signal jump ratio; and a facial collaborative mapping submodule, which invokes the time point information marked by the fluctuation signal jump ratio, extracts the facial image sequence of the corresponding period, divides it into regions, tracks pixel intensity changes within each region, calculates muscle tension changes, identifies mutation points in the facial muscle tension data, the voice rhythm, and the various physiological signals, analyzes the overlapping sections of the three types of mutation points on the time axis, counts their co-occurrence frequency, and establishes the abnormal co-occurrence factor set.
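Claim 6 ends by intersecting mutation points from the three modalities on a time axis. A minimal sketch, with an assumed co-occurrence tolerance window `tol` (the patent does not specify one):

```python
def co_occurrence_factors(speech_pts, physio_pts, face_pts, tol=0.5):
    """Keep speech-rhythm mutation times that co-occur across modalities.

    A speech mutation time (seconds) becomes a co-occurrence factor when
    both a physiological jump and a facial-tension change fall within
    `tol` seconds of it. The tolerance value is an assumption.
    """
    def near(t, pts):
        return any(abs(t - p) <= tol for p in pts)
    return [t for t in speech_pts if near(t, physio_pts) and near(t, face_pts)]
```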
  7. The multi-modal awareness based medical scenario dynamic interaction decision-making system of claim 6, wherein the specific formula for detecting mutation points in the syllable rhythm calculates a syllable rhythm mutation trend value from the following quantities (the formula itself is omitted in the source): the absolute value of the speech-rate slope change of the i-th segment; the speech-rate slope of the i-th segment; the speech-rate slope of the preceding segment; the difference in start time between the i-th segment and the preceding segment; the average syllable interval duration of the keywords in the i-th segment; the average pronunciation duration of the keywords in the i-th segment; the average speech energy density of the keywords in the i-th segment; the index number i of the current speech segment in the rhythm sequence; and an index variable traversing the speech segment numbers within the summation interval.
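Under one assumed reading of claim 7 (the formula itself is omitted in the source), the speech-rate slope of a segment is its keyword count over its duration, and the mutation value of a segment compares adjacent slopes normalized by the start-time gap:

```python
def rhythm_mutation_trend(segments):
    """Per-segment speech-rate mutation values (an assumed claim-7 reading).

    Each segment is (start_time, keyword_count, duration). The slope of a
    segment is keywords per second; the mutation value of segment i is
    |slope_i - slope_{i-1}| divided by the start-time difference between
    segment i and its predecessor.
    """
    slopes = [n / d for (_, n, d) in segments]
    trend = []
    for i in range(1, len(segments)):
        dt = segments[i][0] - segments[i - 1][0]
        trend.append(abs(slopes[i] - slopes[i - 1]) / max(dt, 1e-9))
    return trend
```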
  8. The multi-modal awareness based medical scenario dynamic interactive decision system of claim 1, wherein the path rearrangement module extracts symptom label sequences in the auxiliary consultation path through the abnormal co-occurrence factor set, matches the labels with the image dominant nodes and keyword semantic directions, rearranges the interactive inquiry sequence according to the matching result, adjusts the execution order of doctor interaction responses, and generates a medical scene dynamic interaction decision instruction set; the medical scene dynamic interaction decision instruction set comprises a query priority order, a label semantic mapping path, and an interaction response adjustment scheme.
  9. The multi-modal awareness based medical scenario dynamic interactive decision system of claim 8, wherein the path rearrangement module comprises: a symptom label extraction submodule, which extracts the auxiliary inquiry path through the abnormal co-occurrence factor set, identifies the symptom labels corresponding to each node in the path, records the arrangement order of the labels in the original path, constructs a correspondence set of labels and sequence positions, and generates a label path sequence value; the auxiliary inquiry path is a standardized inquiry sequence preset by the system and constructed from a clinical knowledge base and disease evolution flows, used to provide doctors with a structured symptom label sequence and inquiry templates during intelligent medical interaction, assisting systematic and comprehensive acquisition of the patient's state; taken as the basic interaction framework, it comprises a plurality of symptom nodes arranged according to disease characteristics, whose order is dynamically adjusted according to the image dominant nodes and semantic keyword directions; a semantic matching judging submodule, which invokes the label path sequence value, matches the content of the image dominant nodes with the semantic directions of the keywords, calculates the number of matches according to the appearance positions of the labels in the image regions and the direction terms, constructs a label-node matching table, screens priorities by match count, and generates a path semantic matching coefficient; and an interaction instruction generation submodule, which adjusts the response order of the symptom labels in the interaction flow according to the path semantic matching coefficient, rearranges the label invocation structure in the original path according to the matching priority, establishes the mapping relation between label nodes and response content, and generates the medical scene dynamic interaction decision instruction set.
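The rearrangement in claim 9 is essentially a stable priority sort of the preset label path by match count. A sketch with illustrative names:

```python
def rearrange_path(label_sequence, match_counts):
    """Rearrange an auxiliary consultation path by matching priority.

    Labels with more matches against image dominant nodes and keyword
    semantic directions are asked first; ties keep the original clinical
    ordering (the sort is stable via the original-position tiebreaker).
    """
    order = {lbl: i for i, lbl in enumerate(label_sequence)}
    return sorted(label_sequence,
                  key=lambda lbl: (-match_counts.get(lbl, 0), order[lbl]))
```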
  10. A medical scene dynamic interaction decision method based on multi-modal awareness, characterized in that the method is used for realizing the medical scene dynamic interaction decision system based on multi-modal awareness according to any one of claims 1-9, the method comprising: S1, screening regions with closed contours and complex texture as focus candidate regions by extracting gray-scale, edge and texture characteristics from a medical image, calculating the region morphology change trend across continuous frames, judging the degree of structural mutation, comparing image structure differences across the frame sequence, screening image frames with prominent structural changes, and generating the image dominant nodes; S2, acquiring the marking information of the corresponding image frames and the symptom keywords in the case texts according to the image dominant nodes, analyzing the semantic matching relations between the keywords and the focus labels, screening keywords whose semantic counterparts do not appear in the labels, identifying missing positions in the text semantic chain, generating a set of keywords that cannot be associated, and constructing the semantic uncovered identification data; S3, locating the contextual logical position of each keyword through the image dominant nodes and the semantic uncovered identification data, calculating the topic association direction in the symptom expression sequence, screening sentence pattern structure templates adapted to the semantic direction, adjusting the word order and position combination of the keywords in the sentence patterns, checking the grammatical integrity of the generated sentences, constructing a question set, and outputting the physician query sentence set; S4, monitoring the voice, physiological signals and facial expressions of the patient in real time through the physician query sentence set, analyzing the speech-rate variation trend between voice keywords, comparing positions with frequency mutation and waveform variation in the physiological signals, screening abnormal tension transition areas in the image sequence, integrating abnormal synchronous events, and generating the abnormal co-occurrence factor set; S5, marking abnormal sentence time points and tag sequences in the physician query sentence set through the abnormal co-occurrence factor set, analyzing the label correspondence between the symptoms guided by the sentences and the image dominant nodes, analyzing the consistency of semantic directions and label logic, adjusting the ordering structure of the sentences in the query flow, optimizing the interaction path sequence, and constructing the medical scene dynamic interaction decision instruction set.
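Step S4's waveform inspection (positions of frequency mutation and waveform variation) can be sketched as jump and polarity-reversal counting over a sampled signal; the amplitude threshold is an assumed parameter:

```python
def signal_jumps(samples, amp_thresh):
    """Count amplitude jumps and polarity reversals in a 1-D signal.

    A jump is a sample-to-sample change larger than `amp_thresh`; a
    reversal is a sign change of the first difference. Returns
    (jump_count, reversal_count), matching the claim-6 style grouping
    by jump amplitude and reversal frequency.
    """
    diffs = [b - a for a, b in zip(samples, samples[1:])]
    jumps = sum(1 for d in diffs if abs(d) > amp_thresh)
    reversals = sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)
    return jumps, reversals
```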

Description

Medical scene dynamic interaction decision-making system and method based on multi-mode perception

Technical Field

The invention relates to the technical field of multi-modal sensing, in particular to a medical scene dynamic interaction decision-making system and method based on multi-modal sensing.

Background

The technical field mainly comprises the collaborative acquisition of multi-modal signals, the time-sequence alignment and feature extraction of multi-modal information, a cross-modal information fusion mechanism, and intelligent reasoning and judgment based on the fusion results; it is a key technical path for realizing perception, understanding and interaction capability in complex intelligent systems. A multi-modal perception medical scene dynamic interaction decision system is a decision support mode that performs medical auxiliary analysis and interactive response based on multi-source information such as medical images, electronic medical records, physiological signals and voice input, combined with dynamic changes of the patient scene. The traditional mode generally adopts medical image processing methods to extract key focus characteristics, mines basic illness-state data of the patient from structured electronic medical records, acquires the physiological state of the patient through physiological signal recognition, receives doctor-patient interaction instructions through a voice recognition mechanism, and then reasons over and responds to the multi-source data using a rule-based knowledge reasoning mechanism or a trained classification decision model, thereby completing the system's perception of and response to the medical scene.
In the prior art, multi-modal data fusion processing mostly depends on rule-based reasoning models or fully trained classification systems and lacks the capability of jointly recognizing the significance of image frame content and semantic differences in case texts. When a case image suffers partial information loss or insufficient semantic coverage, the system finds it difficult to dynamically adjust queries and complete the missing content; in particular, when a patient's clinical symptoms are complex in expression or several fuzzy areas exist, image-text matching deviations often cause judgment errors or missed key signs. Meanwhile, a joint perception mechanism for the patient's language, physiology and facial emotion changes is lacking, so dynamic abnormal signals in doctor-patient interaction cannot be fully captured, and the completeness and real-time response capability of auxiliary diagnosis are affected.

Disclosure of Invention

In order to solve the above technical problems in the prior art, the embodiments of the invention provide a medical scene dynamic interaction decision system and method based on multi-modal perception.
The technical scheme is as follows. In one aspect, a medical scenario dynamic interaction decision system based on multi-modal awareness is provided, the system comprising: a main node identification module, which screens focus candidate areas by extracting gray-scale, edge and texture characteristics from medical images, calculates inter-frame edge changes and texture differences, analyzes image structure saliency, screens image frames as image dominant nodes, and transmits the image frames to the coverage judgment module and the reconstruction guide module; a coverage judgment module, which extracts patient medical record text according to the image dominant nodes, recognizes symptom keywords, compares the symptom keywords with the focus labels corresponding to the image dominant nodes, generates semantic uncovered identification data, and transmits the semantic uncovered identification data to the reconstruction guide module; a reconstruction guide module, which extracts uncovered keywords and locates the missing semantic directions through the image dominant nodes and the semantic uncovered identification data, reconstructs physician query sentences, outputs a physician query sentence set, and transmits it to the factor construction module; and a factor construction module, which monitors the voice, physiological signals and facial expressions of the patient in real time through the physician query sentence set, extracts keyword intervals and speech-rate changes from the voice, locates signal segments via physiological indexes to identify waveform inversion and amplitude jump characteristics, combines facial expression muscle tension changes, constructs an abnormal co-occurrence factor set, and transmits the abnormal co-occurrence factor set to the path rearrangement module.
As a further scheme of the invention, the image dominant node comprises an image frame number, a dominant region position and a structure significance score, the semantic uncovered identification data comprises a missing tag list, a keyword matching degree and a semantic coverage rate, the physician query stateme